Thursday, March 29, 2018

Trashy Charts and I - p2


I visit various websites to collect data. Most of them are managed by central or state departments and ministries of the Government of India. Now that everything is digitized, it is easy to see the problems with the quality of these websites and of the reports they publish.
One good thing to come out of digitization and open data is easy access to the data these ministries collect. But when one compares the quality of the data and its management with that of developed countries, one realizes we have a long way to go. Given how easy it is in this day and age to fix websites (using the Google search engine) and to produce quality reports (using open source technologies), I hope the improvement comes quickly. Since I know a bit of R and a bit of data visualization, I thought I would give my 10 cents.
One common trend I find is the extensive use of pie charts. Every report I read has a combination of pie charts, line charts and bar plots. My favorite is the pie chart, since it is so easy to criticize and even easier to fix.
The following pie chart is taken from a report, Road Accidents in India 2015, published by the National Crime Records Bureau (NCRB) of India.
What did I not like about this pie chart?
  1. Background color: too dark. Why does this chart need a background color at all? A simple white or gray background does an amazing job.
  2. The header has a different background color. Why?
  3. The pie has 13 sectors. It is hard to read a pie chart with so many slices. The same message is better conveyed with an ordered bar chart.
  4. The colors used to fill the slices are too similar, which creates even more confusion. With 13 slices in similar colors, it is hard to know which data point corresponds to which state. For example, data points 8.8 and 4.2 have very similar colors.
[Figure: screenshot of the original pie chart]


Here is my transformation of the pie chart:

It just looks so much better without all that unnecessary color and large fonts.
The code for this chart:

#############################
# Packages
#############################
library(ggplot2)

#############################
# Data: percentage share of road accidents by state (2015)
#############################
acdt_p <- c(13.8, 12.7, 11, 8.8, 7.8, 6.5, 4.8, 4.8, 4.6, 4.2, 2.9, 2.6, 2.2, 13.3)

labels <- c("Tamil Nadu", "Maharashtra", "Madhya Pradesh", "Karnataka", "Kerala", "Uttar Pradesh",
            "Andhra Pradesh", "Rajasthan", "Gujarat", "Telangana", "Chhattisgarh", "West Bengal",
            "Haryana", "Other States")

data.f <- data.frame(states = labels, value = acdt_p)

#############################
# Plot: horizontal bar chart, bars ordered by value
#############################
ggplot(data.f, aes(x = reorder(states, value), y = value)) +
  geom_col(fill = "#3182bd") +                   # one fill colour for every bar, so no legend is drawn
  geom_text(aes(label = value), hjust = 1.5) +   # print each value inside the end of its bar
  coord_flip() +                                 # flip the axes so the state names read horizontally
  labs(title = "Percentage share in Total Number of Road Accidents (2015)",
       y = "percentage of share in road accidents",
       x = "state",
       subtitle = "Accidental Deaths & Suicides in India",
       caption = "Data Source: http://ncrb.gov.in") +
  theme_bw() +
  theme(axis.text.x = element_text(size = rel(0.9)),
        panel.grid.major = element_blank(),      # drop grid lines and the panel border
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        axis.line = element_line(colour = "black"))

Government officials could simply save a template as a markdown file and replace the data as it becomes available. Not too much to ask ... ;)
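As a rough sketch of what such a template could look like, the chart code can be wrapped in a function so that only the data changes from one report to the next. The function name ordered_bar_chart and its arguments are my own assumptions for illustration, and the data frame is expected to use the same states and value columns as the code above. The demo rows are just the first three values from the chart.

```r
library(ggplot2)

# Hypothetical helper (an illustration, not part of any official template):
# draws an ordered horizontal bar chart from a data frame that has a
# `states` column (labels) and a `value` column (numbers).
ordered_bar_chart <- function(data, title, source_note, fill = "#3182bd") {
  # Fail early if the expected columns are missing
  stopifnot(all(c("states", "value") %in% names(data)))

  ggplot(data, aes(x = reorder(states, value), y = value)) +
    geom_col(fill = fill) +
    geom_text(aes(label = value), hjust = 1.5) +
    coord_flip() +
    labs(title = title, x = "state", y = "percentage share",
         caption = source_note) +
    theme_bw() +
    theme(panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(),
          panel.border = element_blank(),
          axis.line = element_line(colour = "black"))
}

# Usage: only the data frame needs to change when new figures arrive.
demo <- data.frame(states = c("Tamil Nadu", "Maharashtra", "Madhya Pradesh"),
                   value = c(13.8, 12.7, 11))
p <- ordered_bar_chart(demo,
                       title = "Percentage share in Total Number of Road Accidents (2015)",
                       source_note = "Data Source: http://ncrb.gov.in")
```

Printing p (or calling ggsave() on it) produces the chart. Keeping the styling inside the function means every report that uses it gets the same look.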

Thursday, March 15, 2018

Murder cases in India - 2016

The following chart shows the trend in murder cases in India by state. The advantage of a geo facet chart is that it places each state's panel at roughly its geographic location. The black horizontal line in each plot corresponds to the average number of murder cases in India in 2016. This gives a quick overview of which states are above the national average and which are below it.
Murder Cases in India since 2010
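For readers who want the gist before opening the full code, here is a minimal sketch of the reference-line idea. It is not the code behind the plot above. It assumes a data frame with hypothetical columns state, year and murders, takes the national average to be the mean across states in the latest year, and uses ggplot2's facet_wrap() for the panels. Swapping in facet_geo() from the geofacet package, with a grid for India, would give the geographic layout.

```r
library(ggplot2)

# Hypothetical helper: mean number of cases across states in a reference year.
# `cases` is assumed to have columns state, year and murders.
national_average <- function(cases, ref_year = max(cases$year)) {
  mean(cases$murders[cases$year == ref_year])
}

# Hypothetical helper: one panel per state, with a black horizontal line
# at the national average so each state can be compared against it.
plot_murders_vs_average <- function(cases, ref_year = max(cases$year)) {
  avg <- national_average(cases, ref_year)

  ggplot(cases, aes(x = year, y = murders)) +
    geom_line(colour = "#3182bd") +
    geom_hline(yintercept = avg, colour = "black") +
    facet_wrap(~ state) +   # geofacet::facet_geo() could replace this line
    labs(x = "year", y = "murder cases") +
    theme_bw()
}
```

A state whose line sits above the black line in the final year is above the national average under this definition.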
It should be noted that the latest data available is for 2016, so there is a lag of about 2 years. I extracted the data for this plot from NCRB. If you would like to reproduce the plot without the manual labor, you can download the data here.
The R code used to generate the plot is here. Part of this code is inspired by Len's blog.

For more information, please feel free to visit my website.