Tuesday, November 15, 2016

Static, Multiple, Interactive - Part II

Introduction:

This post is a continuation of my earlier post titled “Static, multiple, to interactive - Part I”. In this post we will dive into adding a nice legend to the plot, fixing the colors and generating multiple plots using a loop.
The plot we build here is similar to the one we generated in Part I, but we have made it a little more informative.
Readers may find this post long and confusing. I suggest running the code and playing with it, changing values, until you are comfortable with it.

Load the data and required libraries:

The data used for generating the spatial plot can be downloaded from the Data.gov website. Alternatively, you can download the file here.
library(maps)
library(dplyr)  # provides filter()
# Read the data; entries written as "#N/A" in the file become NA
lct = read.csv("locations.csv", stringsAsFactors = FALSE, na.strings = "#N/A")
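The na.strings = "#N/A" argument is what turns spreadsheet-style #N/A entries (presumably present in the downloaded file) into real NA values, which is what lets the !is.na(x) filter further down work. A minimal sketch on a made-up three-line CSV; the values are invented for illustration and are not taken from the Data.gov file:

```r
# Made-up CSV text; "#N/A" marks a missing coordinate
csv_text <- "State,x,y
Ohio,-82.99,39.96
Texas,#N/A,#N/A"

# read.csv() accepts inline data through its text argument
demo <- read.csv(text = csv_text, stringsAsFactors = FALSE, na.strings = "#N/A")

print(class(demo$x))       # "numeric", because "#N/A" was read as NA
print(sum(is.na(demo$x)))  # 1 missing longitude
```

Without na.strings, the x column would be read as character, and is.na() would return FALSE for the "#N/A" strings, so those rows would not be dropped.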

Removing some states from the map

Our data consist of all the farmers markets located in the USA, so they also include states and territories such as Puerto Rico, Alaska, the Virgin Islands and Hawaii. Not that I don't like them, but I would like to plot only the farmers markets located on the main (contiguous) USA map. So we will clean the data a bit to remove the points corresponding to these four.
lct = filter(lct, !is.na(x))  # drop rows with no longitude
unique(lct$State)
##  [1] "Vermont"              "Ohio"                 "Michigan"            
##  [4] "South Carolina"       "Missouri"             "New York"            
##  [7] "Tennessee"            "Delaware"             "District of Columbia"
## [10] "Minnesota"            "Virginia"             "Pennsylvania"        
## [13] "Nebraska"             "Illinois"             "Florida"             
## [16] "Washington"           "Kansas"               "New Jersey"          
## [19] "Utah"                 "Maryland"             "Indiana"             
## [22] "Nevada"               "Colorado"             "Arizona"             
## [25] "Alabama"              "Iowa"                 "Wisconsin"           
## [28] "South Dakota"         "Massachusetts"        "Louisiana"           
## [31] "New Mexico"           "Maine"                "Georgia"             
## [34] "Oklahoma"             "Kentucky"             "Hawaii"              
## [37] "California"           "North Carolina"       "Oregon"              
## [40] "West Virginia"        "Texas"                "Idaho"               
## [43] "Montana"              "North Dakota"         "Alaska"              
## [46] "Virgin Islands"       "Rhode Island"         "Arkansas"            
## [49] "Connecticut"          "Mississippi"          "New Hampshire"       
## [52] "Wyoming"              "Puerto Rico"
lct = filter(lct, State != "Alaska" & State != "Puerto Rico" & State != "Hawaii" & State != "Virgin Islands")
We first remove every row whose x (longitude) value is NA, using the filter() function; how filter() works is described in detail in the next paragraph.
The unique() function is a base R function that returns the unique elements of a vector, data frame or array. We run it because we want to know the exact spellings of the state names we would like to remove. Next, we use filter() to keep only the rows we want. Note how the negation is written: filter() keeps the rows for which the condition is TRUE, so State != "Alaska" & State != "Puerto Rico" & State != "Hawaii" & State != "Virgin Islands" keeps every market whose state is none of these four, which filters out Alaska, Puerto Rico, Hawaii and the Virgin Islands. The first argument to filter() is the data set; the second is the logical condition that the rows we keep must satisfy.
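The chained != conditions can also be written with the %in% operator, which is easier to extend if the list of excluded states grows. A minimal sketch on a hypothetical five-row data frame (toy values, not the real data), assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical toy rows standing in for the farmers market data
toy <- data.frame(
  State = c("Ohio", "Alaska", "Texas", "Puerto Rico", "Hawaii"),
  x     = c(-82.9, -149.9, NA, -66.1, -157.8),
  stringsAsFactors = FALSE
)

# Step 1: drop rows with a missing longitude (removes Texas here)
toy <- filter(toy, !is.na(x))

# Step 2: keep rows whose State is NOT in the excluded set
excluded <- c("Alaska", "Puerto Rico", "Hawaii", "Virgin Islands")
toy <- filter(toy, !State %in% excluded)

print(toy$State)  # "Ohio"
```

One subtle difference: for a row with a missing State, the != chain evaluates to NA and filter() drops that row, whereas NA %in% excluded is FALSE, so !State %in% excluded keeps it.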

Map:

Following is the map that gets generated.

Let us now look at the code that produces this map. For now, set aside why we calculate absolute and area, and look at just the map code, which we discussed in Part I. Here, however, we have made a small change.
We would like to highlight all the farmers markets that sell cheese. The Cheese column is column 35 of the data set and works as a yes/no flag, i.e. it takes two values: “Y” meaning yes and “N” meaning no. To quickly see all the column names, type colnames(lct) in the R console and read off the column number from the printed positions.
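# absolute: length of the dark legend bar = longitude range x share of cheese markets
absolute = abs(max(lct$x) - min(lct$x)) * mean(ifelse(lct$Cheese == 'Y', 1, 0))
# area: x coordinate where the dark bar ends
area = min(lct$x) + absolute

par(mar = c(5, 4, 4, 2), bg = "#E8E6E7")
# Column 35 is Cheese: dark purple if "Y", light purple otherwise
plot(lct$x, lct$y, type = "p", pch = 19, cex = 0.1,
     col = ifelse(lct[, 35] == 'Y', '#482345', '#C384A9'), bty = "n", xlab = "", axes = FALSE, ylab = "", ylim = c(22, 49))
# Legend bar: full-width light rectangle first, then the shorter dark one on top
rect(min(lct$x), 22, max(lct$x), 16, col = "#C384A9")
rect(min(lct$x), 22, area, 16, col = "#482345")
text(area, 23, paste(round(mean(ifelse(lct$Cheese == 'Y', 1, 0)), 2) * 100, "%", sep = ""), cex = 0.7)
mtext("Cheese Farmers Market in USA", side = 3)
mtext("source: data.gov", side = 1, cex = 0.6)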
In the plot() call we change only the col argument, because we want the markets that serve cheese to stand out in a different color. The col argument colors the points that serve cheese dark purple (#482345). To make this happen we have to tell R: if you encounter a Y in column 35, use #482345; otherwise use #C384A9. R makes this very easy with the ifelse() function. The expression ifelse(lct[,35]=='Y','#482345','#C384A9') reads as: if the condition lct[,35]=='Y' is TRUE, return '#482345', otherwise return '#C384A9'. Because ifelse() works element by element, it returns one color for every market.
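Two ideas from the paragraphs above can be checked on their own: finding the Cheese column by name rather than counting to 35, and the way ifelse() returns one color per row. A minimal sketch on a hypothetical four-row data frame; the Honey column and all values are invented for illustration:

```r
# Hypothetical mini data set with Y/N product flags
markets <- data.frame(
  State  = c("Ohio", "Texas", "Iowa", "Maine"),
  Honey  = c("N", "Y", "Y", "N"),
  Cheese = c("Y", "N", NA, "Y"),
  stringsAsFactors = FALSE
)

# Position of the Cheese column, looked up by name instead of hard-coded
cheese_col <- which(colnames(markets) == "Cheese")
print(cheese_col)  # 3

# ifelse() is vectorized: one color per market
point_cols <- ifelse(markets[, cheese_col] == "Y", "#482345", "#C384A9")
print(point_cols)
```

Note the third market: if a flag were missing, the condition is NA, so ifelse() returns NA for that row, and R graphics treat an NA color as transparent, so that point would not be drawn.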
Now we will make a fancy legend. The legend is not generated with the usual base R legend() function but with two overlapping rectangles, which we draw with the rect() function from base R.
absolute = abs(max(lct$x) - min(lct$x)) * mean(ifelse(lct$Cheese == 'Y', 1, 0))
area = min(lct$x) + absolute

rect(min(lct$x), 22, max(lct$x), 16, col = "#C384A9")
rect(min(lct$x), 22, area, 16, col = "#482345")
text(area, 23, paste(round(mean(ifelse(lct$Cheese == 'Y', 1, 0)), 2) * 100, "%", sep = ""), cex = 0.7)
The rect() function takes four main arguments: xleft, ybottom, xright and ytop. We need two rectangles, one shorter than the other, to get the required effect. First, the longer rectangle: its xleft, ybottom, xright and ytop are min(lct$x), 22, max(lct$x) and 16 respectively, so it stretches across the full longitude range of the map, along the bottom of the plot (rect() does not mind that ybottom is larger than ytop). We then overlap this rectangle with a second, shorter one.
This is where it may get confusing. To fill in the cheese part of the bar, we need to know what percentage of all farmers markets sell cheese: mean(ifelse(lct$Cheese=='Y',1,0)) gives that share as a number between 0 and 1. Multiplying it by the width of the map, max(lct$x) - min(lct$x), gives the length of the dark bar (stored in absolute), and adding that length to min(lct$x) gives the x coordinate where the dark bar ends (stored in area). We use area as the xright argument of the second rect() call. Because the second rectangle is shorter, drawing it before the longer one would leave it hidden underneath, so the order in which we draw the two rectangles matters.
Finally, we use the text() function to print the actual percentage of cheese markets, which is close to 33%, just above the end of the dark bar. The label is built by nesting round() inside paste(): round(..., 2) * 100 turns the share into a whole percentage, and paste(..., "%", sep="") appends the percent sign. To learn more about these functions, type ?paste and ?round in the R console.
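The two-rectangle legend can be wrapped in a small reusable function. The sketch below is a hypothetical helper (draw_share_bar is not part of base R or of the code above) that computes where the dark bar ends, draws the rectangles in the required order and returns the label; the coordinates in the usage lines are made up, roughly matching a US longitude range.

```r
# Hypothetical helper reproducing the two-rectangle legend
draw_share_bar <- function(xmin, xmax, ybottom, ytop, share,
                           fill_col = "#482345", bg_col = "#C384A9") {
  # Same arithmetic as absolute/area: start + full width * share
  xsplit <- xmin + (xmax - xmin) * share
  # Long light rectangle first...
  rect(xmin, ybottom, xmax, ytop, col = bg_col)
  # ...then the shorter dark one, so it is not hidden
  rect(xmin, ybottom, xsplit, ytop, col = fill_col)
  # round(share, 2) * 100 turns 0.333... into 33
  label <- paste(round(share, 2) * 100, "%", sep = "")
  text(xsplit, ytop + 1, label, cex = 0.7)
  invisible(list(xsplit = xsplit, label = label))
}

# Usage on an empty plot; pdf(NULL) draws to a device that writes no file
pdf(NULL)
plot(c(-125, -65), c(16, 50), type = "n", axes = FALSE, xlab = "", ylab = "")
res <- draw_share_bar(-125, -65, 16, 22, share = 1/3)
invisible(dev.off())
print(res$label)  # "33%"
```

Returning the coordinates invisibly keeps the call quiet at the console while still letting you check where the bar was split.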

Multiple plot:

# 2 x 2 grid of plots; oma/mar trim the space around each panel
par(mfrow = c(2, 2), bg = "#E8E6E7", oma = c(0, 0, 3, 0), mar = c(5, 0, 0, 2))
# Columns 29 to 32 hold four of the Y/N product flags
for (i in 29:32) {
  plot(lct$x, lct$y, type = "p", pch = 19, cex = 0.5,
       col = ifelse(lct[, i] == 'Y', '#482345', '#C384A9'), bty = "n", xlab = "", axes = FALSE, ylab = "")
  mtext(paste(colnames(lct[i]), "Farmers Market in USA", sep = " "), side = 3, cex = 0.6)
  mtext("source: data.gov", side = 1, cex = 0.6)
  # Legend swatch uses the same color as the highlighted markets
  legend("bottomleft", fill = '#482345', box.col = "#E8E6E7", legend = paste(colnames(lct[i])), cex = 0.5)
}

To generate multiple plots we simply write a for loop. Multiple plots are very handy when you want to compare more than one element in a visualization. We could plot every column that indicates a food product sold, but to keep things easy to follow we plot only four maps.
We have made only a small change to the plot() call: we moved it inside the loop and replaced lct[,35]=='Y' with lct[,i]=='Y'. On each pass through the loop, i takes the next value of 29, 30, 31 and 32, so each map is colored using the data in that column. Ignore the oma and mar arguments for now; they set the outer and per-plot margins so the plots sit closer together. The mfrow = c(2,2) argument tells R to arrange the next four plots in a grid of two rows and two columns.
Finally, we have replaced our custom rectangle legend with the base R legend() function. To learn more about its arguments, type ?legend in the R console. Note the use of legend=paste(colnames(lct[i])) within the loop: it labels each map's legend with the name of the column being plotted.
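If you would rather pick the product columns by name than by position (29:32), the loop can iterate over a character vector of column names instead. The sketch below uses randomly generated toy points and two hypothetical columns, Eggs and Honey, and, as a variation on the code above, gives the legend one entry per color. With the real data you could build the vector with colnames(lct)[29:32] or type the names you want.

```r
# Toy data: random points plus two hypothetical Y/N product columns
set.seed(42)
n <- 100
toy <- data.frame(
  x     = runif(n, -125, -67),
  y     = runif(n, 25, 49),
  Eggs  = sample(c("Y", "N"), n, replace = TRUE),
  Honey = sample(c("Y", "N"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

products <- c("Eggs", "Honey")  # columns chosen by name

pdf(NULL)  # null device so the sketch runs without a screen
op <- par(mfrow = c(1, length(products)), bg = "#E8E6E7")
for (p in products) {
  # toy[[p]] extracts the column whose name is stored in p
  plot(toy$x, toy$y, pch = 19, cex = 0.5,
       col = ifelse(toy[[p]] == "Y", "#482345", "#C384A9"),
       bty = "n", axes = FALSE, xlab = "", ylab = "")
  mtext(paste(p, "Farmers Market in USA"), side = 3, cex = 0.6)
  legend("bottomleft", legend = c(p, paste("No", p)),
         fill = c("#482345", "#C384A9"), box.col = "#E8E6E7", cex = 0.5)
}
par(op)
invisible(dev.off())
```

Looping over names keeps the titles and legends correct even if the column order in the CSV changes.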

Code:

library(maps)
library(dplyr)  # provides filter()

# Read the data; entries written as "#N/A" in the file become NA
lct = read.csv("locations.csv", stringsAsFactors = FALSE, na.strings = "#N/A")
# Drop rows without a longitude, then drop the non-contiguous states and territories
lct = filter(lct, !is.na(x))
unique(lct$State)
lct = filter(lct, State != "Alaska" & State != "Puerto Rico" & State != "Hawaii" & State != "Virgin Islands")

# Legend bar: full width = longitude range; dark part = share of markets selling cheese
absolute = abs(max(lct$x) - min(lct$x)) * mean(ifelse(lct$Cheese == 'Y', 1, 0))
area = min(lct$x) + absolute

par(mar = c(5, 4, 4, 2), bg = "#E8E6E7")
# Column 35 is Cheese: dark purple if "Y", light purple otherwise
plot(lct$x, lct$y, type = "p", pch = 19, cex = 0.1,
     col = ifelse(lct[, 35] == 'Y', '#482345', '#C384A9'), bty = "n", xlab = "", axes = FALSE, ylab = "", ylim = c(22, 49))
# Draw the long rectangle first, then the shorter one on top
rect(min(lct$x), 22, max(lct$x), 16, col = "#C384A9")
rect(min(lct$x), 22, area, 16, col = "#482345")
text(area, 23, paste(round(mean(ifelse(lct$Cheese == 'Y', 1, 0)), 2) * 100, "%", sep = ""), cex = 0.7)
mtext("Cheese Farmers Market in USA", side = 3)
mtext("source: data.gov", side = 1, cex = 0.6)

# Generate multiple plots in a 2 x 2 grid
par(mfrow = c(2, 2), bg = "#E8E6E7", oma = c(0, 0, 3, 0), mar = c(5, 0, 0, 2))
for (i in 29:32) {
  plot(lct$x, lct$y, type = "p", pch = 19, cex = 0.5,
       col = ifelse(lct[, i] == 'Y', '#482345', '#C384A9'), bty = "n", xlab = "", axes = FALSE, ylab = "")
  mtext(paste(colnames(lct[i]), "Farmers Market in USA", sep = " "), side = 3, cex = 0.6)
  mtext("source: data.gov", side = 1, cex = 0.6)
  # Legend swatch uses the same color as the highlighted markets
  legend("bottomleft", fill = '#482345', box.col = "#E8E6E7", legend = paste(colnames(lct[i])), cex = 0.5)
}
