Wednesday, January 14, 2015

Oil prices and their relationship with how much people drive in the USA

Oil Curve

I always try to replicate New York Times visualizations. Although replicating them exactly is not possible, I get pretty close to the actual image using R.

The original plot, generated from oil prices and the distance driven, appeared in the New York Times.

The image shown above was generated using R, and further updates were made using Inkscape. Once we generate the plot in R, we export it as a PDF, import it into Inkscape, and enhance it by adding labels. The data is available here. Readers should save the file in a folder as oils.csv. To replicate the plot, simply copy and paste the entire R code below and you should get a very similar looking plot. Before running the R code, make sure R's working directory is set to the folder that oils.csv is saved in (for example with setwd()), because R looks for files given by a relative path in its current working directory.

setwd("C:/Users/agohil/Book")
embargo=read.csv("oils.csv")
par(mar = c(5,7,5,7))
plot(embargo$percapita,embargo$price, type ="n",axes = FALSE,xlab = NA,ylab =NA)
abline(v= c(7000,8000,9000), h = c(25,50,75,100),lty = 3,lwd = 1.5)
lines(embargo$percapita,embargo$price, lwd = 2, col = "#28363F")
points(embargo$percapita,embargo$price, lwd = 2, pch = 21, bg = "white", col ="#28363F", cex = 1.5)
axis(1,at = c(7000,8000,9000),tick = FALSE, cex.axis=0.5)
axis(2,at = c(25,50,75,100),tick = FALSE, las= 2,cex.axis=0.5)
axis(3,at = c(7000,8000,9000),tick = FALSE,cex.axis=0.5)
axis(4,at = c(25,50,75,100),tick = FALSE, las= 2,cex.axis=0.5)
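If read.csv() fails with a "cannot open file" error, the usual cause is that the working directory does not contain oils.csv. The helper below is a minimal sketch of a check you could run first. The function name find_data_file is made up for this illustration, and the folder in the usage comment is only a placeholder.

```r
# Hypothetical helper (not part of the original script): returns the full
# path to a data file, or stops with a readable message if it is missing.
find_data_file <- function(folder, filename = "oils.csv") {
  path <- file.path(folder, filename)
  if (!file.exists(path)) {
    stop("Cannot find ", filename, " in ", folder,
         " (current working directory: ", getwd(), ")")
  }
  path
}

# Usage (replace the folder with the one where you saved oils.csv):
# embargo <- read.csv(find_data_file("C:/Users/agohil/Book"))
```

Passing the full path to read.csv() also means the script no longer depends on setwd().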

To import the data:

embargo=read.csv("oils.csv") # to learn more about the function, type ?read.csv in the R console window

If readers wish to inspect part of the data, they can use the head() function to view the first 6 entries or tail() to view the last 6 entries. A second argument changes the number of entries shown, so head(embargo, 2) shows only the first 2.

head(embargo)
head(embargo, 2)
tail(embargo)
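Beyond head() and tail(), str() and summary() give a quick overview of the column types and value ranges. The sketch below uses a small made-up data frame, since the real file is not reproduced here. The values are invented for illustration, and only the column names percapita and price match the code in this post.

```r
# Toy data: invented values, NOT the real contents of oils.csv
toy <- data.frame(
  percapita = c(6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500),
  price     = c(20, 35, 30, 45, 60, 80, 95, 70)
)

first_rows <- head(toy)      # first 6 rows by default
first_two  <- head(toy, 2)   # first 2 rows
last_rows  <- tail(toy)      # last 6 rows by default

str(toy)       # column names and types
summary(toy)   # min, quartiles, mean and max of each column
```

Running the same calls on embargo is a quick way to confirm that both columns were read in as numbers before plotting.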

Now we are ready to set up our plot in R. Note that nothing will be drawn yet: the argument type = "n" suppresses the plotting of the data and only sets up the plot region. Why do we do this? We would like to customize the entire plot with our own axes and gridlines. The first two arguments to plot() are x (distance driven per person in the USA) and y (price of a barrel of oil). We also suppress the axes and labels by using axes = FALSE, xlab = NA, and ylab = NA respectively.

par(mar = c(5,7,5,7))
plot(embargo$percapita,embargo$price, type ="n",axes = FALSE,xlab = NA,ylab =NA)
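To see what an empty plot actually sets up, the sketch below runs the same two calls on a tiny made-up data frame (invented values, not the real data) and then inspects the coordinate system. It also saves the previous margins so they can be restored afterwards, which the original script does not do; that part is an optional habit rather than a requirement.

```r
# Toy data: invented values, NOT the real contents of oils.csv
toy <- data.frame(percapita = c(7000, 8000, 9000), price = c(25, 75, 50))

old_par <- par(mar = c(5, 7, 5, 7))  # par() returns the previous settings
plot(toy$percapita, toy$price, type = "n", axes = FALSE, xlab = NA, ylab = NA)
box()                # optional: draw a frame to make the empty region visible
usr <- par("usr")    # c(x1, x2, y1, y2): extent of the coordinate system
par(old_par)         # restore the previous margins
```

Even though no data is drawn, the x and y ranges are taken from the data, so later calls such as abline(), lines() and points() land in the right place.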

The par() function accepts many different arguments; here we use mar = c() to specify the margins of our plot. Readers can type ?par to get more information on the function. The abline() function draws the gridlines, and lines() then draws the line itself, which gives a partial plot.

abline(v= c(7000,8000,9000), h = c(25,50,75,100),lty = 3,lwd = 1.5)
lines(embargo$percapita,embargo$price, lwd = 2, col = "#28363F")

But we would also like to plot the points on the line. We can do this by calling points() after lines(), so that the white-filled points are drawn on top of the line, using the following line of code:

points(embargo$percapita,embargo$price, lwd = 2, pch = 21, bg = "white", col ="#28363F", cex = 1.5)
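The order of the calls matters because base R graphics paint each layer on top of the previous one. Here is a small sketch on made-up data (invented values, not the real data) that shows the effect. With pch = 21 the symbol is a circle whose border is set by col and whose fill is set by bg.

```r
# Toy data: invented values, NOT the real contents of oils.csv
toy <- data.frame(percapita = c(7000, 7600, 8300, 9000),
                  price     = c(25, 60, 40, 90))

plot(toy$percapita, toy$price, type = "n", xlab = NA, ylab = NA)

# Layer 1: the connecting line
lines(toy$percapita, toy$price, lwd = 2, col = "#28363F")

# Layer 2: circles with a white fill, drawn last so the fill hides
# the part of the line that passes through each point
points(toy$percapita, toy$price, lwd = 2, pch = 21, bg = "white",
       col = "#28363F", cex = 1.5)
```

Swapping the two calls would draw the line across the white circles instead.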

We can now generate the axes and label all 4 sides using the following lines:

axis(1,at = c(7000,8000,9000),tick = FALSE, cex.axis=0.5)
axis(2,at = c(25,50,75,100),tick = FALSE, las= 2,cex.axis=0.5)
axis(3,at = c(7000,8000,9000),tick = FALSE,cex.axis=0.5)
axis(4,at = c(25,50,75,100),tick = FALSE, las= 2,cex.axis=0.5)
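To get the plot into Inkscape, it has to be written to a PDF file first. One way to do that is to wrap the plotting calls between pdf() and dev.off(). The function below is a sketch under a few assumptions: the function name save_oil_plot, the file name oil_curve.pdf and the page size are illustrative choices, not something taken from the original workflow. It also replaces the four axis() calls with a loop, which is just a more compact way of writing the same thing.

```r
# Draws the oil plot into a PDF file. `data` needs columns percapita and price.
# Function name, default file name and page size are illustrative assumptions.
save_oil_plot <- function(data, file = "oil_curve.pdf", width = 8, height = 6) {
  pdf(file, width = width, height = height)  # size in inches
  on.exit(dev.off())                         # close the device even on error

  x_ticks <- c(7000, 8000, 9000)
  y_ticks <- c(25, 50, 75, 100)

  par(mar = c(5, 7, 5, 7))
  plot(data$percapita, data$price, type = "n", axes = FALSE, xlab = NA, ylab = NA)
  abline(v = x_ticks, h = y_ticks, lty = 3, lwd = 1.5)
  lines(data$percapita, data$price, lwd = 2, col = "#28363F")
  points(data$percapita, data$price, lwd = 2, pch = 21, bg = "white",
         col = "#28363F", cex = 1.5)

  # Sides 1 and 3 are bottom and top (x values); 2 and 4 are left and right
  for (side in 1:4) {
    is_x <- side %% 2 == 1
    axis(side, at = if (is_x) x_ticks else y_ticks, tick = FALSE,
         las = if (is_x) 0 else 2, cex.axis = 0.5)
  }
  invisible(file)
}

# Usage, once embargo has been read in:
# save_oil_plot(embargo)
```

The resulting PDF is a vector file, so the line, points and axis labels should remain editable objects after import.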
