Saturday, March 16, 2013

Scatterplot in R

Scatter plots in R <!-- Styles for R syntax highlighter

Abstract: The purpose of this page is to show how to draw a scatter plot in R and how a scatter plot can be used to visualize and interpret data.
Introduction: Scatter plots are widely used in economics and finance to get a basic idea of the underlying dataset.
Data:
The dataset used is a house price dataset that accompanies the undergraduate textbook “Introductory Econometrics” by Jeffrey M. Wooldridge. The data consist of 506 observations on the following variables:
  1. price - median housing price, $
  2. crime - crimes committed per capita
  3. nox - nitrogen oxide concentration, parts per 100 million
  4. rooms - average number of rooms per house
  5. dist - weighted distance to 5 employment centers
  6. radial - accessibility index to radial highways
  7. proptax - property tax per $1000
  8. stratio - average student-teacher ratio
  9. lowstat - % of people 'lower status'
  10. lprice - log(price)
  11. lnox - log(nox)
  12. lproptax - log(proptax)
For the purpose of this analysis, we will use only price and crime.
R code: Since the data were available in raw text format, I copied and pasted them into Excel and saved the result as a CSV file in my R working directory. I then use the read.csv function to read the CSV file and store the dataset in hprice. I also use colnames to list the column names, and summary to take a brief look at the center and spread of the X and Y variables.
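# Read hprice.csv from the working directory (first row holds the column names)
hprice <- read.csv("hprice.csv", header = TRUE, sep = ",")
# List the column names of the data frame
colnames(hprice)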
##  [1] "price"    "crime"    "nox"      "rooms"    "dist"     "radial"  
##  [7] "proptax"  "stratio"  "lowstat"  "lprice"   "lnox"     "lproptax"
summary(hprice$price)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    5000   16800   21200   22500   25000   50000
summary(hprice$crime)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.01    0.08    0.26    3.61    3.68   89.00
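If hprice.csv is not at hand, the steps above can be tried on a stand-in. The sketch below assumes the MASS package (one of R's recommended packages) is installed and uses its Boston data frame, which also has 506 rows of Boston-area housing data. Its columns are named differently (medv is the median home value in $1000s, crim is per-capita crime), so they are renamed and rescaled to mimic price and crime; the figures may not match the textbook file exactly. Note that this overwrites any existing hprice object.

```r
# Stand-in for hprice.csv, assuming the MASS package is installed.
library(MASS)

# Keep only the two variables used in this post, renamed to match hprice.
# Boston$medv is in $1000s, so multiply by 1000 to express price in dollars.
hprice <- data.frame(price = Boston$medv * 1000,
                     crime = Boston$crim)

# Write to a temporary CSV and read it back, mirroring the read.csv step.
csv_path <- tempfile(fileext = ".csv")
write.csv(hprice, csv_path, row.names = FALSE)
hprice <- read.csv(csv_path, header = TRUE, sep = ",")

colnames(hprice)
summary(hprice$price)
summary(hprice$crime)

# A mean far above the median points to a long right tail in crime.
mean(hprice$crime) > median(hprice$crime)
```

The last check is a quick way to read the summary output: for crime, the mean (3.61) sits far above the median (0.26), so a few very high values dominate the upper end of the scale.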
The following command draws a scatter plot in R. By default, the points are drawn as open (unfilled) circles; to fill them in, set the “pch” (plotting character) argument, for example pch = 19.
plot(hprice$price, hprice$crime, col = "BLUE")
[Figure: scatter plot of hprice$crime against hprice$price, open blue circles]
In this command:
  plot - the basic plotting function in R
  hprice$price - the X variable
  hprice$crime - the Y variable
  col - the color of the points
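The same plot can also be written with R's formula interface, y ~ x, plus the data argument. This avoids repeating hprice$ and makes it easy to label the axes with xlab and ylab and add a title with main. A minimal sketch, assuming the MASS::Boston stand-in rather than hprice.csv; the label text is illustrative, and the plot is sent to a temporary PDF file so the code also runs without a screen.

```r
# Stand-in data, assuming the MASS package is installed.
library(MASS)
hprice <- data.frame(price = Boston$medv * 1000, crime = Boston$crim)

# Send the plot to a temporary PDF so the sketch also runs non-interactively.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# crime ~ price: crime on the Y axis, price on the X axis.
plot(crime ~ price, data = hprice,
     col  = "blue",
     xlab = "Median housing price ($)",
     ylab = "Crimes committed per capita",
     main = "Crime vs. house price")

dev.off()
```

When working interactively, drop the pdf() and dev.off() lines to draw in the plot window instead.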
plot(hprice$price, hprice$crime, pch = 19, col = "BLUE")
[Figure: the same scatter plot with solid blue circles (pch = 19)]
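To see what pch changes, the sketch below draws three of base R's built-in symbols side by side: 1 is the default open circle, 19 is a solid circle, and 21 is a circle whose fill is set separately with the bg argument. It again assumes the MASS::Boston stand-in; the three-panel layout via par(mfrow) is just one convenient way to compare them.

```r
# Stand-in data, assuming the MASS package is installed.
library(MASS)
hprice <- data.frame(price = Boston$medv * 1000, crime = Boston$crim)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 9, height = 4)

# Three panels in one row, one per plotting character.
old_par <- par(mfrow = c(1, 3))
for (p in c(1, 19, 21)) {
  plot(hprice$price, hprice$crime,
       pch  = p,
       col  = "blue",       # outline color (the whole symbol for pch 1 and 19)
       bg   = "lightblue",  # fill color, used only by pch 21 to 25
       main = paste("pch =", p))
}
par(old_par)  # restore the previous layout settings
dev.off()
```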
You will notice that the points sit too close together and overlap, so an additional argument, “cex”, can be added. cex scales the size of the plotting symbols, so a value below 1 shrinks the circles.
plot(hprice$price, hprice$crime, pch = 19, cex = 0.5, col = "BLUE")
[Figure: the same scatter plot with smaller solid blue circles (cex = 0.5)]
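Shrinking the symbols with cex is one remedy for overlapping points; another, which can be combined with it, is to make the points semi-transparent so that crowded regions show up darker. The sketch below uses adjustcolor() from R's grDevices package with an alpha of 0.3 (an arbitrary choice) and assumes the MASS::Boston stand-in. Transparency needs a graphics device that supports it; pdf() does.

```r
# Stand-in data, assuming the MASS package is installed.
library(MASS)
hprice <- data.frame(price = Boston$medv * 1000, crime = Boston$crim)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# Blue at 30% opacity: overlapping points add up to a darker blue.
blue_alpha <- adjustcolor("blue", alpha.f = 0.3)

plot(hprice$price, hprice$crime,
     pch = 19,
     cex = 0.5,        # half the default symbol size
     col = blue_alpha)

dev.off()
```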
The image above gives a much clearer picture of the relationship between the X and Y variables. As house prices rise, the crime rate tends to drop sharply. This is not hard to explain: people prefer not to live in areas where crime rates are high, so in those areas the supply of houses exceeds the demand, which results in lower prices. Conversely, demand for houses is much higher where crime rates are low, because people are willing to pay extra for security and safety.
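The visual impression can be backed by a number. A minimal sketch, again on the MASS::Boston stand-in: cor() gives the Pearson correlation, which measures linear association, while method = "spearman" correlates the ranks and is less sensitive to the long right tail of crime. A negative value for either matches the downward pattern in the plot; neither coefficient on its own says which variable drives the other.

```r
# Stand-in data, assuming the MASS package is installed.
library(MASS)
hprice <- data.frame(price = Boston$medv * 1000, crime = Boston$crim)

# Pearson: strength of the linear relationship between price and crime.
r_pearson <- cor(hprice$price, hprice$crime)

# Spearman: correlation of the ranks, less affected by the skew in crime.
r_spearman <- cor(hprice$price, hprice$crime, method = "spearman")

round(c(pearson = r_pearson, spearman = r_spearman), 2)
```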
-->
