Writing a Simple Function in R
Introduction: This post aims to show new users of R how to write a function and run it to get sensible output. We will write a very small function that calculates a t statistic to test the equality of two means. I first introduce the concept in Section 1, then implement the R code in Section 2, and finally execute the .R function file in Section 3.
Section 1: Equality of means: If you have taken basic statistics, it would be hard for me to believe that your instructor did not teach hypothesis testing for equality of means. Still, to understand how the function is coded in R, it is essential to know the concept.
We are interested in testing a simple hypothesis of equality of means, i.e. that the mean of population X equals the mean of population Y. Economists are often interested in comparing the effect of some treatment on one part of a population with the effect of a different treatment on another part. The examples below illustrate this:
1) The average GMAT score of students who graduated with a math degree against that of students who graduated with a degree in literature.
2) The average yield of a crop on farms that used a certain fertilizer against the yield on farms that used a different fertilizer.
For each pair of populations we would like to test whether the mean of population 1 (Mu1) equals the mean of population 2 (Mu2). Since the true population mean and standard deviation are rarely known, we use the sample mean and sample standard deviation to estimate them.
Step 1: Null hypothesis: H0: Mu1 = Mu2
Alternative hypothesis: H1: Mu1 ≠ Mu2
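Step 2: Calculate the T statistic using the following formulas, where m and n are the sample sizes, \bar{x} and \bar{y} are the sample means, and s_x and s_y are the sample standard deviations of X and Y:

$$s_p = \sqrt{\frac{(m-1)\,s_x^2 + (n-1)\,s_y^2}{m+n-2}}, \qquad T = \frac{\bar{x} - \bar{y}}{s_p \sqrt{\frac{1}{m} + \frac{1}{n}}}$$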
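Step 3: Compare the calculated T with the critical t value from the table and reject the null hypothesis if the following inequality holds, where α is the chosen significance level, m and n are the two sample sizes, and m + n - 2 is the degrees of freedom:

$$|T| > t_{\alpha/2,\; m+n-2}$$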
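The table lookup in Step 3 can also be done in R itself. The sketch below is a minimal illustration, assuming a two-sided test at significance level alpha. The helper name reject_null and its arguments are hypothetical and not part of the original function; qt() is R's quantile function for the t distribution and stands in for the printed table.

```r
# Hypothetical helper (an assumption of this example): applies the
# Step 3 decision rule for a two-sided test with pooled variance.
reject_null <- function(t_stat, m, n, alpha = 0.05) {
  df <- m + n - 2                   # degrees of freedom for the pooled test
  t_crit <- qt(1 - alpha / 2, df)   # critical value, replaces the table lookup
  abs(t_stat) > t_crit              # TRUE means reject H0
}

# Illustrative values only: a T statistic of 2.5 from samples of size 10 and 12
reject_null(2.5, m = 10, n = 12)
```

Returning a logical keeps the helper easy to combine with other code; the critical value itself can be inspected by calling qt() directly.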
Section 2: To create a function, open a new R script from the File drop-down in the menu bar. The function is named teqmu1. It takes two vectors, x and y, as inputs and returns the T statistic.
The first line of the function defines the name of the function, followed by the input variables. The R command length is used to calculate the lengths of the x and y vectors, which are then used in the formula for the T statistic. We also make use of the R commands sd, which calculates the standard deviation, and mean, which calculates the mean, of the vectors x and y. Note that R is case-sensitive, so these names must be typed in lower case.
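# teqmu1: pooled two-sample T statistic for testing equality of means
teqmu1 <- function(x, y) {
  m <- length(x)  # sample size of x
  n <- length(y)  # sample size of y
  # pooled standard deviation
  sp <- sqrt(((m - 1) * sd(x)^2 + (n - 1) * sd(y)^2) / (m + n - 2))
  # T statistic; named t_stat to avoid masking base R's t() (transpose)
  t_stat <- (mean(x) - mean(y)) / (sp * sqrt(1 / m + 1 / n))
  return(t_stat)
}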
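One way to gain confidence in a hand-written function is to compare it with R's built-in t.test(), which computes the same pooled statistic when var.equal = TRUE. The snippet below is only a sketch: it repeats the function definition so that it runs on its own, and the vectors a and b are made-up values used purely for the comparison.

```r
# Same calculation as teqmu1 above, repeated so the snippet is self-contained
teqmu1 <- function(x, y) {
  m <- length(x)
  n <- length(y)
  sp <- sqrt(((m - 1) * sd(x)^2 + (n - 1) * sd(y)^2) / (m + n - 2))
  (mean(x) - mean(y)) / (sp * sqrt(1 / m + 1 / n))
}

# Made-up sample vectors, for illustration only
a <- c(12.1, 9.8, 11.4, 10.7, 13.0, 10.2)
b <- c(9.5, 10.1, 8.7, 9.9, 11.2)

ours    <- teqmu1(a, b)
builtin <- unname(t.test(a, b, var.equal = TRUE)$statistic)

# all.equal() compares the two numbers within a small numerical tolerance
isTRUE(all.equal(ours, builtin))
```

The var.equal = TRUE argument matters here: by default t.test() uses the Welch version, which does not pool the variances and can give a different statistic when the sample sizes differ.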
Section 3:
To execute the function, first save the script as teqmu1.r. Naming the file after the function makes it easy to find later, although R itself does not require the two names to match.
To run this function, users have to source it first. To source the function, click on the Code drop-down in the menu bar, choose the option to source a file, and select the function file.
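Sourcing can also be done from the R console with source(), which reads a script and evaluates it. The sketch below is self-contained for illustration: it first writes the function to a temporary file, standing in for the saved script, and then sources it. The temporary path is an assumption of the example; with a real script you would pass its actual location, for instance source("teqmu1.r") if the file sits in the working directory.

```r
# Stand-in for the saved script: write the function to a temporary file
script_path <- file.path(tempdir(), "teqmu1.r")
writeLines(c(
  "teqmu1 <- function(x, y) {",
  "  m <- length(x)",
  "  n <- length(y)",
  "  sp <- sqrt(((m - 1) * sd(x)^2 + (n - 1) * sd(y)^2) / (m + n - 2))",
  "  (mean(x) - mean(y)) / (sp * sqrt(1 / m + 1 / n))",
  "}"
), script_path)

# source() evaluates the file, which defines teqmu1 in the current session
source(script_path)

# Check that the function is now available to call
exists("teqmu1")
```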
Now, to execute the function, create two small example vectors in R:
data1 = c(1, 4, 3, 6, 5) # vector for X
data2 = c(5, 4, 7, 6, 10) # vector for Y
teqmu1(data1, data2) # executing the function
## [1] -1.938
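To see where this value comes from, the intermediate quantities from Step 2 can be computed one at a time. This is only a sketch that unpacks the same formula; the variable names m, n, sp and t_stat are chosen here for illustration.

```r
data1 <- c(1, 4, 3, 6, 5)     # vector for X
data2 <- c(5, 4, 7, 6, 10)    # vector for Y

m <- length(data1)            # 5
n <- length(data2)            # 5

mean(data1)                   # 3.8
mean(data2)                   # 6.4
var(data1)                    # 3.7 (same as sd(data1)^2)
var(data2)                    # 5.3

# Pooled standard deviation: sqrt((4 * 3.7 + 4 * 5.3) / 8) = sqrt(4.5)
sp <- sqrt(((m - 1) * var(data1) + (n - 1) * var(data2)) / (m + n - 2))

# T statistic: (3.8 - 6.4) / (sp * sqrt(1/5 + 1/5))
t_stat <- (mean(data1) - mean(data2)) / (sp * sqrt(1 / m + 1 / n))
round(t_stat, 3)              # -1.938
```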
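Calling the function is the last step. Once you have the T statistic, you can look up the critical t value in the table and apply the rule under Step 3 to decide whether or not to reject the null hypothesis. In this example there are 5 + 5 - 2 = 8 degrees of freedom; at the 5% significance level the two-sided critical value is 2.306, and since |-1.938| is smaller than that, the null hypothesis of equal means would not be rejected.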