Multiple Regression Using Dummy Variables in R

Below is computer code written in the R programming language that conducts multiple regression analysis using dummy variables. Just copy and paste it into R and watch it rip. The data set for this R program can be found HERE.

# First we get our data.
# Now let's create our intercept dummy variables.
# A different way to create the same type of dummy variables is this...
# Now we create the slope dummy variables and set up our linear regression model.

~~~~~~~~~~~~~~~~~~~~

When you want to test whether the coefficients on two dummy variables are equal, you need to do one of the following tests.

METHOD #1: The Wald Procedure

At this point, you will need paper and pencil to finish the equality test, since the lm procedure in R does not do this for you automatically. From the coefficient-covariance matrix, V (in R, vcov() applied to your fitted lm model), take the variance of each of the two parameter estimates and the covariance between them, along with the original parameter estimates from the regression. Then calculate your F-statistic using this formula:

F = [(parameter 1 - parameter 2) / sqrt(variance of parameter 1 + variance of parameter 2 - 2*(covariance of the two parameters))]^2

or, using R:

F <- ((P1 - P2)/(sqrt(varP1 + varP2 - 2*covP1P2)))^2

where P1 and P2 come from your regression, and varP1, varP2, and covP1P2 come from your coefficient-covariance matrix, V. You can also get varP1 and varP2 by squaring the standard errors for P1 and P2 that your regression reports; the covariance, however, has to come from V.

Note that with a single restriction like this one, the F-statistic is simply the square of the corresponding t-statistic, (P1 - P2)/sqrt(varP1 + varP2 - 2*covP1P2). Squaring removes the sign of the difference between the two parameter values, so F is never negative.

Now that you have the F-statistic, you need the degrees of freedom. There are two, call them df1 and df2.
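The Wald calculation above can also be scripted rather than done by hand. The sketch below is not the original program: it assumes simulated data (the original data set is not reproduced here), a hypothetical three-level grouping variable with group A as the omitted base category, hypothetical intercept dummies dB and dC, and hypothetical slope dummies dBx and dCx. It tests whether the coefficients on dB and dC are equal and, for convenience, also carries out the degrees-of-freedom, p-value, and confidence-interval steps discussed next.

```r
# Sketch of the Wald equality test on SIMULATED data (not the original data set).
# The names group, x, y, dB, dC, dBx, dCx are hypothetical.
set.seed(42)
n <- 150
group <- sample(c("A", "B", "C"), n, replace = TRUE)
x <- rnorm(n, mean = 10, sd = 2)

# Intercept dummy variables (group A is the omitted base category)
dB <- as.numeric(group == "B")
dC <- as.numeric(group == "C")

# Slope dummy variables: each intercept dummy multiplied by x
dBx <- dB * x
dCx <- dC * x

# Simulated outcome
y <- 2 + 1.5 * x + 3 * dB + 3.5 * dC + 0.2 * dBx - 0.1 * dCx + rnorm(n)

fit <- lm(y ~ x + dB + dC + dBx + dCx)

# Coefficient-covariance matrix V
V <- vcov(fit)

P1 <- unname(coef(fit)["dB"])
P2 <- unname(coef(fit)["dC"])
varP1 <- V["dB", "dB"]
varP2 <- V["dC", "dC"]
covP1P2 <- V["dB", "dC"]

# Standard error of (P1 - P2) and the Wald F-statistic for H0: P1 = P2
se_diff <- sqrt(varP1 + varP2 - 2 * covP1P2)
F_stat <- ((P1 - P2) / se_diff)^2   # called F_stat so R's built-in F (FALSE) is not masked

# Degrees of freedom and p-value (Method #1)
df1 <- 1
df2 <- df.residual(fit)             # N - k = 150 - 6
p_value <- pf(F_stat, df1, df2, lower.tail = FALSE)

# 95% confidence interval for P1 - P2 using 1.96 (Method #2)
ci <- (P1 - P2) + c(-1.96, 1.96) * se_diff

# Cross-check: restricted model that forces dB and dC to share one coefficient
fit_r <- lm(y ~ x + I(dB + dC) + dBx + dCx)
F_anova <- anova(fit_r, fit)$F[2]

print(c(F = F_stat, p = p_value, F_anova = F_anova))
print(ci)
```

The anova() line is only a cross-check: for a single linear restriction in ordinary least squares, the F from comparing the restricted model with the full model is identical to the Wald F built from V.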
For this test, df1 = 1, the number of restrictions being tested (here there is just one), and df2 = N - k, where N is the number of observations in your regression and k is the number of slope and intercept parameters that you are estimating in your model. (Calling df.residual() on your lm fit returns N - k directly.)

Now, using R, do the following, substituting the right numbers for F, df1, and df2:

F1.stats <- df(F, df1, df2)
F2.stats <- pf(F, df1, df2, lower.tail=FALSE)

F1.stats gives the density of the F distribution at your F value, that is, the height of the curve at that point. This is useful for visualizing the shape of the distribution, but it is not a probability. F2.stats gives the p-value for the F-statistic. Without lower.tail=FALSE, pf() returns the cumulative probability, the chance of a value less than or equal to your F-statistic. With lower.tail=FALSE, it returns the chance of a value greater than your F-statistic, which is the p-value. Alternatively, you can use the F-distribution table in the back of most statistics books.

METHOD #2: The Confidence Interval

Perhaps the easiest way to proceed is simply to calculate a 95% confidence interval for the difference between the two parameters, (P1 - P2), as (P1 - P2) plus or minus 1.96*SE. The combined standard error is the same one you used above, SE = sqrt(varP1 + varP2 - 2*covP1P2), with varP1, varP2, and covP1P2 taken from the coefficient-covariance matrix, V, as before. (In small samples, qt(0.975, df2) is a more accurate multiplier than 1.96.) Then look to see whether zero is inside the confidence interval. If it is, you cannot reject the hypothesis that the two parameters are equal at the 5% level; if it is not, you reject it. And you are done! (For a reference, see Eric A. Hanushek and John E. Jackson, Statistical Methods for Social Scientists, New York: Academic Press, 1977, p. 124.)

A NOTE ABOUT STANDARDIZED PARAMETER ESTIMATES WHEN USING DUMMY VARIABLES:

If you want to compute the standardized parameter estimates for your model, do not standardize the intercept dummy variables. Leave those as values of 0 and 1.
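A minimal sketch of this advice, including the cbind-then-standardize step described in the next paragraph. It assumes simulated data and hypothetical names (dat, x, y, group, dB, dBx), and it standardizes with base R's scale(), which centers each column on its mean and divides it by its standard deviation.

```r
# Sketch on SIMULATED data; dat, x, y, group, dB, dBx are hypothetical names.
set.seed(1)
n <- 100
dat <- data.frame(
  x = rnorm(n, mean = 20, sd = 5),
  group = sample(c("A", "B"), n, replace = TRUE)
)
dat$y <- 10 + 2 * dat$x + 5 * (dat$group == "B") + rnorm(n, sd = 3)

# Intercept dummy: stays coded 0/1 and is NOT standardized
dB <- as.numeric(dat$group == "B")

# Slope dummy: created by hand, so it has to be added to the data set
dBx <- dB * dat$x

# Combine the primary numeric variables with the slope dummy (not the intercept dummy)
to_scale <- cbind(dat[, c("y", "x")], dBx = dBx)

# Standardize every column: subtract its mean, divide by its standard deviation
std <- as.data.frame(scale(to_scale))

# Add the raw 0/1 intercept dummy back and fit the standardized model
std_data <- cbind(std, dB = dB)
fit_std <- lm(y ~ x + dB + dBx, data = std_data)
print(coef(fit_std))
```

scale() returns a matrix, which is why the result is converted back to a data frame before the unstandardized 0/1 dummy is attached.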
When you do standardize your other variables, however, you will need to create a new data set that contains all the variables in your model, including the variables that you created yourself, such as the slope dummy variables. Try using the cbind command to combine your primary data set with these additional variables (but not the intercept dummy variables). Then standardize the new data set, for example with scale().

Approach #2 for creating the dummy variables:

# First we get our data.