Evaluating Multicollinearity with Multiple Regression in R

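Below is computer code written in the R programming language that evaluates multicollinearity in a multiple regression model using TOL (for "tolerance") and VIF (for "variance inflation factor"). For each predictor, TOL = 1 - R², where R² comes from regressing that predictor on all the other predictors in the model, and VIF = 1/TOL. A large VIF (equivalently, a small tolerance) means the variance of that predictor's estimated coefficient is inflated by multicollinearity. The program also conducts Ridge Regression to test the stability of the estimated parameters in the presence of multicollinearity. Just copy and paste it into R and watch it rip. The data set for this R program can be found HERE.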
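To see what tolerance and VIF actually measure before running the full program, here is a minimal sketch on simulated data. It regresses each predictor on the others and takes 1 - R² from that auxiliary regression as the predictor's tolerance, with VIF as its reciprocal. The variables x1, x2, x3, the data frame d, and the helper tolerance_by_hand() are hypothetical illustrations, not part of panel80.txt.

```r
# Simulated predictors (hypothetical; not from panel80.txt)
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)  # strongly correlated with x1
x3 <- rnorm(n)                 # roughly independent of x1 and x2
d  <- data.frame(x1, x2, x3)
d$y <- 1 + x1 + x2 + x3 + rnorm(n)  # response; only needed for a car::vif() comparison

# Hypothetical helper: tolerance of each predictor = 1 - R^2 from
# regressing that predictor on all the other predictors
tolerance_by_hand <- function(data, predictors) {
  sapply(predictors, function(v) {
    aux <- lm(reformulate(setdiff(predictors, v), response = v), data = data)
    1 - summary(aux)$r.squared
  })
}

preds    <- c("x1", "x2", "x3")
tol_hand <- tolerance_by_hand(d, preds)
vif_hand <- 1 / tol_hand  # VIF is the reciprocal of tolerance
print(round(cbind(TOL = tol_hand, VIF = vif_hand), 3))
```

Here x1 and x2 should show large VIFs (small tolerances) and x3 a VIF close to 1. If the car package is installed, car::vif(lm(y ~ x1 + x2 + x3, data = d)) should return the same VIF values, since with main effects only it is computing the same quantity.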

# First we get our data.
library(car)
mydata <- read.table("panel80.txt")
# attach(mydata) # In case you want to work with the variables directly
names(mydata) # This shows us all the variable names.
# options(scipen=20) # suppress "scientific" notation
options(scipen=0) # Brings things back to normal (0 is R's default)
reagan.model <- lm(REAFEEL3 ~ INC + AGE + PARTYID + REPPART3 + INC:AGE + INC*PARTYID*REPPART3, data=mydata)
summary(reagan.model)
layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page
plot(reagan.model) # These are diagnostic plots.
vif(reagan.model) # Variance inflation factors (vif() comes from the car package)
tol <- 1/vif(reagan.model) # Tolerance is the reciprocal of VIF
tol
mysubsetdata <- subset(mydata, select=c(REAFEEL3, REPPART3, INC, AGE, PARTYID)) # This keeps only the variables that we are using.
cor(mysubsetdata, use = "pairwise.complete.obs") # A correlation matrix for the variables in the regression

dev.new() # Opens a new graphics window (windows() works only on Microsoft Windows)

library(MASS)
x <- lm.ridge(REAFEEL3 ~ INC + AGE + PARTYID + REPPART3 + INC:AGE + INC*PARTYID*REPPART3, data=mydata, lambda=seq(0,100,by=1))
plot(x)
title("Ridge Regression")
abline(h=0) # Horizontal reference line at zero
abline(v=50,lty=3) # Dotted vertical reference line at lambda = 50
x # This prints out the values of the ridge estimates as lambda increases.
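The ridge trace shows the coefficients shrinking as lambda grows, but it leaves the choice of lambda open. As a minimal sketch on simulated data (x1, x2, x3, d, lambdas, and fit are hypothetical names, not from panel80.txt), the block below checks that lm.ridge() at lambda = 0 reproduces ordinary least squares and uses the select() method from MASS to print the modified HKB, L-W, and GCV-based suggestions for lambda.

```r
library(MASS)  # lm.ridge() and its select() method

# Simulated, nearly collinear predictors (hypothetical data)
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)  # almost a copy of x1
x3 <- rnorm(n)
d  <- data.frame(x1, x2, x3, y = 1 + x1 + x2 + 0.5 * x3 + rnorm(n))

lambdas <- seq(0, 20, by = 0.1)
fit     <- lm.ridge(y ~ x1 + x2 + x3, data = d, lambda = lambdas)

# coef() returns one row per lambda on the original scale;
# the first row (lambda = 0) should match ordinary least squares
ridge <- coef(fit)
ols   <- coef(lm(y ~ x1 + x2 + x3, data = d))

# Print the modified HKB and L-W estimates and the GCV-minimizing lambda
MASS::select(fit)

# Compare OLS with the ridge fit at the lambda that minimizes GCV
best <- which.min(fit$GCV)
print(rbind(OLS = ols, ridge_GCV = ridge[best, ]))
```

Because lm.ridge() centers and scales the predictors internally, fit$coef holds coefficients on that standardized scale, while coef(fit) converts them back to the original units. GCV is only one criterion; the HKB and L-W values printed by select() may point to different lambdas.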

This entire site is Copyright © 1997-2024 by Courtney Brown. All Rights Reserved.
DISCLAIMER
URL: https://courtneybrown.com