How to perform multiple linear regression in R? Explain with an example
data<- read.csv(file.choose()) # choose the 50_startup.csv data set
View(data)
summary(data)
Next, we examine the correlations between the variables.
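# Keep only the numeric columns (this drops the categorical State column),
# because pairs() and cor() require numeric input
data <- data[sapply(data, is.numeric)]
# Scatterplot matrix: relationship between the output (Profit) and each input
pairs(data)
# Correlation coefficient matrix: strength and direction of each correlation
cor(data)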
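Each entry that `cor()` reports uses Pearson's r by default. The sketch below computes r by hand and compares it with `cor()`. It uses simulated data with hypothetical variables `x` and `y`, not the 50_startup.csv file.

```r
# Pearson correlation computed by hand, compared with cor().
# x and y are hypothetical simulated variables, not columns of 50_startup.csv.
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

# r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)), where dx and dy are deviations from the mean
pearson_r <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r_manual  <- pearson_r(x, y)
r_builtin <- cor(x, y)   # method = "pearson" is the default
print(c(manual = r_manual, builtin = r_builtin))
```

The sign of r gives the direction of the linear relationship, and |r| (at most 1) gives its strength.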
# Partial correlation matrix: correlation between each pair of variables
# after controlling for all the other variables
#install.packages("corpcor")
library(corpcor)
cor2pcor(cor(data))
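A partial correlation matrix can be obtained from the inverse of the correlation matrix P using the formula -P[i, j] / sqrt(P[i, i] * P[j, j]). For a well-conditioned matrix this should agree with `cor2pcor()` up to rounding. The base-R sketch below applies that formula to simulated data, with hypothetical variables `x1`, `x2` and `z`. It then checks the result against the residual-based definition: correlate what is left of `x1` and `x2` after regressing each on `z`.

```r
# Partial correlations from the inverse correlation matrix (base R only).
# x1, x2 and z are hypothetical simulated variables: x1 and x2 are related only through z.
set.seed(42)
n  <- 100
z  <- rnorm(n)
x1 <- z + rnorm(n)
x2 <- z + rnorm(n)
dat <- data.frame(x1, x2, z)

partial_cor <- function(R) {
  P  <- solve(R)                                # precision matrix
  pc <- -P / sqrt(outer(diag(P), diag(P)))      # -P_ij / sqrt(P_ii * P_jj)
  diag(pc) <- 1
  pc
}

pc <- partial_cor(cor(dat))
print(round(pc, 3))

# Residual-based check: correlation of x1 and x2 after removing the effect of z
r_resid <- cor(resid(lm(x1 ~ z, data = dat)), resid(lm(x2 ~ z, data = dat)))
print(c(formula = pc["x1", "x2"], residuals = r_resid))
```

Here `cor(x1, x2)` is clearly positive because both depend on `z`. The partial correlation given `z` should be much closer to zero.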
Next, we fit the full model. We then refit it with subsets of the predictors and compare the fits using the R-squared, adjusted R-squared and coefficient p-values reported by summary().
# The linear model of interest, using all remaining columns as predictors
model.data <- lm(Profit ~ ., data = data)
summary(model.data)
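For reference, `lm()` finds the least-squares coefficients. These are the solution of the normal equations (X'X) b = X'y, where X is the design matrix with an intercept column. The sketch below solves those equations directly on simulated data and compares the result with `coef()`. The data frame `d` and its columns `a`, `b` and `y` are hypothetical. Note that `lm()` itself uses a QR decomposition, which is numerically more stable than solving X'X directly.

```r
# Least-squares coefficients via the normal equations, compared with lm().
# d, a, b and y are hypothetical simulated data, not the startup data set.
set.seed(7)
n <- 50
d <- data.frame(a = rnorm(n), b = rnorm(n))
d$y <- 3 + 1.5 * d$a - 0.5 * d$b + rnorm(n)

fit <- lm(y ~ ., data = d)          # same formula style as Profit ~ .

X <- model.matrix(fit)              # design matrix, including the (Intercept) column
beta_hat <- solve(crossprod(X), crossprod(X, d$y))   # solves (X'X) b = X'y
print(cbind(normal_equations = drop(beta_hat), lm = coef(fit)))
```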
# Model based on R&D spend and Administration
model.data1 <- lm(Profit ~ R.D.Spend + Administration, data = data)
summary(model.data1)
# Model based on R&D spend and Marketing spend
model.data2 <- lm(Profit ~ R.D.Spend + Marketing.Spend, data = data)
summary(model.data2)
# Administration and Marketing.Spend are not statistically significant
# (large p-values), so we remove them
finalmodel<-lm(Profit~R.D.Spend,data=data)
summary(finalmodel)
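Besides reading individual p-values, nested models can be compared directly. Adjusted R-squared penalises extra predictors. A partial F-test via `anova()` asks whether the dropped predictors jointly improve the fit, and AIC offers another fit-versus-complexity trade-off. The sketch below uses a simulated stand-in data frame `sim` that reuses the column names of the startup data. Its values are invented, and only `R.D.Spend` drives `Profit` by construction.

```r
# Comparing a full and a reduced model on hypothetical simulated data.
# Column names mirror the startup data, but the values are invented.
set.seed(123)
n <- 50
sim <- data.frame(
  R.D.Spend       = runif(n, 0, 100),
  Administration  = runif(n, 0, 100),
  Marketing.Spend = runif(n, 0, 100)
)
sim$Profit <- 50 + 0.8 * sim$R.D.Spend + rnorm(n, sd = 5)

full    <- lm(Profit ~ R.D.Spend + Administration + Marketing.Spend, data = sim)
reduced <- lm(Profit ~ R.D.Spend, data = sim)

# Adjusted R-squared penalises extra predictors, unlike plain R-squared
print(c(full = summary(full)$adj.r.squared, reduced = summary(reduced)$adj.r.squared))

# Partial F-test: do Administration and Marketing.Spend jointly improve the fit?
cmp <- anova(reduced, full)
print(cmp)
p_value <- cmp[["Pr(>F)"]][2]

# Lower AIC indicates a better fit/complexity trade-off
print(AIC(reduced, full))
```

A large `p_value` means that dropping the two predictors does not significantly worsen the fit. That supports keeping the simpler model.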
Finally, we check the model's assumptions, in particular that the residuals are approximately normally distributed. Diagnostics can support an assumption but cannot prove it.
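# Check the LINE assumptions (Linearity, Independence, Normality, Equal variance)
par(mfrow = c(2, 2))  # show the four diagnostic plots in one window
plot(finalmodel)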
hist(residuals(finalmodel)) # close to normal distribution
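A histogram is a rough check. A normal Q-Q plot and the Shapiro-Wilk test (`shapiro.test()`, whose null hypothesis is normality) are common complements, and a residuals-versus-fitted plot helps assess linearity and constant variance. The sketch below runs these checks on a hypothetical simulated model `fit`. For the real model, you would pass `residuals(finalmodel)` and `fitted(finalmodel)` instead.

```r
# Residual diagnostics on a hypothetical simulated model.
set.seed(99)
n <- 50
x <- runif(n, 0, 10)
y <- 2 + 3 * x + rnorm(n)
fit <- lm(y ~ x)

res <- residuals(fit)

# Normal Q-Q plot: points close to the line suggest roughly normal residuals
qqnorm(res)
qqline(res)

# Shapiro-Wilk test: a small p-value is evidence against normality
sw <- shapiro.test(res)
print(sw)

# Residuals vs fitted: no pattern or funnel shape supports linearity and equal variance
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

With around 50 observations the Shapiro-Wilk test has limited power, so it is best read alongside the plots rather than on its own.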