Student Performance Data Analysis

This data is taken from Kaggle.

Inspiration

To understand the influence of parents' background, test preparation, etc. on students' performance.

This is my code to understand the data.

setwd("C:/Users/Srijit Mukherjee/Desktop")
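#read the categorical columns as factors (R >= 4.0 reads them as character by default,
#which breaks the plot() calls on these columns further down)
data = read.csv("StudentsPerformance.csv", stringsAsFactors = TRUE)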
head(data)
str(data)
colnames(data)
#predictors are categorical and responses are continuous
#there are three continuous responses, let's see how they are related
response = data[,6:8]
library(corrplot)
corrplot(cor(response))
corrplot(cor(response), method="number")
#they are highly correlated, let's try to predict one of them only.
mathdata = data[1:6]
readdata = data[c(1:5,7)]
writedata = data[c(1:5,8)]

#refer to columns by name and pass the data frame through the data argument
mathmodel = lm(math.score~.,data=mathdata)
readmodel = lm(reading.score~.,data=readdata)
writemodel = lm(writing.score~.,data=writedata)
mathscore_lm <- predict(mathmodel, mathdata[,-6])
ssr_mathscore_lm <- t(mathdata[,6] - mathscore_lm) %*% (mathdata[,6] - mathscore_lm)
rsq_mathscore_lm <- cor(mathdata[,6], mathscore_lm)^2
#the fit is not good: rsq_mathscore_lm is low

#Let's do a forward and backward selection
#base is the intercept-only model, top is the model with all main effects and interactions
base=lm(math.score~1,data=mathdata)
top=lm(math.score~gender*race.ethnicity*parental.level.of.education*lunch*test.preparation.course,data=mathdata)
forward_step=step(base, scope = list(upper=top, lower= ~1), direction = "forward", trace = FALSE)
forward_step
backward_step=step(top,direction = 'backward', trace = FALSE)
backward_step

#This clearly implies there need not be any variable selection.

backward_step

mathscore_lm <- predict(backward_step, mathdata[,-6])
ssr_mathscore_lm <- t(mathdata[,6] - mathscore_lm) %*% (mathdata[,6] - mathscore_lm)
rsq_mathscore_lm <- cor(mathdata[,6], mathscore_lm)^2

par(mfrow=c(2,3))
plot(mathdata$gender,mathdata$math.score)
plot(mathdata$race.ethnicity,mathdata$math.score)
plot(mathdata$parental.level.of.education,mathdata$math.score)
plot(mathdata$lunch,mathdata$math.score)
plot(mathdata$test.preparation.course,mathdata$math.score)
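
The two fit measures computed above for the math model (the residual sum of squares and the squared correlation between observed and predicted scores) can be cross-checked against the values R itself reports. Here is a minimal sketch of that check. It assumes a simulated data frame called toy with made-up columns group, prep and score; this is not the Kaggle data, and the numbers it produces say nothing about the real file.

```r
# Toy data: a hypothetical stand-in, NOT the Kaggle StudentsPerformance.csv
set.seed(1)
n <- 200
toy <- data.frame(
  group = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  prep  = factor(sample(c("none", "completed"), n, replace = TRUE))
)
# Simulated response: a shift for prep == "completed" plus noise
toy$score <- 60 + 5 * (toy$prep == "completed") + rnorm(n, sd = 10)

# Linear model with categorical predictors, as in the post
fit <- lm(score ~ ., data = toy)
pred <- predict(fit, toy)

# Residual sum of squares, written without matrix algebra
rss <- sum((toy$score - pred)^2)
# Squared correlation between observed and predicted values
rsq <- cor(toy$score, pred)^2

# For a least-squares fit with an intercept these agree with the built-ins
rss_ok <- isTRUE(all.equal(rss, deviance(fit)))
rsq_ok <- isTRUE(all.equal(rsq, summary(fit)$r.squared))
print(c(rss_ok = rss_ok, rsq_ok = rsq_ok))
```

If both checks hold, summary(mathmodel)$r.squared and deviance(mathmodel) could be used in place of the hand-written formulas, and summary() would also give the adjusted R-squared, which is the fairer number when comparing the main-effects model with the stepwise ones.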

I will share a good and organized visualization soon.
