Analysis of the California schools data (CASchools, from the AER package) and the Boston housing data (Boston, from the MASS package) using linear models in R.
The exploratory data analysis compares the variables within each data set. The data sets cover school funding, resources, test scores, student-teacher ratios and more.
Loading the libraries, importing the CASchools data, and summarizing it.
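library(MASS)  # provides the Boston data set
library(AER)   # provides the CASchools data set
options(scipen = 100)  # print small p-values in fixed rather than scientific notation
data("CASchools")
?CASchools  # open the help page describing the variables
# Student-teacher ratio
CASchools$STR <- CASchools$students/CASchools$teachers
# Average of the reading and math scores
CASchools$score <- (CASchools$read + CASchools$math)/2
summary(CASchools)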
Summary CASchools Output
> summary(CASchools)
district          school            county           grades     students         teachers         calworks        lunch
Length:420        Length:420        Sonoma     : 29  KK-06: 61  Min.   :   81.0  Min.   :   4.85  Min.   : 0.000  Min.   :  0.00
Class :character  Class :character  Kern       : 27  KK-08:359  1st Qu.:  379.0  1st Qu.:  19.66  1st Qu.: 4.395  1st Qu.: 23.28
Mode  :character  Mode  :character  Los Angeles: 27             Median :  950.5  Median :  48.56  Median :10.520  Median : 41.75
                                    Tulare     : 24             Mean   : 2628.8  Mean   : 129.07  Mean   :13.246  Mean   : 44.71
                                    San Diego  : 21             3rd Qu.: 3008.0  3rd Qu.: 146.35  3rd Qu.:18.981  3rd Qu.: 66.86
                                    Santa Clara: 20             Max.   :27176.0  Max.   :1429.00  Max.   :78.994  Max.   :100.00
                                    (Other)    :272

computer        expenditure   income          english         read           math           STR            score
Min.   :   0.0  Min.   :3926  Min.   : 5.335  Min.   : 0.000  Min.   :604.5  Min.   :605.4  Min.   :14.00  Min.   :605.5
1st Qu.:  46.0  1st Qu.:4906  1st Qu.:10.639  1st Qu.: 1.941  1st Qu.:640.4  1st Qu.:639.4  1st Qu.:18.58  1st Qu.:640.0
Median : 117.5  Median :5215  Median :13.728  Median : 8.778  Median :655.8  Median :652.5  Median :19.72  Median :654.5
Mean   : 303.4  Mean   :5312  Mean   :15.317  Mean   :15.768  Mean   :655.0  Mean   :653.3  Mean   :19.64  Mean   :654.2
3rd Qu.: 375.2  3rd Qu.:5601  3rd Qu.:17.629  3rd Qu.:22.970  3rd Qu.:668.7  3rd Qu.:665.9  3rd Qu.:20.87  3rd Qu.:666.7
Max.   :3324.0  Max.   :7712  Max.   :55.328  Max.   :85.540  Max.   :704.0  Max.   :709.5  Max.   :25.80  Max.   :706.8
Linear model for Score in CASchools data set
mod <- lm(score ~ calworks+lunch+computer+expenditure+income+english+STR, data = CASchools)
summary(mod)
Score LM for CASchools
> summary(mod)
Call:
lm(formula = score ~ calworks + lunch + computer + expenditure +
    income + english + STR, data = CASchools)

Residuals:
    Min       1Q   Median       3Q      Max
-30.554   -5.382    0.192    5.037   27.966

Coefficients:
               Estimate  Std. Error  t value              Pr(>|t|)
(Intercept) 661.8848393   9.0688496   72.984  < 0.0000000000000002 ***
calworks     -0.0917428   0.0577709   -1.588                0.1130
lunch        -0.3709982   0.0362375  -10.238  < 0.0000000000000002 ***
computer      0.0004101   0.0010371    0.395                0.6927
expenditure   0.0017438   0.0008854    1.970                0.0496 *
income        0.6190124   0.0890025    6.955       0.0000000000139 ***
english      -0.2113803   0.0344657   -6.133       0.0000000020225 ***
STR          -0.2782033   0.2881301   -0.966                0.3348
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.42 on 412 degrees of freedom
Multiple R-squared: 0.808, Adjusted R-squared: 0.8047
F-statistic: 247.6 on 7 and 412 DF, p-value: < 0.00000000000000022
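The columns of the coefficient table are related: the t value is the estimate divided by its standard error, and the p-value comes from a t distribution on the residual degrees of freedom. As a rough check, the sketch below recomputes these for STR from the numbers printed above and adds a 95% confidence interval. The variable names are introduced here for illustration only.

```r
# Values copied from the summary(mod) output above.
est_STR <- -0.2782033   # Estimate for STR
se_STR  <-  0.2881301   # Std. Error for STR
df_res  <- 412          # residual degrees of freedom

# The t value is the estimate divided by its standard error.
t_STR <- est_STR / se_STR

# Two-sided p-value from the t distribution.
p_STR <- 2 * pt(abs(t_STR), df = df_res, lower.tail = FALSE)

# 95% confidence interval for the STR coefficient.
crit   <- qt(0.975, df = df_res)
ci_STR <- est_STR + c(-1, 1) * crit * se_STR

print(round(c(t = t_STR, p = p_STR), 4))
print(round(ci_STR, 3))
```

The interval for STR spans zero, which is consistent with its reported p-value of 0.3348: in this model the student-teacher ratio is not distinguishable from no effect once the other predictors are included.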
Linear model for English in CASchools data set
modE <- lm(english ~ read+math, data = CASchools)
summary(modE)
English LM for CASchools
> summary(modE)
Call:
lm(formula = english ~ read + math, data = CASchools)

Residuals:
    Min       1Q   Median       3Q      Max
-31.800   -8.781   -0.876    8.493   40.651

Coefficients:
             Estimate  Std. Error  t value              Pr(>|t|)
(Intercept) 386.63508    21.90028   17.654  < 0.0000000000000002 ***
read         -1.01487     0.08112  -12.510  < 0.0000000000000002 ***
math          0.44975     0.08698    5.171           0.000000363 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.86 on 417 degrees of freedom
Multiple R-squared: 0.508, Adjusted R-squared: 0.5057
F-statistic: 215.3 on 2 and 417 DF, p-value: < 0.00000000000000022
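The read and math coefficients above have opposite signs. When two predictors move closely together, their individual coefficients can be hard to interpret, so one possible follow-up is to check how strongly read and math are correlated. The sketch below is an assumption about a useful diagnostic, not a step from the original analysis; r_read_math and vif_read_math are names introduced here, and it assumes the AER package is installed.

```r
# Sketch: how strongly are the two predictors of modE related?
library(AER)          # provides the CASchools data set
data("CASchools")

# Correlation between the two predictors.
r_read_math <- cor(CASchools$read, CASchools$math)

# With exactly two predictors, the variance inflation factor
# of each one is 1 / (1 - r^2).
vif_read_math <- 1 / (1 - r_read_math^2)

print(round(c(correlation = r_read_math, VIF = vif_read_math), 2))
```

A VIF well above 1 would suggest reading the two coefficients jointly rather than one at a time.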
Linear model for Computers in CASchools data set
modc <- lm(computer ~ read+math, data = CASchools)
summary(modc)
Computer LM for CASchools
> summary(modc)
Call:
lm(formula = computer ~ read + math, data = CASchools)

Residuals:
    Min       1Q   Median       3Q      Max
-525.96  -242.85  -126.19    58.38  3004.72

Coefficients:
            Estimate  Std. Error  t value   Pr(>|t|)
(Intercept)  908.594     737.316    1.232   0.218531
read         -11.636       2.731   -4.260  0.0000252 ***
math          10.739       2.928    3.667   0.000277 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 432.8 on 417 degrees of freedom
Multiple R-squared: 0.04275, Adjusted R-squared: 0.03816
F-statistic: 9.312 on 2 and 417 DF, p-value: 0.0001105
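The last three lines of the output are tied together by simple formulas. As a rough check, the sketch below recomputes the F statistic, its p-value and the adjusted R-squared from the printed R-squared and degrees of freedom. The variable names are introduced here for illustration, and small differences from the printed values come from rounding in the inputs.

```r
# Values copied from the summary(modc) output above.
r2  <- 0.04275  # Multiple R-squared
k   <- 2        # number of predictors (read, math)
df2 <- 417      # residual degrees of freedom

# Overall F statistic implied by R-squared.
f_stat <- (r2 / k) / ((1 - r2) / df2)

# Upper-tail p-value of the F test.
p_val <- pf(f_stat, df1 = k, df2 = df2, lower.tail = FALSE)

# Adjusted R-squared; note that n - 1 equals df2 + k.
adj_r2 <- 1 - (1 - r2) * (df2 + k) / df2

print(signif(c(F = f_stat, p = p_val, adj_r2 = adj_r2), 4))
```

The F test is significant, yet an R-squared of about 0.04 means read and math together account for only about 4% of the variation in computer.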
Linear model for Boston data set
data("Boston")
?Boston
mod1 <- lm(medv ~ ., data = Boston)
summary(mod1)
Boston LM (median home value, medv)
> summary(mod1)
Call:
lm(formula = medv ~ ., data = Boston)

Residuals:
    Min       1Q   Median       3Q      Max
-15.595   -2.730   -0.518    1.777   26.199

Coefficients:
               Estimate  Std. Error  t value              Pr(>|t|)
(Intercept)  36.4594884   5.1034588    7.144     0.000000000003283 ***
crim         -0.1080114   0.0328650   -3.287              0.001087 **
zn            0.0464205   0.0137275    3.382              0.000778 ***
indus         0.0205586   0.0614957    0.334              0.738288
chas          2.6867338   0.8615798    3.118              0.001925 **
nox         -17.7666112   3.8197437   -4.651     0.000004245643808 ***
rm            3.8098652   0.4179253    9.116  < 0.0000000000000002 ***
age           0.0006922   0.0132098    0.052              0.958229
dis          -1.4755668   0.1994547   -7.398     0.000000000000601 ***
rad           0.3060495   0.0663464    4.613     0.000005070529023 ***
tax          -0.0123346   0.0037605   -3.280              0.001112 **
ptratio      -0.9527472   0.1308268   -7.283     0.000000000001309 ***
black         0.0093117   0.0026860    3.467              0.000573 ***
lstat        -0.5247584   0.0507153  -10.347  < 0.0000000000000002 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.745 on 492 degrees of freedom
Multiple R-squared: 0.7406, Adjusted R-squared: 0.7338
F-statistic: 108.1 on 13 and 492 DF, p-value: < 0.00000000000000022
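In the output above, indus and age are the only terms without a significance star. A possible next step, sketched below under the assumption that a smaller model is of interest, is to list the coefficients whose 95% confidence interval contains zero and compare the full model with one that drops them. The names ci, spans_zero and reduced are introduced here and are not part of the original analysis.

```r
# Sketch: find weak terms in the Boston model and test dropping them.
library(MASS)    # provides the Boston data set
data("Boston")

mod1 <- lm(medv ~ ., data = Boston)

# 95% confidence intervals for every coefficient.
ci <- confint(mod1, level = 0.95)

# Terms whose interval contains zero.
spans_zero <- rownames(ci)[ci[, 1] < 0 & ci[, 2] > 0]
print(spans_zero)

# Refit without indus and age, then compare the nested models with an F test.
reduced <- update(mod1, . ~ . - indus - age)
print(anova(reduced, mod1))
```

A large p-value in the anova table would indicate that the two dropped terms add little once the other predictors are in the model.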