Boston vs California School Analysis

Analysis of the California schools data (CASchools, from the AER package) and the Boston data (from the MASS package) using linear models in R.

The exploratory data analysis compares values within the data sets, which include school funding, resources, test scores and more.

Load the libraries, import the CASchools data, and summarize it.

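library(MASS)          # provides the Boston data set
library(AER)           # provides the CASchools data set
options(scipen = 100)  # print p-values in fixed rather than scientific notation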

data("CASchools")
?CASchools

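# Student-teacher ratio per district
CASchools$STR <- CASchools$students / CASchools$teachers
# Overall test score: average of the reading and math scores
CASchools$score <- (CASchools$read + CASchools$math) / 2
summary(CASchools)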
Summary CASchools Output
> summary(CASchools)
   district            school                  county      grades       students          teachers          calworks          lunch       
 Length:420         Length:420         Sonoma     : 29   KK-06: 61   Min.   :   81.0   Min.   :   4.85   Min.   : 0.000   Min.   :  0.00  
 Class :character   Class :character   Kern       : 27   KK-08:359   1st Qu.:  379.0   1st Qu.:  19.66   1st Qu.: 4.395   1st Qu.: 23.28  
 Mode  :character   Mode  :character   Los Angeles: 27               Median :  950.5   Median :  48.56   Median :10.520   Median : 41.75  
                                       Tulare     : 24               Mean   : 2628.8   Mean   : 129.07   Mean   :13.246   Mean   : 44.71  
                                       San Diego  : 21               3rd Qu.: 3008.0   3rd Qu.: 146.35   3rd Qu.:18.981   3rd Qu.: 66.86  
                                       Santa Clara: 20               Max.   :27176.0   Max.   :1429.00   Max.   :78.994   Max.   :100.00  
                                       (Other)    :272                                                                                    
    computer       expenditure       income          english            read            math            STR            score      
 Min.   :   0.0   Min.   :3926   Min.   : 5.335   Min.   : 0.000   Min.   :604.5   Min.   :605.4   Min.   :14.00   Min.   :605.5  
 1st Qu.:  46.0   1st Qu.:4906   1st Qu.:10.639   1st Qu.: 1.941   1st Qu.:640.4   1st Qu.:639.4   1st Qu.:18.58   1st Qu.:640.0  
 Median : 117.5   Median :5215   Median :13.728   Median : 8.778   Median :655.8   Median :652.5   Median :19.72   Median :654.5  
 Mean   : 303.4   Mean   :5312   Mean   :15.317   Mean   :15.768   Mean   :655.0   Mean   :653.3   Mean   :19.64   Mean   :654.2  
 3rd Qu.: 375.2   3rd Qu.:5601   3rd Qu.:17.629   3rd Qu.:22.970   3rd Qu.:668.7   3rd Qu.:665.9   3rd Qu.:20.87   3rd Qu.:666.7  
 Max.   :3324.0   Max.   :7712   Max.   :55.328   Max.   :85.540   Max.   :704.0   Max.   :709.5   Max.   :25.80   Max.   :706.8  
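
One possible follow-up to the summary is to check how each numeric predictor correlates with the averaged score before fitting a model. This is a sketch for illustration, not part of the original analysis; the names num_vars and cors are made up for this example.

```r
library(AER)
data("CASchools")

# Rebuild the two derived columns so the snippet runs on its own
CASchools$STR <- CASchools$students / CASchools$teachers
CASchools$score <- (CASchools$read + CASchools$math) / 2

# Predictors used in the score model
num_vars <- c("calworks", "lunch", "computer", "expenditure",
              "income", "english", "STR")

# Pearson correlation of each predictor with the averaged score
cors <- sapply(CASchools[num_vars], function(x) cor(x, CASchools$score))
round(sort(cors), 3)
```

Sorting puts the most negative correlations first, which makes it easier to see which variables are likely to matter in the regression.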

Linear model for Score in CASchools data set

mod <- lm(score ~ calworks+lunch+computer+expenditure+income+english+STR, data = CASchools)
summary(mod)
Score LM for CASchools
> summary(mod)

Call:
lm(formula = score ~ calworks + lunch + computer + expenditure + 
    income + english + STR, data = CASchools)

Residuals:
    Min      1Q  Median      3Q     Max 
-30.554  -5.382   0.192   5.037  27.966 

Coefficients:
               Estimate  Std. Error t value             Pr(>|t|)    
(Intercept) 661.8848393   9.0688496  72.984 < 0.0000000000000002 ***
calworks     -0.0917428   0.0577709  -1.588               0.1130    
lunch        -0.3709982   0.0362375 -10.238 < 0.0000000000000002 ***
computer      0.0004101   0.0010371   0.395               0.6927    
expenditure   0.0017438   0.0008854   1.970               0.0496 *  
income        0.6190124   0.0890025   6.955      0.0000000000139 ***
english      -0.2113803   0.0344657  -6.133      0.0000000020225 ***
STR          -0.2782033   0.2881301  -0.966               0.3348    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.42 on 412 degrees of freedom
Multiple R-squared:  0.808,	Adjusted R-squared:  0.8047 
F-statistic: 247.6 on 7 and 412 DF,  p-value: < 0.00000000000000022
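
To read the coefficient table programmatically, the rows with a p-value below 0.05 can be pulled out of the summary. The sketch below assumes the same model as above; the names coefs and sig are illustrative, and the 0.05 cutoff is a conventional choice rather than something the analysis prescribes.

```r
library(AER)
data("CASchools")
CASchools$STR <- CASchools$students / CASchools$teachers
CASchools$score <- (CASchools$read + CASchools$math) / 2

mod <- lm(score ~ calworks + lunch + computer + expenditure +
            income + english + STR, data = CASchools)

# Coefficient matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(mod))

# Keep only the rows significant at the 5% level
sig <- coefs[coefs[, "Pr(>|t|)"] < 0.05, , drop = FALSE]
rownames(sig)
```

In the output above, lunch, income and english are clearly significant, expenditure only just passes the cutoff (p = 0.0496), and STR does not once the other predictors are included.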

Linear model for English in CASchools data set

modE <- lm(english ~ read+math, data = CASchools)
summary(modE)
English LM for CASchools
> summary(modE)

Call:
lm(formula = english ~ read + math, data = CASchools)

Residuals:
    Min      1Q  Median      3Q     Max 
-31.800  -8.781  -0.876   8.493  40.651 

Coefficients:
             Estimate Std. Error t value             Pr(>|t|)    
(Intercept) 386.63508   21.90028  17.654 < 0.0000000000000002 ***
read         -1.01487    0.08112 -12.510 < 0.0000000000000002 ***
math          0.44975    0.08698   5.171          0.000000363 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.86 on 417 degrees of freedom
Multiple R-squared:  0.508,	Adjusted R-squared:  0.5057 
F-statistic: 215.3 on 2 and 417 DF,  p-value: < 0.00000000000000022
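
The opposite signs on read and math may partly reflect that the two scores move together across districts, which makes their separate effects hard to tell apart. That is an interpretation, not something the output states. A quick check, as a sketch (r_rm is an illustrative name):

```r
library(AER)
data("CASchools")

# Correlation between the two predictors of the english model
r_rm <- cor(CASchools$read, CASchools$math)
r_rm
```

A value close to 1 would suggest that the individual read and math coefficients should be interpreted with caution.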

Linear model for Computers in CASchools data set

modc <- lm(computer ~ read+math, data = CASchools)
summary(modc)
Computer LM for CASchools
> summary(modc)

Call:
lm(formula = computer ~ read + math, data = CASchools)

Residuals:
    Min      1Q  Median      3Q     Max 
-525.96 -242.85 -126.19   58.38 3004.72 

Coefficients:
            Estimate Std. Error t value  Pr(>|t|)    
(Intercept)  908.594    737.316   1.232  0.218531    
read         -11.636      2.731  -4.260 0.0000252 ***
math          10.739      2.928   3.667  0.000277 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 432.8 on 417 degrees of freedom
Multiple R-squared:  0.04275,	Adjusted R-squared:  0.03816 
F-statistic: 9.312 on 2 and 417 DF,  p-value: 0.0001105
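
The three CASchools models differ a lot in how much variance they explain. A small sketch that collects the R-squared values side by side, assuming the same formulas as above (the names models and r2 are illustrative):

```r
library(AER)
data("CASchools")
CASchools$STR <- CASchools$students / CASchools$teachers
CASchools$score <- (CASchools$read + CASchools$math) / 2

# Refit the three models from this section in one named list
models <- list(
  score    = lm(score ~ calworks + lunch + computer + expenditure +
                  income + english + STR, data = CASchools),
  english  = lm(english ~ read + math, data = CASchools),
  computer = lm(computer ~ read + math, data = CASchools)
)

# Multiple R-squared for each model
r2 <- sapply(models, function(m) summary(m)$r.squared)
round(r2, 3)
```

The summaries above report about 0.808 for score, 0.508 for english and 0.043 for computer, so reading and math scores explain very little of the variation in computer counts.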

Linear model for Boston data set (medv, the median home value, regressed on all other variables)

data("Boston")
?Boston

mod1 <- lm(medv ~ ., data = Boston)
summary(mod1)
medv LM for Boston

> summary(mod1)

Call:
lm(formula = medv ~ ., data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.595  -2.730  -0.518   1.777  26.199 

Coefficients:
               Estimate  Std. Error t value             Pr(>|t|)    
(Intercept)  36.4594884   5.1034588   7.144    0.000000000003283 ***
crim         -0.1080114   0.0328650  -3.287             0.001087 ** 
zn            0.0464205   0.0137275   3.382             0.000778 ***
indus         0.0205586   0.0614957   0.334             0.738288    
chas          2.6867338   0.8615798   3.118             0.001925 ** 
nox         -17.7666112   3.8197437  -4.651    0.000004245643808 ***
rm            3.8098652   0.4179253   9.116 < 0.0000000000000002 ***
age           0.0006922   0.0132098   0.052             0.958229    
dis          -1.4755668   0.1994547  -7.398    0.000000000000601 ***
rad           0.3060495   0.0663464   4.613    0.000005070529023 ***
tax          -0.0123346   0.0037605  -3.280             0.001112 ** 
ptratio      -0.9527472   0.1308268  -7.283    0.000000000001309 ***
black         0.0093117   0.0026860   3.467             0.000573 ***
lstat        -0.5247584   0.0507153 -10.347 < 0.0000000000000002 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.745 on 492 degrees of freedom
Multiple R-squared:  0.7406,	Adjusted R-squared:  0.7338 
F-statistic: 108.1 on 13 and 492 DF,  p-value: < 0.00000000000000022
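
The school-related variable in the Boston data is ptratio, the pupil-teacher ratio by town, which is the closest counterpart to STR in CASchools. As a sketch, its estimate and a 95% confidence interval can be extracted from the fitted model (ci is an illustrative name):

```r
library(MASS)
data("Boston")

# Same model as above: medv regressed on every other column
mod1 <- lm(medv ~ ., data = Boston)

# Estimate, standard error, t value and p-value for ptratio
coef(summary(mod1))["ptratio", ]

# 95% confidence interval for the ptratio coefficient
ci <- confint(mod1, "ptratio", level = 0.95)
ci
```

Note that the responses differ: medv is a median home value in thousands of dollars, while the CASchools model predicts a test score, so the ptratio and STR coefficients are not directly comparable in size.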