Multiple regression: predicting academic performance index scores for elementary schools
This activity provides independent practice in use of multiple regression.
Research question
In a study to determine which factors are related to school performance, 400 California elementary schools were randomly sampled from the California Department of Education's API dataset for the year 2000.[1] Several measures related to school performance were collected. These include a measure of academic performance, API 2000 (Academic Performance Index, on a scale of 200 to 1000: a composite score, based on statewide testing, that indicates a school's overall academic performance), as well as other attributes of elementary schools thought to be related to school performance, such as class size, enrollment, and the percent of students receiving free lunch.
Description of variables
| Variable | Description |
|---|---|
| snum | School number |
| dnum | District number |
| api00 | API score for the year 2000 |
| api99 | API score for the year 1999 |
| growth | Change in API score from 1999 to 2000 |
| meals | Percent of students receiving free meals |
| ell | Number of students who are English language learners |
| yr_rnd | Year-round school (0=No, 1=Yes) |
| mobility | Percent first year in school |
| acs_k3 | Average class size for grades K-3 |
| acs_46 | Average class size for grades 4-6 |
| not_hsg | Percent of parents who did not complete high school |
| hsg | Percent of parents whose highest education level is high school graduate |
| some_coll | Percent of parents whose highest education level is some college |
| coll_grad | Percent of parents whose highest education level is college graduate |
| grad_sch | Percent of parents whose highest education level is graduate school study |
| avg_ed | Average parent education (on a 1-5 scale, corresponding to the five levels in the not_hsg to grad_sch variables) |
| full | Percent of teachers with a full teaching credential |
| emer | Percent of teachers with an emergency teaching credential |
| enroll | Number of students enrolled in the school |
| mealcat | Percent of students receiving free meals, grouped in 3 categories (1=0-46% free meals, 2=47-80% free meals, 3=81-100% free meals) |
| collcat | unknown |
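Two variables in the table are derived from others: growth is the change in API score from 1999 to 2000 (api00 minus api99), and mealcat groups meals into three bands. The minimal sketch below recomputes both for a single case so that inconsistencies can be flagged during the preliminary analyses. It assumes meals is stored as a whole-number percent, so the band edges (46/47 and 80/81) leave no gaps; the function names and the example row are hypothetical, not taken from elemapi2.sav.

```python
def expected_mealcat(meals):
    """Map a whole-number free-meals percent to the 3 mealcat bands in the table.

    Assumes meals is an integer percent (hypothetical helper, not part of the dataset).
    """
    if not 0 <= meals <= 100:
        raise ValueError(f"meals must be a percent in 0-100, got {meals}")
    if meals <= 46:
        return 1  # 0-46% free meals
    if meals <= 80:
        return 2  # 47-80% free meals
    return 3  # 81-100% free meals


def derived_variable_problems(row):
    """Return a message for each derived variable that disagrees with its sources."""
    problems = []
    # growth is the change from 1999 to 2000, so it should equal api00 - api99
    if row["growth"] != row["api00"] - row["api99"]:
        problems.append("growth != api00 - api99")
    # mealcat should be the banded version of meals
    if row["mealcat"] != expected_mealcat(row["meals"]):
        problems.append("mealcat does not match meals")
    return problems


# Hypothetical example row (not taken from elemapi2.sav)
school = {"api00": 693, "api99": 600, "growth": 93, "meals": 67, "mealcat": 2}
print(derived_variable_problems(school))  # → []
```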
Dataset
Obtain the dataset from one of the following:
- class website: elemapi2.sav (SPSS file format)
- via the UCLA Statistical Computing website: elemapi2.sav
Analyses
Response variable: api00
Explanatory variables: Choose at least 3 quantitative variables that you expect to contribute most to overall academic performance in elementary schools. You may choose more, but dumping all of the possible variables into the prediction of api00 is inappropriate. Choose your variables BEFORE examining descriptive statistics or correlations.
The following sections provide guiding questions to step you through the process of multiple regression analysis. Copy and paste the following sections into a word processor, and write a summary or interpretation for each section as indicated.
Preliminary analyses
For all of the variables to be included in your regression analyses:
- Use SPSS to create descriptive statistics, frequency distributions (for variables with a limited number of values), and histograms (for variables with many different values).
- Evaluate the results for reasonableness. Consider the following questions:
  - Do any of the variables exhibit "suspicious" values?
  - Do any of the distributions seem unreasonable given what you know about the measurement scale, or appear so "extreme" that the variable would be unreasonable to use as a predictor in the regression?
- Use SPSS to create pairwise correlations and scatterplots (for each explanatory variable with the response, as well as for each pair of explanatory variables).
- Evaluate the results for reasonableness. Consider the following questions:
  - Do the correlations of the explanatory variables with api00 support their use in a prediction equation?
  - How might the correlations among the explanatory variables affect the individual contribution of each?
  - Is there a linear relationship between each explanatory variable and the response?
- Summarize the results of your evaluation of the preliminary analyses.
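The descriptive and correlation checks above are done in SPSS in this activity. As a cross-check, or for readers working outside SPSS, the following minimal sketch uses only the Python standard library to compute summary statistics, flag values outside an expected range (for example, a percent variable outside 0-100), and compute a Pearson correlation. The function names, the toy values, and the range limits are hypothetical placeholders; the correlation is computed directly rather than with `statistics.correlation`, which needs Python 3.10 or later.

```python
import math
import statistics


def describe(values):
    """Return n, mean, sample SD, min and max for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }


def out_of_range(values, low, high):
    """Return the positions of values outside [low, high] ("suspicious" values)."""
    return [i for i, v in enumerate(values) if not low <= v <= high]


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for illustration only (not elemapi2 data)
meals = [20, 45, 67, 88, 104]  # 104 is impossible for a percent
api00 = [850, 760, 690, 580, 600]
print(describe(meals))
print(out_of_range(meals, 0, 100))  # → [4]
print(round(pearson_r(meals, api00), 3))
```

A correlation summarizes only the linear part of a relationship, so the scatterplots are still needed to judge whether each relationship with api00 is actually linear.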
Full regression analysis
- Use SPSS to run a regression analysis that includes all of your chosen explanatory variables.
- Evaluate the regression results, including the overall F test and the contribution of each explanatory variable. Consider the following questions:
  - Do the explanatory variables (as a group) significantly predict the response variable?
  - Does each explanatory variable contribute to the prediction of the response beyond the contribution of the other variables?
  - Does the collection of explanatory variables provide a "useful/practical" explanation of the response?
  - What refinements, if any, will you make?
- Summarize the results of your evaluation of the regression analysis output.
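The overall F test asked about above compares the fitted model with an intercept-only model. The following is a minimal sketch, assuming plain Python lists, of the core quantities behind SPSS's Model Summary and ANOVA tables: ordinary least-squares coefficients from the normal equations, R², and the overall F statistic with (k, n - k - 1) degrees of freedom. The function names and toy data are hypothetical. P-values and the t tests for individual coefficients are left to SPSS because the standard library has no F or t distribution, and solving the normal equations directly can be numerically fragile when predictors are highly correlated.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(xs, y):
    """Least-squares fit of y on the predictor columns in xs plus an intercept.

    Returns (coefficients, r_squared, overall_f); coefficients[0] is the intercept.
    """
    n, k = len(y), len(xs)
    rows = [[1.0] + [col[i] for col in xs] for i in range(n)]  # design matrix X
    p = k + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    b = solve(xtx, xty)  # normal equations: (X'X) b = X'y
    fitted = [sum(bj * xj for bj, xj in zip(b, r)) for r in rows]
    ybar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    sst = sum((yi - ybar) ** 2 for yi in y)  # total sum of squares
    r2 = 1 - sse / sst
    f = (r2 / k) / ((1 - r2) / (n - k - 1))  # overall F, df = (k, n - k - 1)
    return b, r2, f


# Hypothetical toy data with two predictors
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [19, 16, 27, 28, 38, 37]
b, r2, f = fit_ols([x1, x2], y)
print([round(v, 6) for v in b], round(r2, 4), round(f, 2))  # → [10.0, 2.0, 3.0] 0.9901 150.56
```

The toy data were built as 10 + 2·x1 + 3·x2 plus errors that are uncorrelated with both predictors, so the least-squares coefficients are exactly 10, 2 and 3 and the output can be checked by hand.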
Refine the model
- Decide which variable to delete from the model.
- Use SPSS to run a second regression analysis with the remaining explanatory variables entered.
- Evaluate the regression results. Consider the following questions:
  - Do the remaining explanatory variables (as a group) significantly predict the response variable?
  - Does each explanatory variable contribute to the prediction of the response beyond the contribution of the other variables?
  - Does the collection of explanatory variables provide a "useful/practical" explanation of the response?
  - Is the reduction in "explanatory power" due to the removal of one explanatory variable acceptable?
  - What additional refinements, if any, will you make?
- Summarize the results of your evaluation of the regression analysis output.
- Repeat this step if there are additional variables to be removed from the model.
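Whether the loss of explanatory power is acceptable can be judged informally from the drop in R², or tested formally with a partial F test that compares the full and reduced models. The function below is a minimal sketch of that formula, taking the R² values that SPSS reports for the two models. The function name and example numbers are hypothetical (only n = 400 comes from the study description, and n should be the number of cases actually used in the fit). When a single variable is dropped (q = 1), this F equals the square of that variable's t statistic in the full model, so the two tests agree.

```python
def partial_f(r2_full, r2_reduced, n, k_full, q):
    """Partial F statistic for the loss in R-squared when q predictors are dropped.

    r2_full:    R-squared of the model with all k_full predictors
    r2_reduced: R-squared after dropping q of those predictors
    n:          number of cases used in the regression
    Degrees of freedom are (q, n - k_full - 1).
    """
    df_error = n - k_full - 1
    if q < 1 or df_error < 1:
        raise ValueError("need q >= 1 and n > k_full + 1")
    return ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_error)


# Hypothetical R-squared values for a 4-predictor model, dropping 1 predictor
print(round(partial_f(0.80, 0.79, n=400, k_full=4, q=1), 2))  # → 19.75
```

The resulting F is compared with an F distribution on (q, n - k_full - 1) degrees of freedom; a small F (large p-value) suggests that the dropped variable added little beyond the others.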
Residuals
- Use SPSS to specify a final run of your refined model. Choose the "Save" option; under "Predicted Values" select "Unstandardized", and under "Residuals" select "Unstandardized". Run the regression as before. Two new variables will be created in your dataset, containing the unstandardized predicted value and the unstandardized residual for each observation. You will use these variables to create plots for studying the residuals.
- Create a normal quantile plot (Q-Q plot) of the unstandardized residuals.
- Use the plot to evaluate the assumption that the errors, as estimated by the residuals, are normally distributed.
- Create plots of the unstandardized residuals versus the unstandardized predicted values and versus each of the explanatory variables entered in the model.
- Evaluate each plot. Consider the following questions:
  - Are the residuals more or less randomly dispersed around zero?
  - Is there any evidence that the relationship between an explanatory variable and the response is non-linear?
  - Is there any evidence that the errors do not have a common standard deviation?
  - Are there any unusual patterns?
- Summarize the results of your evaluation of the residuals.
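SPSS draws the Q-Q plot directly, but it can help to see what the plot's coordinates are. The minimal sketch below, using only Python's standard library, computes unstandardized residuals (observed minus predicted) and pairs each sorted residual with a standard-normal quantile; points lying close to a straight line are consistent with normally distributed errors. The function names and toy values are hypothetical, and the plotting positions (i - 0.5)/n are one common choice that may differ from the formula SPSS uses. With an intercept in the model, least-squares residuals average exactly to zero, which is why the residual plots are read around zero.

```python
from statistics import NormalDist, mean


def residuals_from(observed, predicted):
    """Unstandardized residuals: observed minus predicted value for each case."""
    return [o - p for o, p in zip(observed, predicted)]


def qq_points(residuals):
    """Return (theoretical normal quantile, observed residual) pairs for a Q-Q plot.

    Plotting positions are (i - 0.5) / n; SPSS may use a slightly different
    formula, which shifts the points a little but not the overall pattern.
    """
    n = len(residuals)
    std_normal = NormalDist()  # mean 0, SD 1
    return [
        (std_normal.inv_cdf((i - 0.5) / n), r)
        for i, r in enumerate(sorted(residuals), start=1)
    ]


# Hypothetical toy observed and predicted values
observed = [19, 16, 27, 28, 38, 37]
predicted = [18, 17, 28, 27, 38, 37]
res = residuals_from(observed, predicted)
print(res, mean(res))  # → [1, -1, -1, 1, 0, 0] 0
for q, r in qq_points(res):
    print(f"{q:+.3f} {r:+d}")  # theoretical quantile, sorted residual
```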
Conclusion
- Interpret the results of your regression analyses in the context of the research question. Be sure to include:
  - A description of the model, including the regression equation and what the equation suggests about the relationships among the variables
  - The population to which the model applies
  - Results of significance testing
  - The model's usefulness (explanatory power)
  - Variables rejected from the model, and why
- Describe any limitations of your study, e.g.,
  - Issues with generalization (both to a population and in the use of proxy variables, i.e., indicators of other, less measurable characteristics)
  - Model specification
  - Regression assumptions
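For the write-up, the fitted model can be reported in the general form below, where x_1, ..., x_k are placeholders for whichever explanatory variables remain in your final model (not a recommended set) and b_0, ..., b_k are the unstandardized coefficients from the SPSS output. R² and adjusted R² summarize explanatory power; adjusted R² penalizes additional predictors, which matters when comparing the full and refined models.

```latex
\widehat{\mathrm{api00}} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k

R^2 = 1 - \frac{SSE}{SST}, \qquad
R^2_{\mathrm{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
```

Here SSE is the sum of squared residuals, SST is the total sum of squares of api00 about its mean, n is the number of schools used in the fit, and k is the number of explanatory variables. Each b_j estimates the change in predicted api00 for a one-unit increase in x_j with the other explanatory variables held constant, which is the sense in which the equation describes the relationships among the variables.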
Resources
Regression with SPSS, by Xiao Chen, Phil Ender, Michael Mitchell and Christine Wells (in alphabetical order), provides helpful guidance in analyzing the "elemapi" datasets.
References
- [1] Chen, X., Ender, P., Mitchell, M. and Wells, C. (2003). Regression with SPSS. Retrieved from http://www.ats.ucla.edu/stat/spss/webbooks/reg/default.htm