Multiple regression--predicting achievement index score for elementary schools

This activity provides independent practice in the use of multiple regression.

Research question

In a study to determine what factors are related to school performance, 400 California elementary schools were randomly sampled from the California Department of Education's API dataset for the year 2000.^[1] Several measures were collected for each school. The measure of academic performance is API 2000 (Academic Performance Index), a composite score on a scale of 200 to 1000 that indicates a school's overall academic performance based on statewide testing. The remaining measures are attributes of elementary schools thought to be related to school performance, such as class size, enrollment, and the percent of students receiving free lunch.

Description of variables

Variable	Description
snum	School number
dnum	District number
api00	API score for the year 2000
api99	API score for the year 1999
growth	Change in API score from 1999 to 2000
meals	Percent of students receiving free meals
ell	Number of students who are English language learners
yr_rnd	Year round school (0=No, 1=Yes)
mobility	Percent first year in school
acs_k3	Average class size for grades K-3
acs_46	Average class size for grades 4-6
not_hsg	Percent of parents who did not complete high school
hsg	Percent of parents whose highest education level is high school graduate
some_coll	Percent of parents whose highest education level is some college
coll_grad	Percent of parents whose highest education level is college graduate
grad_sch	Percent of parents whose highest education level is graduate school study
avg_ed	Average parent education (on a 1-5 scale, corresponding to levels in hsg to grad_sch variables)
full	Percent of teachers with a full teaching credential
emer	Percent of teachers with an emergency teaching credential
enroll	Number of students enrolled in the school
mealcat	Percent of students receiving free meals, grouped in 3 categories (1=0-46% free meals, 2=47-80% free meals, 3=81-100% free meals)
collcat	unknown

Dataset

Obtain the dataset from one of the following:

class website: elemapi2.sav (SPSS file format)
via the UCLA Statistical Computing website: elemapi2.sav

Analyses

Response variable: api00

Explanatory variables: Choose at least three quantitative variables that you expect to contribute most to overall academic performance in elementary schools. You may choose more, but entering every available variable into the prediction of api00 is inappropriate. Choose your variables BEFORE examining descriptive statistics or correlations.
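
For reference, the model fitted with three chosen explanatory variables can be written in the general form below. The symbols x_1, x_2, x_3 (the chosen explanatory variables), the coefficients beta, and the error term epsilon are notation introduced here for illustration; they do not appear in the dataset.

```latex
\[
\text{api00}_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \varepsilon_i,
\qquad i = 1, \dots, 400,
\]
\[
\varepsilon_i \sim N(0, \sigma^2) \quad \text{independently, with a common standard deviation } \sigma .
\]
```

The assumptions on the error term (normality and a common standard deviation) are the ones examined later with the residual plots.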

The following sections provide guiding questions to step you through the process of multiple regression analysis. Copy and paste the following sections into a word processor. Create a summary or interpretation for each section as indicated.

Preliminary analyses

For all of the variables to be included in your regression analyses:

Use SPSS to create descriptive statistics, frequency distributions (for variables with limited values), and histograms (for variables with many different values).
- Evaluate the results for reasonableness. Consider the following questions:
  - Do any of the variables exhibit "suspicious" values?
  - Do any of the distributions seem unreasonable given what you know about the measurement scale or appear "extreme" such that the variable would be unreasonable to use as a predictor in the regression?
Use SPSS to create pairwise correlations and scatterplots (for each explanatory variable with the response, as well as for each pair of explanatory variables).
- Evaluate the results for reasonableness. Consider the following questions:
  - Do the correlations of explanatory variables with api00 support their use in a prediction equation?
  - How might the correlations among the explanatory variables impact the individual contribution of each?
  - Is there a linear relationship between each explanatory variable and the response?
Summarize the results of your evaluation of the preliminary analyses.
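
For readers who want to check the SPSS output independently, the preliminary analyses can be sketched in Python with numpy. This is a minimal sketch under stated assumptions: the data are synthetic stand-ins generated in the code (not the real elemapi2 values), and the choice of meals, full, and acs_k3 as explanatory variables is arbitrary, not a recommendation.

```python
import numpy as np

# Synthetic stand-in data (NOT the real elemapi2 values), for illustration only.
rng = np.random.default_rng(0)
n = 400
meals = rng.uniform(0, 100, n)                   # percent of students
full = np.clip(rng.normal(85, 12, n), 0, 100)    # percent of teachers
acs_k3 = rng.normal(19, 1.5, n)                  # average class size
api00 = 780 - 4 * meals + 0.5 * full + rng.normal(0, 40, n)

data = {"api00": api00, "meals": meals, "full": full, "acs_k3": acs_k3}

# Descriptive statistics: the minimum and maximum help flag suspicious values,
# for example a percentage outside 0-100.
for name, x in data.items():
    print(f"{name:7s} mean={x.mean():8.2f} sd={x.std(ddof=1):7.2f} "
          f"min={x.min():8.2f} max={x.max():8.2f}")

# Pairwise Pearson correlations (rows and columns follow the order of `names`).
names = list(data)
corr = np.corrcoef(np.column_stack([data[k] for k in names]), rowvar=False)
print(names)
print(np.round(corr, 2))
```

The first row of the matrix gives the correlation of each explanatory variable with api00; the remaining off-diagonal entries show how strongly the explanatory variables are related to one another.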

Full regression analysis

Use SPSS to create a regression analysis, including all of your chosen explanatory variables.
- Evaluate the regression results, including the overall F test and contributions of each of the explanatory variables. Consider the following questions:
  - Do the explanatory variables (as a group) significantly predict the response variable?
  - Does each of the explanatory variables contribute to the prediction of the response beyond the contribution of the other variables?
  - Does the collection of explanatory variables provide a "useful/practical" explanation of the response?
  - What refinements, if any, will you make?
Summarize the results of your evaluation of the regression analysis output.
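
The quantities reported in the regression output (coefficients, t statistics, R squared, and the overall F statistic) can be reproduced with a short least-squares sketch. The helper name `ols` and the synthetic data are assumptions made for illustration; p-values are omitted because they require the F and t distributions, which numpy does not provide.

```python
import numpy as np

def ols(X, y):
    """Least-squares fit of y on the columns of X plus an intercept."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = float(resid @ resid)                    # residual sum of squares
    sst = float(((y - y.mean()) ** 2).sum())      # total sum of squares
    df_resid = n - p - 1
    mse = sse / df_resid
    se = np.sqrt(mse * np.diag(np.linalg.inv(A.T @ A)))  # standard errors
    return {"beta": beta, "t": beta / se, "r2": 1 - sse / sst,
            "F": ((sst - sse) / p) / mse, "df": (p, df_resid)}

# Synthetic stand-in data (NOT the real elemapi2 values), for illustration only.
rng = np.random.default_rng(0)
n = 400
meals = rng.uniform(0, 100, n)
full = np.clip(rng.normal(85, 12, n), 0, 100)
acs_k3 = rng.normal(19, 1.5, n)
api00 = 780 - 4 * meals + 0.5 * full + rng.normal(0, 40, n)

fit = ols(np.column_stack([meals, full, acs_k3]), api00)
for name, b, t in zip(["(intercept)", "meals", "full", "acs_k3"],
                      fit["beta"], fit["t"]):
    print(f"{name:12s} b={b:9.3f} t={t:8.2f}")
print(f"R^2={fit['r2']:.3f}  F={fit['F']:.1f} on {fit['df']} df")
```

The overall F statistic addresses whether the explanatory variables predict the response as a group, while each t statistic addresses the contribution of one variable beyond the others in the model.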

Refine the model

Decide on which variable to delete from the model.
Use SPSS to create a second regression analysis with the remaining explanatory variables entered.
- Evaluate the regression results. Consider the following questions:
  - Do the remaining explanatory variables (as a group) significantly predict the response variable?
  - Does each of the explanatory variables contribute to the prediction of the response beyond the contribution of the other variables?
  - Does the collection of explanatory variables provide a "useful/practical" explanation of the response?
  - Is the reduction in "explanation power" due to the removal of one explanatory variable acceptable?
  - What additional refinements, if any, will you make?
Summarize the results of your evaluation of the regression analysis output.
Repeat this step if there are additional variables to be removed from the model.
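
One way to judge whether the loss of explanatory power is acceptable is to compare R squared, adjusted R squared, and a partial F statistic between the full and reduced models. The sketch below assumes synthetic data and an arbitrary choice of which variable to drop (acs_k3); the helper names `sse_of` and `adj_r2` are hypothetical.

```python
import numpy as np

def sse_of(X, y):
    """Residual sum of squares from a least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def adj_r2(r2, n, p):
    """Adjusted R^2 for a model with p explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Synthetic stand-in data (NOT the real elemapi2 values), for illustration only.
rng = np.random.default_rng(0)
n = 400
meals = rng.uniform(0, 100, n)
full = np.clip(rng.normal(85, 12, n), 0, 100)
acs_k3 = rng.normal(19, 1.5, n)
api00 = 780 - 4 * meals + 0.5 * full + rng.normal(0, 40, n)

sst = float(((api00 - api00.mean()) ** 2).sum())
p_full, p_red = 3, 2
sse_full = sse_of(np.column_stack([meals, full, acs_k3]), api00)
sse_red = sse_of(np.column_stack([meals, full]), api00)   # acs_k3 removed

r2_full = 1 - sse_full / sst
r2_red = 1 - sse_red / sst

# Partial F: does the removed variable explain variation beyond the others?
partial_F = ((sse_red - sse_full) / (p_full - p_red)) / (sse_full / (n - p_full - 1))

print(f"full:    R^2={r2_full:.4f}  adj R^2={adj_r2(r2_full, n, p_full):.4f}")
print(f"reduced: R^2={r2_red:.4f}  adj R^2={adj_r2(r2_red, n, p_red):.4f}")
print(f"partial F={partial_F:.3f} on ({p_full - p_red}, {n - p_full - 1}) df")
```

R squared can never increase when a variable is removed, so the question is whether the decrease is small enough to justify the simpler model.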

Residuals

Use SPSS to specify a final run of your refined model. Choose the "Save" option and under "Predicted Values" select "Unstandardized" and under "Residuals" select "Unstandardized". Run the regression as before. Two new variables will be created in your dataset containing the unstandardized predicted value and the unstandardized residual for each observation. You will use these variables to create plots to study the residuals.
Create a normal quantile plot (Q-Q plot) of the unstandardized residuals.
- Use the plot to evaluate the assumption that the errors (residuals) are normally distributed.
Create plots of the unstandardized residuals versus the unstandardized predicted values and each of the explanatory variables entered in the model.
- Evaluate each plot. Consider the following questions:
  - Are the residuals more or less randomly dispersed around zero?
  - Is there any evidence to suggest that the explanatory and response variables have a non-linear relationship?
  - Is there any evidence that the errors do not have a common standard deviation?
  - Are there any unusual patterns?
Summarize the results of your evaluation of the residuals.
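
The saved predicted values and residuals, and the coordinates behind a normal Q-Q plot, can be computed as in the following sketch. It assumes synthetic data and a refined model with two explanatory variables; the plotting positions (i - 0.5)/n are one common convention and may differ from the one SPSS uses.

```python
import numpy as np
from statistics import NormalDist

# Synthetic stand-in data (NOT the real elemapi2 values), for illustration only.
rng = np.random.default_rng(0)
n = 400
meals = rng.uniform(0, 100, n)
full = np.clip(rng.normal(85, 12, n), 0, 100)
acs_k3 = rng.normal(19, 1.5, n)   # drawn but not used in the refined model
api00 = 780 - 4 * meals + 0.5 * full + rng.normal(0, 40, n)

# Refined model: api00 on meals and full, with intercept.
A = np.column_stack([np.ones(n), meals, full])
beta, *_ = np.linalg.lstsq(A, api00, rcond=None)
fitted = A @ beta                 # unstandardized predicted values
resid = api00 - fitted            # unstandardized residuals

# Q-Q plot coordinates: sorted residuals against standard normal quantiles.
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical = np.array([NormalDist().inv_cdf(float(p)) for p in probs])
sorted_resid = np.sort(resid)
qq_r = np.corrcoef(theoretical, sorted_resid)[0, 1]   # near 1 if roughly normal

# Rough check of common spread: residual SD in the lower and upper halves
# of the predicted values.
order = np.argsort(fitted)
sd_low = resid[order[: n // 2]].std(ddof=1)
sd_high = resid[order[n // 2:]].std(ddof=1)

print(f"mean residual={resid.mean():.2e}  Q-Q correlation={qq_r:.4f}")
print(f"residual SD: lower half={sd_low:.1f}  upper half={sd_high:.1f}")
```

Plotting `sorted_resid` against `theoretical` gives the Q-Q plot, and plotting `resid` against `fitted` or against each explanatory variable gives the residual plots described above. The numeric summaries are only a supplement to inspecting the plots.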

Conclusion

Interpret the results of your regression analyses in the context of the research question. Be sure to include:
- A description of the model, including regression equation and what the equation suggests about the relationship among the variables.
- The population to which the model applies
- Results of significance testing
- The model's usefulness (explanatory power)
- Variables rejected from the model and why
Describe any limitations to your study, e.g.,
- Issues with generalization (both to a population as well as limitations of any proxy variables--indicators of other less measurable characteristics)
- Model specification
- Regression assumptions

Resources

Regression with SPSS, by Xiao Chen, Phil Ender, Michael Mitchell and Christine Wells (in alphabetical order), provides helpful guidance in analyzing the "elemapi" datasets.

References

[1] Chen, X., Ender, P., Mitchell, M. and Wells, C. (2003). Regression with SPSS, from http://www.ats.ucla.edu/stat/spss/webbooks/reg/default.htm .
