Two-way ANOVA--do level of education and gender explain US income?

From WikiEducator

This activity provides independent practice in the use of two-way ANOVA within the context of the four steps of hypothesis testing:

  1. State the appropriate null and alternative hypotheses, Ho and Ha.
  2. Obtain a random sample, collect relevant data, and check whether the data meet the conditions under which the test can be used. If the conditions are met, summarize the data by a test statistic.
  3. Find the p-value of the test.
  4. Based on the p-value, decide whether or not the results are significant and draw your conclusions in context.[1]

Research question

The US Bureau of Labor Statistics uses large monthly and annual surveys of randomly selected US households to collect detailed information on the income and employment of US full-time workers. Using a random sample from the March 2002 annual survey, we can study the relationship between background variables (age, education level, gender, and type of employment) and an individual's total yearly earnings.

Much has been written about the relationship between education level and earnings (people with more education tend to have higher earnings) and between gender and earnings (in the US, males tend to have higher earnings than females). One question we might ask is how education level and gender interact to explain earnings.
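
To make "interact" concrete, here is a sketch of the usual two-way ANOVA model with an interaction term. The notation is an assumption for illustration and does not come from the activity itself: i indexes the six education levels, j the two genders, and k the individuals within each education-by-gender group.

```latex
% Two-way ANOVA model with interaction (illustrative notation)
% y_{ijk}: earnings of individual k with education level i and gender j
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \sim N(0, \sigma^2)
% \alpha_i: main effect of education level i (i = 1, \dots, 6)
% \beta_j: main effect of gender j (j = 1, 2)
% (\alpha\beta)_{ij}: interaction effect for the combination (i, j)
```

If every interaction term is zero, the earnings gap between males and females is the same at every education level. Nonzero interaction terms mean the gap changes with education level.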

Description of variables

Observation number
Age in years
Education level (1 = "no high school", 2 = "some high school", 3 = "high school diploma", 4 = "some college", 5 = "bachelor's degree", 6 = "postgraduate degree")
Gender (1 = "male", 2 = "female")
Individual's total yearly earnings (in US$); may be less than 0 in some cases
Classification of individual's main work experience (5 = "private sector", 6 = "government", 7 = "self-employed")
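
If you work outside SPSS, the numeric codes above can be turned into readable labels before you build tables or plots. The following is a minimal sketch using only the Python standard library. The field names (`obs`, `age`, `educ`, `gender`, `earnings`, `work_class`) and the example record are hypothetical and may not match the column names in the actual data file.

```python
# Value labels copied from the variable descriptions above.
EDUCATION = {
    1: "no high school",
    2: "some high school",
    3: "high school diploma",
    4: "some college",
    5: "bachelor's degree",
    6: "postgraduate degree",
}
GENDER = {1: "male", 2: "female"}
WORK_CLASS = {5: "private sector", 6: "government", 7: "self-employed"}


def decode(record):
    """Return a copy of one record with numeric codes replaced by labels."""
    return {
        **record,
        "educ": EDUCATION[record["educ"]],
        "gender": GENDER[record["gender"]],
        "work_class": WORK_CLASS[record["work_class"]],
    }


# Hypothetical record for illustration; not taken from the survey.
example = {"obs": 1, "age": 40, "educ": 5, "gender": 2,
           "earnings": 45000, "work_class": 6}
print(decode(example))
```

An unexpected code raises a `KeyError`, which is a quick way to spot values that fall outside the documented categories.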


Obtain the dataset from one of the following:

  • class website: Workers.sav (SPSS file format; a sample of fewer than 1,500 observations)
  • the full dataset (N = 71,076) via the Introduction to the Practice of Statistics[2] website, available in formats for various statistical software packages


The following instructions and guiding questions will step you through the analysis process. Copy and paste the following section into a word processor. Provide responses as indicated.

Factorial ANOVA to explain total earnings

  • What are the explanatory variables?
    • Factor A:
    • Factor B:
  • What is the response variable?
  1. State the null and alternative hypotheses being tested in this study (note there is more than one pair).
    • Ho:
    • Ha:
  2. Data collection and examination
    • Look at the data:
      • Calculate descriptive statistics. (SPSS: Select Analyze > Compare Means > Means... Assign each of the independent variables to a different layer: after you move the first into the independent variables window, click Next to get to the second layer.)
      • Create histograms for each combination of levels for the two explanatory variables. (SPSS: Use Chart Builder to set up the histogram. Select the Groups/Point ID tab and check columns panel variable and rows panel variable. Add one of the explanatory variables to the column panel and the second explanatory variable to the row panel. A two-way matrix of histograms will result.)
      • Create a chart with side-by-side boxplots for all of the factor A x factor B groups. (SPSS: In Chart Builder, select the Gallery tab, and then select the clustered boxplot option (the middle choice). Assign the dependent variable to the y-axis, an ordered independent variable to the x-axis, and the remaining independent variable to the "cluster on x" box. Once the graph is created, you will notice that the labeled outliers make for a very messy graph. You can turn the labels off in the chart editor: click on one of the labels; then, in the Properties dialog, select the Data Value Labels tab, select "Case number" in the top window, and click the red X. Click Apply.)
    • Describe the data and shape of the distributions. Describe the comparison of distributions as displayed in the boxplot.
    • Explain why the conditions which allow us to safely use the ANOVA test are/are not met. (Continue with the analyses even if conditions are not met.)
    • Run the two-way ANOVA F test procedure. (SPSS: Select Analyze > General Linear Model > Univariate. In the Univariate dialog, 1) select Model... and uncheck "Include intercept in model" (so the ANOVA table is easier to interpret); 2) select Plots... and move one explanatory variable to the Horizontal Axis window and the other explanatory variable to the Separate Lines window; and 3) select Save... and check Unstandardized in the "Residuals" section.)
    • Create a Q-Q plot for the unstandardized residuals variable. Evaluate the plot as a second check on the Normality condition.
    • For each main effect and the interaction, report the value of the test statistic, its degrees of freedom and associated p-value.
  3. Interpret the analysis results in the context of the research question. Include important statistics from your analysis results to support your conclusion and generalize your results, if appropriate, to the relevant population(s). Address whether we can conclude that the independent variables offer a plausible cause of the response variable.
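
To see where the three F statistics and their degrees of freedom come from, here is a minimal sketch that computes them by hand with the Python standard library. It assumes a complete, balanced design (the same number of observations in every group) and uses made-up toy values. The survey sample is unbalanced, so SPSS computes its sums of squares differently and its output will not reduce to these simple formulas. The function name `two_way_anova` and the effect labels are illustrative only.

```python
from itertools import product
from statistics import mean

# Made-up, balanced toy data (earnings in thousands of US$), two workers per cell.
# Keys are (education level, gender) codes; the values are illustrative only
# and are NOT taken from the survey.
cells = {
    (3, 1): [30.0, 34.0],
    (3, 2): [26.0, 28.0],
    (5, 1): [52.0, 60.0],
    (5, 2): [40.0, 44.0],
}


def two_way_anova(cells):
    """Return {effect: (sum of squares, df, F)} for a balanced two-factor design."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    pairs = list(product(a_levels, b_levels))
    n = len(next(iter(cells.values())))  # observations per cell
    if set(pairs) != set(cells) or n < 2 or any(len(v) != n for v in cells.values()):
        raise ValueError("sketch needs a complete, balanced design with n >= 2 per cell")

    # Grand mean, cell means, and marginal means for each factor.
    grand = mean(y for ys in cells.values() for y in ys)
    cell_mean = {k: mean(ys) for k, ys in cells.items()}
    a_mean = {a: mean(cell_mean[(a, b)] for b in b_levels) for a in a_levels}
    b_mean = {b: mean(cell_mean[(a, b)] for a in a_levels) for b in b_levels}

    # Sums of squares for the two main effects and the interaction.
    ss = {
        "A": n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels),
        "B": n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels),
        "A x B": n * sum(
            (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
            for a, b in pairs
        ),
    }
    df = {"A": len(a_levels) - 1, "B": len(b_levels) - 1}
    df["A x B"] = df["A"] * df["B"]

    # Within-cell (error) variation supplies the denominator of every F ratio.
    ss_error = sum((y - cell_mean[k]) ** 2 for k, ys in cells.items() for y in ys)
    df_error = len(pairs) * (n - 1)
    ms_error = ss_error / df_error

    return {e: (ss[e], df[e], (ss[e] / df[e]) / ms_error) for e in ss}


for effect, (ss, df, f) in two_way_anova(cells).items():
    print(f"{effect}: SS = {ss:.2f}, df = {df}, F = {f:.2f}")
```

Each F statistic is the effect's mean square divided by the error mean square. The standard library has no F distribution, so the p-values still have to come from statistical software or an F table.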


  1. Open Learning Initiative. Statistics. Retrieved from the Open Learning Initiative web site
  2. Moore, D.S., McCabe, G.P., and Craig, B.A. (2009). Introduction to the Practice of Statistics, 6th edition. New York: W.H. Freeman and Co.