Comparing means for two independent samples--math ability of male and female CS students

From WikiEducator


This activity offers students direct experience with the 4 steps involved in hypothesis testing for two means from independent samples:

  1. State the appropriate null and alternative hypotheses, Ho and Ha.
  2. Obtain a random sample, collect relevant data, and check whether the data meet the conditions under which the test can be used. If the conditions are met, summarize the data by a test statistic.
  3. Find the p-value of the test.
  4. Based on the p-value, decide whether or not the results are significant and draw your conclusions in context.[1]
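
For step 2, one common choice of test statistic for two independent samples is the unpooled two-sample t statistic. The source does not say whether a pooled or unpooled version is intended, so the form below is an assumption, and the subscripts 1 and 2 are notation introduced here for the two groups.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
```

Here \bar{x}_i, s_i, and n_i stand for the sample mean, sample standard deviation, and sample size of group i. The numerator subtracts nothing further because the null hypothesis sets the difference in population means to zero.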

Inference for the mean of a population

Use this activity for in-class collaborative group work.

Estimate for completion time: 45 minutes

Materials needed:

  • 4-step hypothesis testing template (shown below) for each group (handout, in .odt file format--OpenOffice.org Writer)
  • Analysis software (SPSS, PSPP, SAS, R, Minitab, Excel, Calc)
  • Dataset: csdata.por (portable file format)




Activity

Comparing math ability for male and female computer science students

A study of freshman computer science majors, designed to investigate why students intending to major in computer science failed to do so, collected data on 224 beginning computer science majors in a particular year.[2] The resulting dataset includes 8 variables.

  • OBS: ID number
  • GPA: The 3-semester grade-point average (0-4 scale)
  • HSM: average high school grade in math (1-10 scale, with 10=A, 9=A-, etc.)
  • HSS: average high school grade in science (1-10 scale, with 10=A, 9=A-, etc.)
  • HSE: average high school grade in English (1-10 scale, with 10=A, 9=A-, etc.)
  • SATM: SAT Mathematics score
  • SATV: SAT Verbal score
  • SEX: 1=male; 2=female

The researchers might have been interested in how males and females compared on the SAT Math score (SATM). There is no reason to suspect that males would perform better than females, or vice versa.
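
Because the scenario gives no reason to expect a difference in either direction, one plausible way to write the hypotheses is two-sided. The symbols \mu_1 and \mu_2 are notation introduced here, not part of the original handout, for the population mean SATM scores of male and female beginning computer science majors; H_0 and H_a correspond to Ho and Ha in the handout.

```latex
H_0:\ \mu_1 = \mu_2 \qquad \text{versus} \qquad H_a:\ \mu_1 \neq \mu_2
```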


Design and implement hypothesis test(s)

Form students into groups of 2-4 students. Each group will need access to a laptop with statistical software loaded and a copy of the handout. Have the students complete the handout as a group, which includes the following information.

Identify the following:

  • Explanatory variable:
  • Response variable:
  1. State the appropriate null and alternative hypotheses and set the significance level.
    Ho:
    Ha:
    Significance level:
    • In words, clearly state what your random variable, the difference in sample means (X-bar1 - X-bar2), represents.
    • State what test statistic will be used to summarize the data. Indicate whether a one-tailed or two-tailed test will be used.
  2. Open the dataset, csdata.por, in the statistical software. Check whether the data meet the conditions under which the test can be used. If the conditions are met, summarize the data by a test statistic.
    • Calculate summary statistics and create a histogram (or stemplot) of the response variable for each group, based on the sample data.
    • Confirm that the conditions for use of the chosen test statistic have been met. (Continue even if the conditions are not met, and be ready to discuss noted violations in follow-up.)
    • Calculate the test statistic.
  3. Find the p-value of the test.
    p-value:
    • Explain what the p-value means.
    • On a sketch of the normal distribution, label the x axis and shade the region(s) corresponding to the p-value.
  4. Based on the p-value, decide whether or not the results are significant and draw your conclusions in context.
    • Indicate whether or not Ho is rejected.
    • Provide a reason for this decision.
    • Draw conclusions based on the results, given the context of the scenario.
    • If Ho is rejected, create a confidence interval for the difference in means at the confidence level that matches the significance level (for example, 95% for a significance level of 0.05), and interpret this interval in the context of the research question.
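
Steps 2 through 4 of the handout can be sketched in code. The following is a minimal Python sketch, not the activity's prescribed method: it assumes the unpooled standard error and a standard normal approximation for the p-value and the interval (statistical software would typically use a t distribution instead), and it relies only on the standard library. The function name `compare_two_means` and the two score lists are hypothetical; the values are invented for illustration and are not taken from csdata.por.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def compare_two_means(sample_1, sample_2, alpha=0.05):
    """Two-sided comparison of two independent sample means.

    Uses the unpooled standard error and a standard normal approximation,
    which is only reasonable when both samples are fairly large.
    """
    n1, n2 = len(sample_1), len(sample_2)
    xbar1, xbar2 = mean(sample_1), mean(sample_2)
    s1, s2 = stdev(sample_1), stdev(sample_2)  # sample standard deviations

    # Step 2: summarize the data by a test statistic.
    diff = xbar1 - xbar2
    std_error = sqrt(s1**2 / n1 + s2**2 / n2)
    statistic = diff / std_error

    # Step 3: two-sided p-value = area in both tails beyond |statistic|.
    z = NormalDist()
    p_value = 2 * (1 - z.cdf(abs(statistic)))

    # Step 4: decision, plus a (1 - alpha) interval for the difference in means.
    critical = z.inv_cdf(1 - alpha / 2)
    interval = (diff - critical * std_error, diff + critical * std_error)

    return {
        "difference": diff,
        "statistic": statistic,
        "p_value": p_value,
        "reject_h0": p_value < alpha,
        "interval": interval,
    }


# Invented illustrative scores, NOT values from csdata.por.
group_1 = [640, 590, 700, 610, 560, 650, 620, 580]
group_2 = [600, 630, 570, 660, 590, 610, 640, 550]

result = compare_two_means(group_1, group_2)
for name, value in result.items():
    print(f"{name}: {value}")
```

With samples as small as these invented lists the normal approximation is rough. It is more defensible for the full dataset of 224 students, provided both groups turn out to be reasonably large.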


Follow-up discussion

  • Review the results.
    • Were the conditions met?
    • Were the test results significant?
    • What can we conclude about our research question based on the results?
  • Are there limitations to our study?
    • Sample -- is the sample an SRS?
    • Generalizing to population -- The students in the sample were all computer science majors who began college during a particular year. How do the results generalize to a) college students in general, b) computer science majors at other colleges, and c) computer science students entering college in other years?
    • Other limitations?


Resources

The following resources were used for ideas and organization in the development of this activity:

  • Dean, S., & Illowsky, B. (2009, February 18). Hypothesis Testing for Two Means and Two Proportions: Lab. Retrieved from the Connexions Web site on 19 Sep 2010.

References

  1. Open Learning Initiative. Statistics. Retrieved from the Open Learning Initiative web site http://oli.web.cmu.edu/openlearning/forstudents/freecourses/statistics.
  2. Campbell, P.F. and McCabe, G.P. (1984). "Predicting the success of freshmen in a computer science major." Communications of the ACM, 27(11), pp. 1108-1113.