GSE Stat Methods II - Review Notes

The following review is based on the indicated chapter and section of

Moore, D. S., McCabe, G. P., & Craig, B. A. (2009). Introduction to the practice of statistics (6th ed). New York: W. H. Freeman.

The questions and items for display are organized as a slide show. The sub-bullets for each point support discussion and content to be written out on the board.

Two-Way Analysis of Variance (13.1/13.2)

What are the explanatory variables in a two-way ANOVA?
- two categorical variables
- The categorical variables are called factors. (Factor A and Factor B)
What populations are we making inference about?
- The populations created by crossing Factor A and Factor B classifications
What is the response variable in a two-way ANOVA?
- Response: one quantitative variable
Examples of two-way ANOVAs
- students devise an example with a partner, each pair presents their example
- draw at least one of the examples as a two-way table to create cells
- demonstrate how the number of levels in each factor is used to calculate # of cells.
What are the advantages of a two-way ANOVA?

(use one of the examples to show the following)
- studying two factors simultaneously is more efficient.
  
  (assign n's to marginals in example to show how one set of data can provide information for both factors)
- Including a second factor may reduce the residual variation.
  
  DATA = FIT + RESIDUAL
  
  DATA = sum of squared differences of each score from the overall mean
  
  FIT = sum of squared differences of each score's cell mean from the overall mean
  
  RESIDUAL = sum of squared differences of each score from its cell mean.
  
  including the second factor may result in a better fit of the data in a cell with the cell mean, resulting in less residual variation--smaller MSE.
- Interaction between the factors can be investigated.
  
  Introduce idea of main effect: effect on response variable of differing levels of one factor pooled across the levels of the other factor; comparable to one-way ANOVA.
  
  a main effect for each factor
  
  Interaction: results are not predicted by knowing main effects. These cannot be studied using two one-way ANOVAs.
Interaction example:
- Two Factors--Exercise (regular, program) and diet (regular, special)
- Response--Cholesterol level
What main effects might we predict?
- exercise program lowers cholesterol
- special diet lowers cholesterol
What interaction might result?
- Cholesterol level receives main effect for exercise, plus main effect for diet, plus interaction effect (an interaction implies the effect of one variable differs depending on the level of another variable)
- It could be that doing both the special diet and exercise program lowers the cholesterol level even more than would be expected given the two main effects.
If we have a two-way table of means, where are the main effects?
- marginals....marginal means
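A minimal sketch of reading main effects and an interaction off a two-way table of means, using the exercise and diet example. The cell means are invented for illustration and equal cell sizes are assumed, so each marginal mean is a simple average of cell means.

```python
from statistics import mean

# Hypothetical cell means for cholesterol (invented numbers, equal cell sizes assumed).
cell_means = {
    ("regular exercise", "regular diet"): 220.0,
    ("regular exercise", "special diet"): 210.0,
    ("exercise program", "regular diet"): 205.0,
    ("exercise program", "special diet"): 180.0,
}

def marginal_mean(factor_index, level):
    """Average the cell means across the levels of the other factor."""
    return mean(m for key, m in cell_means.items() if key[factor_index] == level)

# Main effects are differences between marginal means.
exercise_effect = marginal_mean(0, "exercise program") - marginal_mean(0, "regular exercise")
diet_effect = marginal_mean(1, "special diet") - marginal_mean(1, "regular diet")

# Interaction: the diet effect at one exercise level minus the diet effect at the other.
diet_effect_regular = (cell_means[("regular exercise", "special diet")]
                       - cell_means[("regular exercise", "regular diet")])
diet_effect_program = (cell_means[("exercise program", "special diet")]
                       - cell_means[("exercise program", "regular diet")])
interaction = diet_effect_program - diet_effect_regular

print("exercise main effect:", exercise_effect)
print("diet main effect:", diet_effect)
print("interaction (difference of diet effects):", interaction)
```

With these made-up numbers the special diet lowers cholesterol more under the exercise program than under regular exercise, which is the pattern described in the interaction example.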
How are the number of levels in each factor used to describe the model?
- Factor A: 3 levels, Factor B: 4 levels --> 3x4 ANOVA that includes 12 cells
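As a quick illustration, the cells of a 3x4 design can be enumerated by crossing the levels of the two factors. The level labels below are placeholders, not from the text.

```python
from itertools import product

# Placeholder level labels, for illustration only.
factor_a = ["a1", "a2", "a3"]          # Factor A: 3 levels
factor_b = ["b1", "b2", "b3", "b4"]    # Factor B: 4 levels

# Each cell pairs one level of Factor A with one level of Factor B.
cells = list(product(factor_a, factor_b))
n_cells = len(factor_a) * len(factor_b)

print(f"{len(factor_a)}x{len(factor_b)} ANOVA with {n_cells} cells")
```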
What are the possible outcomes for a two way ANOVA?
- Display a table similar to the following, along with plots to represent each column:

Significance of Main Effects	Neither	Neither	1 factor	Both factors	Both factors
Significance of Interaction	None	Significant	None	Significant	None

What is the best course of action if an interaction is significant?
- Study the plot of means for each level of each factor.
- The main effects may or may not be informative.
What are the hypotheses that we use with a two-way ANOVA?
- Null:
  - There is no main effect due to Factor A
  - There is no main effect due to Factor B
  - There is no interaction
- Alternative:
  - The null hypotheses negated
What are the conditions for safe use of two-way ANOVA?

same as for one-way ANOVA, given additional factor
- The samples drawn from each of the Factor A x Factor B populations are independent.
- Each of the populations...
  - must be normally distributed (in addition to separate histograms or normal quantile plots for each, we can also look to see if residuals are normally distributed.)
  - have the same standard deviation
Must sample sizes be the same for all of the cells?
- No
- Balanced design (equal sample sizes) has some advantages, but it's not necessary.
How do we partition the sums of squares and degrees of freedom in a one-way ANOVA?
- SST = SSG + SSE
- DFT = DFG + DFE
How do we partition the sums of squares and degrees of freedom in a two-way ANOVA?
- SST = (SSA + SSB + SSAB) + SSE
- DFT = (DFA + DFB + DFAB) + DFE
- note that when the n's for each cell are not all the same, some methods will give sums of squares that do not add up
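The partition can be checked numerically with a small sketch. It assumes a balanced design (the same n in every cell), where the sums of squares add up exactly; the data values are invented for illustration.

```python
from statistics import mean

# Hypothetical balanced data: 2 levels of A x 2 levels of B, 3 observations per cell.
data = {
    ("a1", "b1"): [4.0, 5.0, 6.0],
    ("a1", "b2"): [7.0, 8.0, 9.0],
    ("a2", "b1"): [5.0, 6.0, 7.0],
    ("a2", "b2"): [12.0, 13.0, 14.0],
}
levels_a = sorted({a for a, _ in data})
levels_b = sorted({b for _, b in data})
I, J = len(levels_a), len(levels_b)
n = len(next(iter(data.values())))      # common cell size (balanced design)

all_obs = [y for ys in data.values() for y in ys]
N = len(all_obs)
grand = mean(all_obs)
cell = {key: mean(ys) for key, ys in data.items()}
mean_a = {a: mean(y for (ka, _), ys in data.items() if ka == a for y in ys) for a in levels_a}
mean_b = {b: mean(y for (_, kb), ys in data.items() if kb == b for y in ys) for b in levels_b}

# Sums of squares for the two main effects, the interaction, error, and total.
ssa = n * J * sum((mean_a[a] - grand) ** 2 for a in levels_a)
ssb = n * I * sum((mean_b[b] - grand) ** 2 for b in levels_b)
ssab = n * sum((cell[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
               for a in levels_a for b in levels_b)
sse = sum((y - cell[key]) ** 2 for key, ys in data.items() for y in ys)
sst = sum((y - grand) ** 2 for y in all_obs)

# Degrees of freedom follow the same partition.
dfa, dfb, dfab, dfe, dft = I - 1, J - 1, (I - 1) * (J - 1), N - I * J, N - 1

print("SSA, SSB, SSAB, SSE:", ssa, ssb, ssab, sse, " SST:", sst)
print("DFA, DFB, DFAB, DFE:", dfa, dfb, dfab, dfe, " DFT:", dft)
```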
ANOVA table
- Display the ANOVA table in IPS6e, p. 695
- Review each element
Example: Reconstructing Chess Boards (source: Is competence and position of board related to short term memory for reconstructing chess boards)
- Explanatory Variables:(A) Chess ability: novice, average, good; (B) Pattern on chess board: random, real
- Response Variable: Reconstruction score
What are the research questions?
- Chess ability: are better players able to better reconstruct boards?
- Chess board layout: are real chess boards easier to reconstruct than random boards?
- Ability x board layout interaction: does the effect of ability differ depending on the chess board layout?
How do we set up the dataset in SPSS?
- Display two-way matrix with response values for each cell.
- Display SPSS screenshots showing dataset, and category values for ability and layout.
How do we run a two-way ANOVA in SPSS?
- Select Analyze > General Linear Model > Univariate
- Display screenshot of Univariate dialog box to show allocation of variables
- Display dialog boxes for Model, Plots, Post Hoc and Options and discuss choices
In SPSS work through example with Chess.sav.
- hypotheses? (three pairs)
- Look at the data...for each cell
  - descriptive stats
  - histograms
  - side-by-side boxplots
- Describe the data.
- Check that conditions are met.
- Run the ANOVA in GLM...create a plot of means, save residuals. Interpret results.
- Create Q-Q plot of residuals, as a final check.

Comparing means (12.2)

If the one-way ANOVA F test is significant, what can we conclude when we reject the null hypothesis?
- That not all of the μ's are equal.
What else would we like to know?
- Which means are different from which other ones.
- Of course we can look at the side-by-side boxplots to gain some insight, but it won't tell us which differences are significant.
How could we have attended to this issue when we designed the study?
- Included designs for planned (a priori) comparisons.
- IPS6e uses the term "Contrast" to refer to these planned comparisons.

Planned comparisons (contrasts)

Are planned comparisons dependent on the results of the ANOVA F test?
- No, in fact planned comparisons can be run with or without a preceding F test, and whether or not an F test is significant.
Example: High School and Beyond study. Previously we've looked at how well the math score predicted the science score. Now let's consider how students in three different programs differ in their science scores: general, academic preparatory, vocational/technical. Before we look at the data, what comparison(s) might be interesting to design ahead of time.
1. Is mean science score for general different from vo/tech?
2. Is mean science score for academic prep greater than average of general and vo/tech?
Notice that each comparison is a contrast of two things....we will use the idea of linear combinations to create each contrast. What does the first contrast look like? (Let's write it in terms of the population hypotheses.)
- Ho: μ_G = μ_VT; Ha: μ_G ≠ μ_VT
- alternatively we can say, Ho: μ_G - μ_VT = 0; Ha: μ_G - μ_VT ≠ 0
Let's assign coefficients (denoted a_group) to each of the means. What coefficients are implicit in our Ho and Ha statements?
- Ho: (1)μ_G + (-1)μ_VT = 0; Ha: (1)μ_G + (-1)μ_VT ≠ 0
What linear combination can we create?
- c₁ = (1)x-bar_G + (-1)x-bar_VT + (0)x-bar_AP
How can we write the null and alternative hypotheses for the second contrast?
- Ho: μ_AP - 1/2[μ_VT + μ_G] = 0; Ha: μ_AP - 1/2[μ_VT + μ_G] > 0
What linear combination can we create?
- c₂ = (1)x-bar_AP + (-.5)x-bar_VT + (-.5)x-bar_G
How do we test a contrast? (Note that the linear combination boils down to a difference...between two groups.)
- Using a t test.
What is the general form of the t test?
- t = (estimate - null value)/SE(estimate)
- In this situation, t = (contrast - 0)/SE_c
What is SE_c?
- a measure of the variability due to sampling of c
- We won't concern ourselves with understanding SE_c, except to say it is based on MSE (from the ANOVA), the n in each group and the assigned coefficients.
What are the degrees of freedom for the t test?
- DFE (N - k)
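A rough sketch of the contrast t statistic, assuming the standard form SE_c = sqrt(MSE × Σ a_i²/n_i). The group means, sample sizes, and MSE below are invented for illustration and are not the High School and Beyond results; the function name is also an assumption.

```python
from math import sqrt

def contrast_t(coefs, means, ns, mse):
    """Return (c, SE_c, t) for the contrast c = sum(a_i * xbar_i)."""
    if abs(sum(coefs)) > 1e-12:
        raise ValueError("contrast coefficients should sum to 0")
    c = sum(a * m for a, m in zip(coefs, means))
    se_c = sqrt(mse * sum(a * a / n for a, n in zip(coefs, ns)))
    return c, se_c, c / se_c

# Hypothetical summary statistics, in the order (general, vo/tech, academic prep).
means = [50.0, 48.0, 56.0]
ns = [25, 25, 25]
mse = 100.0                      # would come from the ANOVA table
dfe = sum(ns) - len(ns)          # N - k

c1, se1, t1 = contrast_t([1, -1, 0], means, ns, mse)        # general vs vo/tech
c2, se2, t2 = contrast_t([-0.5, -0.5, 1], means, ns, mse)   # academic prep vs average of the others

print(f"c1 = {c1:.2f}, SE = {se1:.3f}, t = {t1:.3f}, df = {dfe}")
print(f"c2 = {c2:.2f}, SE = {se2:.3f}, t = {t2:.3f}, df = {dfe}")
```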
Specifying a contrast in SPSS....
- Display SPSS contrast dialog box showing coefficients for c₂
Quick look at the data...
- Display side-by-side boxplots for example
Interpret results of contrast...
- Display SPSS output c₁ and c₂
Interpret results of ANOVA...
- Display SPSS output for ANOVA
- Can we say anything about causation? (No)
- What would we have had to do to suggest causation? (randomly assign to groups)
A few more points about contrasts
- collection of coefficients (a's) should sum to 0
- more powerful* than multiple comparisons...will understand that better soon
- can be one or two sided
- can create a confidence interval for difference value (c+/-t*SE_c)
- not all software packages include functionality to do a contrast

Post-hoc analyses & multiple comparisons

What if you didn't have any idea about comparisons before looking at the data, but now that you have a significant F test, you'd like to better understand the differences in the means. What kind of analyses can you run?
- unplanned comparisons...also called post-hoc and a posteriori analyses
Often this process involves many pairwise analyses. What's wrong with running multiple t tests on these as we did with planned contrasts?
- the Type I error rate, experimentwise--across all of the analyses, will be larger than α
What is a Type I error? How often do we make Type I errors?
- rejecting the null hypothesis when in fact it's true.
- Draw normal distribution and shade an area in each tail which together represent the amount α.
What are the two kinds of Type I error rates that we need to be concerned with when making comparisons? Can both of these be set to alpha?
- per comparison Type I error rate
- experimentwise Type I error rate
- with 2 or more comparisons, there is no way to keep both per comparison and experimentwise Type I error rates equal to α
If the per comparison Type I error rate = α, why is it that the experimentwise Type I error rate becomes larger than α?
- It goes back to probability...what is the probability that at least one of the comparisons results in a Type I error?
If there are two comparisons, how many ways are there to have an error?
- c₁, c₂ or both, so probability of at least one is higher.
- Draw two way probability chart

	(c1) .95	(c1) .05
(c2) .95	.9025	.0475
(c2) .05	.0475	.0025

Display table of experimentwise probabilities
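The experimentwise probabilities can be sketched as follows, under the simplifying assumption that the comparisons are independent (as in the two-way probability chart). The function name is illustrative.

```python
def experimentwise_rate(alpha, m):
    """P(at least one Type I error) across m independent comparisons at level alpha."""
    return 1 - (1 - alpha) ** m

# How the rate grows with the number of comparisons when each is tested at .05.
for m in (1, 2, 3, 6, 10):
    print(m, round(experimentwise_rate(0.05, m), 4))
```

For two comparisons this gives .0975, the sum of the three error cells in the chart (.0475 + .0475 + .0025).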

There are lots of ways to control the experimentwise Type I error rate. We will discuss only one method...Bonferroni.
Let's say you had 3 groups, so there are 3 pairwise comparisons that we could make. What's one way we could control the experimentwise Type I error rate?
- use a much smaller alpha for the test of each comparison
- in fact we could use α divided by the number of comparisons
- in our example with 3 comparisons, each comparison is tested at α = .05/3 = .0167, which keeps the experimentwise rate α_E at or below .05
If we wanted to keep α=.05, how could we adjust the p-value to account for the multiple comparisons?
- multiply the p-value by the number of comparisons
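Both forms of the Bonferroni adjustment can be sketched in a few lines. The p-values are made up for illustration, and the cap at 1 is a common convention assumed here since a probability cannot exceed 1.

```python
def bonferroni_alpha(alpha, m):
    """Per-comparison significance level for m comparisons."""
    return alpha / m

def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical unadjusted p-values for the 3 pairwise comparisons.
raw_p = [0.004, 0.030, 0.400]
adjusted = bonferroni_adjust(raw_p)
alpha_each = bonferroni_alpha(0.05, len(raw_p))

# The two approaches lead to the same decisions.
for p, p_adj in zip(raw_p, adjusted):
    print(p, round(p_adj, 3), p < alpha_each, p_adj < 0.05)
```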
What is the statistic that we will calculate for each pairwise comparison?
- t test
What are the null and alternative hypotheses?
- Ho: μ₁ - μ₂ = 0; Ha: μ₁ - μ₂ ≠ 0
Can we use either a one-sided or a two-sided Ha?
- No, only two-sided. Because these comparisons are unplanned, it's not reasonable to choose a direction after seeing the results.
How do we run this in SPSS?
- Display statistics dialog box with Bonferroni selected
How do we interpret the output?
- Display SPSS output for program type and science score
- Interpret all of the comparisons, noting the duplication in the table
Would it be reasonable to plan for analysis of all pairwise comparisons so you don't have to run the more conservative Bonferroni comparisons?
- No. The more comparisons you test, the more likely you will be to falsely reject Ho, even if they are planned.
- This is a judgment call.
- See multiple comparisons section in onlinestatbook.

Inference for one-way ANOVA (12.1)

Review. What explanatory and response variables are used in a comparison of means in two groups (populations)?
- explanatory variable is categorical with two values, corresponding to two population groups
- response variable is quantitative, from which means for each group are calculated
- note that when we talk about groups, we are referring to populations
For what designs do we use this framework?
- independent samples
- matched pairs (paired samples, dependent samples), to some extent
Now let's look at the situation in which there are more than two groups in the explanatory variable. How is this similar to the two group situation?
- explanatory variable is categorical, but with more than two values, with each value corresponding to a population group
- response variable is quantitative, from which means for each group are calculated
- Display overview of design.
Note that we will only consider the case of independent samples. What do we call the extension of matched pairs to more than one group?
- repeated measures
What test statistic did we use to summarize the difference between means for two independent groups?
- t test
The structure of the t test only applies to two groups. What framework can we use to study more than two groups?
- analysis of variation...DATA = FIT + RESIDUAL
- note that this discussion does NOT use the notation in IPS6e
How can we partition this variation?
- DATA = all of the values of the response variable (x_i's) & how they vary in comparison to the overall mean.
- FIT = the k means, $\bar{x}_1, \bar{x}_2, ..., \bar{x}_k$ & how they vary in comparison to the overall mean.
- RESIDUAL = the variation around the group means...the difference between each observation and its group mean
What test can we use to see how FIT compares with RESIDUAL...to see if the variation in group means is on average larger than the variation due to RESIDUAL?
- ANOVA F-test
Introduce Example: academic frustration and college major
Why do we call this method a *one-way* ANOVA?
- because there is only one way to classify the observations into groups...in our example the students are classified into groups according to "major".
- What happens if we classify the students based on major and gender....to create 8 groups? We now have two ways to classify the observations.
- How many ways do we classify observations in a crosstabulation? Two ways.
Before we get into the discussion of ANOVA, we MUST examine the data.
- Display histograms and descriptive statistics of frustration score for each college major
What are the null and alternative hypotheses for the ANOVA F test?
- Ho: μ₁ = μ₂ = ... = μ_k
- Ha: not all of the μ's are equal
What is one way the μ's could be unequal?
- any one or more could be different from others
Even though we have an inkling of how the ANOVA works, let's start at the beginning. How can we visually understand the difference in means?
- plot of comparison of means (line graph); display plot of frustration scores
- side-by-side boxplots for each major
What do the boxplots help us visualize?
- within-group variation
Which of the following sets of boxplots provides more convincing evidence that the population means differ?
- Display example from IPS6e, p. 640
- Note that in (a), the within-group variation overlaps one with the next; it could be that these three boxplots represent sample variation from one common population.
How could boxplots be misleading?
- display median and quartiles, rather than mean and sd, but as we'll discuss in the assumptions/conditions we expect the data in each group to be Normal, so these two measures of center will be reasonably close.
Let's regroup. What is the question we are trying to answer?
- We want to know whether the differences among the sample means are due to true differences in the population means (Ha) or merely due to sampling variability (Ho).
What can we use to evaluate the differences among the sample means?
- FIT compared to RESIDUAL
What is FIT?
- variation due to the k sample means...variation among the sample means
- we called this variation due to the model in regression, we adjust this to variation due to groups for one-way ANOVA
What is RESIDUAL?
- variation due to the individual observations as compared to their group mean...variation within groups.
What do we use to summarize the comparison of FIT and RESIDUAL?
- $F = \frac {variation\ among\ sample\ means}{variation\ within\ groups}$
What do we know about the F statistic?
- a family of distributions
- has two degrees of freedom values for each distribution
- distributed as an F(DFG,DFE) distribution when the null hypothesis is true.
What are the conditions under which we can safely use the F statistic?
- The samples drawn from each of the k populations are independent; an SRS from each group ensures this.
- Each of the k populations...
  - must be normally distributed.
  - have the same standard deviation.
How do we assess that the response variable varies normally in each of the k populations?
- study histograms of the samples for evidence of skewness and outliers.
- Large sample sizes will mitigate need to have normal distributions as a result of the central limit theorem.
How do we assess that the k populations all have the same standard deviation?
- best we can do is evaluate if sample standard deviations are similar.
- A common rule of thumb is that F-test is approx. correct when the ratio between the largest sample standard deviation and the smallest is less than 2.
Evaluation of conditions for frustration example.
- the 4 samples were chosen randomly, so observations are independent
- samples size for each group is 35, so don't need to worry about normality--but we did see that they were approx normal.
- Display descriptives. Show that largest sd/smallest sd = 3.1/2.1 < 2
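The rule-of-thumb check is a one-line calculation. Only the largest and smallest standard deviations (3.1 and 2.1) come from the example; the function name is an illustrative assumption.

```python
def sd_ratio(sample_sds):
    """Ratio of the largest to the smallest sample standard deviation."""
    return max(sample_sds) / min(sample_sds)

# Largest and smallest sample SDs reported for the frustration example.
ratio = sd_ratio([3.1, 2.1])
print(f"ratio = {ratio:.2f}, rule of thumb satisfied: {ratio < 2}")
```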
What will SPSS produce when we run the ANOVA?
- ANOVA table
- Display table and review each element, note calculation of DF
- $SSG\ = \sum_{groups} n_g(\bar{y}_g - \bar{y})^2$
- $SSE\ = \sum_{groups} (n_g-1)s_g^2 = \sum_{obs} (y_{ig} - \bar{y}_g)^2$
- $SST\ = \sum_{obs} (y_{ig} - \bar{y})^2$
How does our framework DATA = FIT + RESIDUAL work with the sources of variation?
- Total = Between groups + within groups (error)
How are the sums of squares related?
- SST = SSG + SSE
How are the degrees of freedom related?
- DFT = DFG + DFE
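A small sketch of the one-way ANOVA table calculations, using the SSG, SSE, and SST formulas above. The three groups of scores are invented for illustration and are not the frustration data.

```python
from statistics import mean, variance

# Hypothetical scores for k = 3 groups (invented for illustration).
groups = {
    "group 1": [3.0, 4.0, 5.0],
    "group 2": [6.0, 7.0, 8.0],
    "group 3": [9.0, 10.0, 11.0],
}
all_obs = [y for ys in groups.values() for y in ys]
grand_mean = mean(all_obs)
k, N = len(groups), len(all_obs)

# Between-groups, within-groups (error), and total sums of squares.
ssg = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
sse = sum((len(ys) - 1) * variance(ys) for ys in groups.values())
sst = sum((y - grand_mean) ** 2 for y in all_obs)

# Degrees of freedom, mean squares, and the F statistic.
dfg, dfe, dft = k - 1, N - k, N - 1
msg, mse = ssg / dfg, sse / dfe
f_stat = msg / mse

print(f"SSG = {ssg}, SSE = {sse}, SST = {sst}")
print(f"DFG = {dfg}, DFE = {dfe}, DFT = {dft}")
print(f"F = MSG/MSE = {f_stat}")
```

The p-value would come from the F(DFG, DFE) distribution, which SPSS reports in the ANOVA table.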
What is the coefficient of determination?
- Just another name for R², which has the same interpretation
Display ANOVA table for frustration example
- Explain each part
What can we conclude based on the ANOVA F test?
- The F statistic is highly significant, so not all of the population means (μ's) are equal.
How do we determine what is going on?
- Display boxplot of frustration scores by major
- Note that business is clearly different, but others may be different, we can't know.
Why would it be wrong to do all possible pairwise t-tests?
- because each test has a 5% chance of giving a significant result when in fact the null is true, so across many tests the chance of at least one false positive is well above 5%.
- the next section of chapter 12 will present methods for comparing the means.
How to run a one-way ANOVA in SPSS.
- select Analyze > Compare Means > One-Way ANOVA.
- Display screenshot of dialog box and walk through how to allocate variables.
- Display Options dialog.

Multiple regression (11.1/11.2)

How is multiple regression different from simple linear regression?
- more than one explanatory variable
- many situations in which we can use knowledge of more than one explanatory variable to obtain a better understanding and better prediction of a particular response (e.g., low birth weight babies)
- exp var's generally quantitative, but can be categorical, e.g. dichotomous
- we will use i to denote data observations, from 1 to n, and j to denote number of explanatory variables, from 1 to p...DRAW data matrix
How do we integrate the additional explanatory variables to create a model for an individual response in the population? i.e., a statistical model for multiple linear regression
- DATA = FIT + RESIDUAL
- $y_i = \beta_0\ +\ \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_p x_{ip} + \epsilon_i$
- we use a set of explanatory variables to predict response y.
How do we predict the mean response, μ_y, given a set of explanatory variables?
- $\mu_y = \beta_0\ +\ \beta_1 x_1 + \beta_2 x_2 + ... + \beta_p x_p$
- notice that the error term dropped out. Where did it go? ...we don't need anything more than the linear equation to predict the mean
What assumptions must we make about the error term, ε_i?
- ε_i are independent
- ε_i are distributed N(0,σ)--note this is a common σ not dependent on value of x
- same as for simple linear regression (also relationship is linear)
In practice, we have to estimate the population parameters. What do we use to estimate the regression coefficients β?
- least squares estimation
- determine the set of estimates that minimizes the sum of squared differences between the observed and predicted scores, $\sum (y_i - \hat{y}_i)^2$ .
- no other set of regression coefficients will give a smaller SSE.
What is the regression prediction equation that results?
- $\hat{y}_i = b_0\ +\ b_1 x_{i1} + b_2 x_{i2} + ... + b_p x_{ip}$
What is the residual that results?
- $e_i = y_i - \hat{y}_i$
- it is the sum of the squares of these e_i's which is minimized in the least squares method.
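A minimal sketch of least squares estimation with two explanatory variables, solving the normal equations directly in plain Python. This illustrates the idea only and is not how SPSS computes the estimates; the data and the function name are invented.

```python
def least_squares(X, y):
    """Return [b0, b1, ..., bp] minimizing sum((y_i - y_hat_i)^2).

    X holds one row of explanatory values per observation (no intercept column).
    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]   # prepend intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("explanatory variables are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Hypothetical data: two explanatory variables, six observations.
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [9.1, 7.8, 19.2, 17.9, 29.1, 28.0]

b = least_squares(X, y)
y_hat = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]
sse = sum(e ** 2 for e in residuals)

print("b0, b1, b2:", [round(v, 3) for v in b])
print("SSE:", round(sse, 4))
```

Any other choice of b0, b1, b2 would give a larger SSE for these data.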
In multiple regression, what is b₀?
- estimate of β₀
- the response score we would expect when the values of the explanatory variables are all zero (i.e., $x_1 = x_2 = ... = x_p = 0$ ).
- it is still the y-intercept...in a multidimensional plot including all of the expl. var's.
What does b_j mean? But first, what does j represent?
- estimate of β_j
- the change in the predicted response for every one-unit increase in predictor x_j, given that the other variables remain constant.
Describe example...CSDATA
But before we go anywhere with this example, we MUST take a look at the data. What should we look at?
- Display descriptive statistics...
  - GPA min=.12 looks suspicious
  - SATV min=285 doesn't jibe with SAT reported scores rounded to 10s.
Extreme values of any variable should be noted and checked for accuracy.
- Display graph for any variable with suspicious values
- Note that distributions do not need to be normal...skewness is OK.
Review the relationships between pairs of variables using correlations and scatterplots
- Display correlation matrix...note correlations among explanatory variables
- Display scatterplot of HSM vs. GPA.
- It is useful to study scatterplots of all pairs of variables to be included in regression model...it may be that two explanatory variables are related, such that only one is needed in the model.
Start with a subset of variables (high school grades predict GPA)....display regression equation
- Those who scored zero in HSM, HSS, and HSE are expected to have a GPA of .590. (Not very meaningful)
- Keeping the HSS and HSE scores constant, a one-point increase in HSM corresponds to a 0.169 increase in GPA.
- Similarly keeping HSM and HSE constant, a one-point increase in HSS corresponds to a 0.034 increase in GPA.
- Similarly for HSE...
- Be careful because interpretation of individual contribution of variables is very complicated. There is a whole course on it.
How do we know if the regression coefficients are helping to predict the response variable at all?
- We use the F test, and the ANOVA table
What hypotheses are tested using the F test?
- Ho: β₁ = β₂ = ... = β_p = 0 (all are not predictive)
- Ha: at least one of the β_j is not 0
How do we calculate the F value in multiple regression?
- Same as in simple linear regression.
- F=MSM/MSE
What are the degrees of freedom for F(DFM, DFE)?
- DFM = p (number of explanatory variables)
- DFE = n-p-1 (compare to n-2 for one explan variable)
- Display CSDATA ANOVA, note that F is significant
- We reject Ho and conclude that at least one of the β values is not 0; it is not clear whether all three are useful.
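The degrees of freedom and F statistic can be sketched from the sums of squares. The numbers below are invented for illustration and are not the CSDATA results; the function name is an assumption.

```python
def regression_f(ssm, sse, n, p):
    """Return (DFM, DFE, MSM, MSE, F) for a regression with p explanatory variables."""
    dfm = p
    dfe = n - p - 1
    msm = ssm / dfm
    mse = sse / dfe
    return dfm, dfe, msm, mse, msm / mse

# Hypothetical sums of squares: n = 104 observations, p = 3 explanatory variables.
dfm, dfe, msm, mse, f_stat = regression_f(ssm=30.0, sse=100.0, n=104, p=3)
print(f"F({dfm}, {dfe}) = {f_stat:.2f}")
```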
A thorough answer to which variables are important requires more advanced study. What can we look at to begin to understand the relationship of the explanatory variables with the response variable?
- Table of coefficients in SPSS output.
- Ask what each of the b parameters means...
Based on the significant F test, we know that taken together HSM, HSS, and HSE are predictive of college GPA. How can we evaluate if all three are necessary?
- First, what do we mean by "necessary"?.....Necessary can be rephrased for each predictor “does this predictor provide additional information, given that other predictors are already in the model?”
- study the p-values of the t-tests in the coefficient table. (Why do we use a t test?)
- we notice that neither HSS nor HSE is significant.
- Display correlation matrix.
- but how could that be, both variables correlate with college GPA? The t test indicates whether the coefficient provides statistically significant explanation of the response variable, in addition to the explanation provided by the other variables in the model.
Let's drop one of the variables. Which do you choose?
- HSS, because it has the largest p-value.
- Display new regression results.
- Note F significant
- Note comparison of regression coefficients for HSM (was .169) and HSE (was .045)
- HSE is again not significant. We could drop it....
The answer given by the coefficients table is only partial
- Optimal method of determining the best model (i.e., fewest number of predictors with relatively the same predictive power) involves more complex algorithm
- May involve factors other than statistical contribution (e.g., cost of obtaining certain variables)
What else should we be evaluating as we refine our model?
- Display plot of residuals...note that y-hat is plotted on the x axis as a way to represent both explanatory variables
- errors are evenly scattered around the 0 line.
- we would also study residual plots for each of the explanatory variables
- note that maximum predicted value is just over 3, there were lots of 4.0's in the data...model less than perfect.
We know our model is significant (in statistical terms), but we also want to ensure that it is useful. What can we use to study how effective our model is?
- R²
- In simple linear regression we could use both r and R², in multiple regression we have many correlations (expl with response and among explan)....no longer helpful to indicate overall effectiveness.
- R² is SSM/SST, proportion of variance in the response variable accounted for by the predictor variable(s)
What is the square root of R²?
- correlation of y_i and y-hat
How does R² compare for regression including HSM, HSS, and HSE compared to HSM and HSE?
- Display comparison of SPSS model tables (including only HSM)
- If we add a variable that is correlated with the response, we can expect an increase in R². But is the increase useful or negligible?
How well do the SAT variables predict college GPA?
- Significant, but not very useful...R²=.063
- Also note that SATV is not significant
What does the model look like if we enter all of the variables?
- Display output for all variables entered.
- Only HSM is significant
- Note that this is for Computer Science students....

Subtopic: Causation

Successful prediction does NOT require cause and effect
- Display xkcd.com correlation cartoon
When are we in danger of erroneously concluding causation?
- Whenever we are doing an observational study.
- Only way to establish direct causal link between two variables is to conduct a carefully designed experiment in which effects of possible lurking variables are controlled (i.e., random assignment to treatments).
How can we establish causation when we cannot randomly assign subjects to conditions (e.g., studying effects of smoking)?
- Many, many, many studies, each undertaken under different conditions, and all/nearly all telling a similar story.
How can we model causation?
- Display causation models from chapt 2 (causation, common response and confounding)
- How would we model our prediction of college GPA for computer science students?
Newsweek article on discussion board
- "All too many put too much credence in observational studies, in which people who happen to behave one way (eating a lot of olive oil, drinking in moderation) have one health outcome, while people who choose to behave the opposite way have a different health outcome."
Display The Science News Cycle comic
- A recent study suggested that children who have older siblings who have autism are more likely to be diagnosed with autism. We wouldn't conclude causation....but yet
- a news story last year had the headline "High-stress jobs increase women's heart attack risk"....seems to be a causal conclusion from what can only be an observational study.
Class assignment: Pair up and find an example of a possible or tempting causation statement that does not have a logical basis or rely on adequate evidence.

Simple linear regression (10.2)

Before beginning, draw scatterplot on the board, for ongoing reference. Include a least squares line and a line for y-bar.

We've discussed how the response y, for each individual x value, can vary. We can allocate parts of this variation to different sources. What is the framework for understanding the sources of variation in regression?
- DATA = FIT + RESIDUAL
- Display image of y variation (normal curve) for different x values.
What other terms can we use to describe Fit?
- model, regression
What other terms can we use to describe residual?
- error--deviation from the line
- what "error" is included in this residual? (sampling only)
For a single x y pair...

--pick a single point on the scatterplot drawing and indicate each deviation as a vertical distance

--write each deviation formula under data = fit + residual
1. What is the total variation in y, represented by the DATA portion of the framework?
  - $(y_i - \bar{y})$ , y observation minus the mean of y
2. What is the variation due to differences in x (FIT), the part that knowing the regression line will determine?
  - $(\hat{y}_i - \bar{y})$ , fitted value of y minus the mean of y
3. What is the variation due to the particulars of the individual observation (RESIDUAL)?
  - $(y_i - \hat{y}_i)$ , y observation minus the fitted value of y
How do we summarize this partitioning of variation across all observations of y?
- calculate the sum of squares
- add $\sum$ and ^2 elements to each element
What do we call each of these sum of squares elements?
- SST = SSM + SSE (total = model + error)
Sometimes we use the terms explained variation and unexplained variation. How are these terms aligned with our model?
- explained variation = FIT
- unexplained variation = RESIDUAL (error)
Remember that r² is the fraction of the variation in the values of y explained by the least squares regression of y on x. How can we use some of these new ideas about sums of squares to define r² mathematically?
- $r^2 = \frac {SSM}{SST} = \frac {\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2}$
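The partition and the definition of r² can be checked with a short sketch. The x and y values are invented for illustration; the fit uses the usual least squares slope and intercept.

```python
from math import sqrt
from statistics import mean

# Hypothetical paired observations (invented for illustration).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares line and fitted values.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# DATA = FIT + RESIDUAL, summarized as sums of squares.
sst = sum((yi - y_bar) ** 2 for yi in y)
ssm = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

r_squared = ssm / sst
r = sxy / sqrt(sxx * sst)        # correlation between x and y

print(f"SST = {sst:.4f}, SSM + SSE = {ssm + sse:.4f}")
print(f"r^2 from SSM/SST = {r_squared:.4f}, squared correlation = {r ** 2:.4f}")
```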
How do you think analysis of variance (ANOVA) can be applied to our model?
- analyze these sources of variation, comparing fit with residual
Remember that $s_y^2 = \frac {\sum (y_i - \bar{y})^2}{n-1}$ ; the denominator can be thought of as the degrees of freedom. What are these degrees of freedom for?
- Degrees of freedom total (DFT), numerator is SST.
How can we relate the degrees of freedom for the model (DFM) and for the error (DFE) to the degrees of freedom for the total (DFT)?
- DFT = DFM + DFE
What does DFM equal? Why?
- 1, because one explanatory variable, x.
What does DFE equal?
- all the rest...DFT - DFM = n-1 - 1 = n-2
What is a mean square (MS)?
- $MS = \frac {sum\ of\ squares}{degrees\ of\ freedom}$
- average squared deviation
- s²_y is MST
What is MSM?
- mean square model
- $MSM = \frac {SSM}{DFM}$
What is MSE?
- mean square error
- $MSE = \frac {SSE}{DFE}$
Using the ANOVA F test, we can test for whether y is linearly related to x (Ho:β1 = 0 ). An F test is a ratio of variation due to the model over the variation due to error. How should we set up this ratio?
- F = MSM/MSE
- The result is a number that says that the variation explained by the model is "F" times bigger than the unexplained "error" variation.
When Ho is true, how is the F statistic distributed?
- an F distribution
- Like t, F is a family of distributions.
How do we specify the degrees of freedom?
- using the DF of the numerator and denominator: F(1, n-2)
- Display image of ANOVA F test from IPS6e
How does the F test compare with the t test for Ho:β₁ = 0?
- yield the same p-value: t²=F.
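- the F test corresponds to the two-sided alternative; the t test also allows a one-sided alternative (more powerful when the direction is specified in advance)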
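The claim t² = F can be checked numerically. The sketch below uses a small made-up data set. The slope standard error $SE_{b_1} = s/\sqrt{\sum (x_i - \bar{x})^2}$ is the standard textbook formula and is not written out in these notes, so treat it as an assumption here.

```python
from math import sqrt

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ssm = sum((fi - y_bar) ** 2 for fi in y_hat)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

msm = ssm / 1          # DFM = 1
mse = sse / (n - 2)    # DFE = n - 2
f_stat = msm / mse

se_b1 = sqrt(mse) / sqrt(sxx)   # s / sqrt(sum of squared x deviations)
t_stat = b1 / se_b1

print(f"F = {f_stat:.3f}, t = {t_stat:.3f}, t^2 = {t_stat ** 2:.3f}")
```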
What happens to the MSM, MSE, and F when Ha:β₁ ≠ 0 is true?
- MSM is large relative to MSE, resulting in large F statistic.
We organize the elements that contribute to the ANOVA in an ANOVA table.
- Display ANOVA table
- Review each element
- Display ANOVA table for science score regressed on math score
- Review each element

Simple linear regression (10.1)

Before beginning, draw scatterplot on the board, for ongoing reference

We have two quantitative (interval) variables. How can we visualize the data for these two variables?
- scatterplot
What can you say about this scatterplot?
- display hsb math-science scatterplot (national sample of high school seniors in 1980)
- moderate positive relationship, outliers?
What else would we like to know about this relationship?
- correlation, least squares regression line
- Display scatter plot with fitted line, R²
- Discuss meaning of R², amount of variation in y explained by least squares regression of y on x, and how correlation, r, relates.
What would we like to know about the line?
- equation...
  
  $\hat{y} = b_0 + b_1 x$
- math-science example
  
  $\hat{y} = 16.76 + .67x$
Our scatterplot represents a sample. A different sample --> different plot. What are we estimating with this sample?
- $\mu_y\ =\ \beta_0\ +\ \beta_1 x$
This is simple linear regression. What do simple and linear refer to?
- Simple: only one explanatory variable (x)
- Linear: the underlying relationship between x and y is linear
In the population regression equation, what does μ_y signify?
- For each value of x, there is a distribution of y scores and μ_y is the mean of that distribution.
- We can think of each value of x as representing a subpopulation...all of the individuals who scored a particular value on the math test.
- We assume the means, μ_y, lie on a straight line when plotted against x.
- Display statistical model for linear regression image...along with
What assumptions are made about the observed values of the response variable (y), for a given value of the explanatory variable (x)?
- observed y values are Normally distributed with standard deviation, σ.
- these Normal distributions all have the same standard deviation...equal variance of y
So, the observed responses y vary about their means. How do we model estimation of the population regression line from sample data?
- Data = ( Fit) + (Residual)
- $y_i\ =\ \beta_0\ +\ \beta_1 x_i\ +\ \epsilon_i$
- ε_i are independent and Normally distributed N(0,σ).
- a response y is the sum of its mean and a chance deviation, ε.
What are the unknown parameters of the regression model?
- $\beta_0,\ \beta_1,\ \sigma$
Do we have a method for estimating β₀ and β₁, the "Fit" part of the model?
- Least squares regression
- $\hat{y} = b_0 + b_1 x$
What is y-hat in the population regression model?
- μ_y
Using our data, we calculate our estimates, b₀ and b₁.
- $b_1\ =\ r\frac{s_y}{s_x}$
- $b_0\ =\ \bar{y} - b_1 \bar{x}$
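A short sketch of these two formulas in Python, assuming a small hypothetical data set (not the math-science data). Here r is computed as the sum of products of standardized values divided by n - 1.

```python
from statistics import mean, stdev

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)

# Correlation: sum of products of standardized values, divided by n - 1.
r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
        for xi, yi in zip(x, y)) / (n - 1)

b1 = r * s_y / s_x          # slope
b0 = y_bar - b1 * x_bar     # intercept

print(f"r = {r:.4f}, b1 = {b1:.3f}, b0 = {b0:.3f}")
```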
How do we estimate the residual, ε_i, in Data = Fit + Residual?
- The observed residual,
  
  $e_i\ =\ observed\ response\ -\ predicted\ response\ =\ y_i - \hat{y}_i$
- e_i sum to 0
The remaining unknown parameter in our model is σ, the standard deviation of y about the population regression line. We will estimate σ by s, the regression standard error. What do we use to calculate s?
- residuals, e_i

$s = \sqrt{\frac{\sum residual^2}{n-2}} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}}$
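As an illustration, the residuals and the regression standard error can be computed as follows. The data are hypothetical and the variable names are chosen for this sketch only.

```python
from math import sqrt

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Observed residuals: e_i = y_i - y_hat_i
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Regression standard error uses n - 2 degrees of freedom.
s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print("sum of residuals:", round(sum(residuals), 10))
print("s =", round(s, 4))
```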

What conditions are required to safely use regression to make inferences about the population?
- The observations are independent.
- The relationship between explanatory and response variables is linear.
- The standard deviation of y, σ, is the same for all values of x.
- The response y varies normally around its mean. (large n will compensate)
- Show graph of university GPA by HS GPA. What condition does this relationship fail to meet?
How can we decide if the conditions hold?
- Study residuals
- Display example residuals showing normal, nonlinear, σ varies with x
- Display graph of residuals for science score regressed on math score
- If residuals are scattered randomly around 0 with uniform variation, it indicates that the data fit a linear model, have normally distributed residuals for each value of x, and constant standard deviation σ.
- Display normal quantile plot
We have our regression equation. How can we be sure that the equation is better than using y-bar to estimate μ_y, for each given x value?
- Estimating β₀ and β₁ is a case of one-sample inference with unknown population standard deviation.
- We rely on the t distribution, with n – 2 degrees of freedom.
What is the standard form of the confidence interval?
- estimate +/- t*SE(estimate)
What is the standard form for calculating the t statistic?
- t = (estimate - hypothesis)/SE(estimate), distributed t(n-2)
Let's start with b₁. What are the hypotheses for testing significance of b₁?
- Ho:β₁ = 0
- Display picture of Ha
Conceptually, what are we looking for when we test Ho:β₁ = 0?
- evidence of a significant relationship between variables x and y in the population from which our data were drawn.
Remember the formula for b₁, what else are we testing when we test Ho:β₁ = 0?
- $b_1\ =\ r\frac{s_y}{s_x}$
- also tests the hypothesis of no correlation between x and y in the population
What about b₀? What does it mean, conceptually? Is a test of significance, Ho:β₀ = 0, meaningful?
- No.
Review SPSS output to find test values
- Display regression output for science score regressed on math score
- Identify slope/intercept values, t values, p values, and confidence interval
There are two more population parameters of interest, μ_y (mean of y value for a given x value) and y (an individual y value for a given x value). What do we use to estimate the value for each of these?
- $\hat{\mu}_y$ and $\hat{y}$ (which are equal)
- $\hat{\mu}_y = b_0 + b_1(given \ x \ value)$
- $\hat{y} = b_0 + b_1(given \ x \ value)$
How can we use our knowledge of a person's math score to predict his/her science score?
- display SPSS output showing b₀ and b₁ for science regressed on math.
- $\hat{\mu}_{sci} = \hat{sci} = 16.758 + .667(mat)$
What is the confidence interval for each (called a prediction interval for y-hat)?
- $\hat{\mu}_y \pm t^* SE_{\hat{\mu}}$ , where t* is critical value in t(n-2) distribution
- $\hat{y} \pm t^* SE_{\hat{y}}$ , where t* is critical value in t(n-2) distribution
How does estimation of a confidence interval for each of these differ? Why?
- the confidence interval for μ-hat_y is narrower than the prediction interval for y-hat
- we can be more confident when predicting a mean than when predicting an individual value.
- Display combined CI for μ-hat_y and y-hat for fludeaths regressed on flucases.
- The true value of the population mean μ_y at a given value of x, will be within our confidence interval in C% of all intervals calculated from many different random samples.
- The prediction interval contains C% of all the individual values taken by y at a particular value of x.
- The prediction interval represents mainly the error from the normal distribution of the residuals ε_i.
- In SPSS, you can create (save) the predicted values and CI values for each x in the dataset.
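A rough sketch of why the prediction interval is wider, on hypothetical data. The two standard error formulas are the usual textbook ones and are not given in these notes, so they are assumptions here: $SE_{\hat{\mu}} = s\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$ and $SE_{\hat{y}} = s\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$. The critical value 3.182 is the 95% value for t(3) from a t table.

```python
from math import sqrt

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
s = sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))

T_STAR = 3.182  # 95% critical value for t(n - 2) = t(3), taken from a t table


def intervals(x_star):
    """Return (estimate, CI half-width for the mean, PI half-width for an individual)."""
    estimate = b0 + b1 * x_star
    spread = 1 / n + (x_star - x_bar) ** 2 / sxx
    se_mean = s * sqrt(spread)        # uncertainty in the estimated mean
    se_pred = s * sqrt(1 + spread)    # adds the individual's own variation
    return estimate, T_STAR * se_mean, T_STAR * se_pred


est, ci_half, pi_half = intervals(3)
print(f"estimate {est:.2f}, CI +/- {ci_half:.2f}, PI +/- {pi_half:.2f}")
```

The extra 1 under the square root is the variation of an individual y around its mean, which does not shrink as n grows.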
Doctor's office graph of children's height and weight. What is this?
- 99% prediction intervals for the height (above) and weight (below) of US male children ages 3 to 18.

Analysis of two-way tables (9.1/9.2)

What kind of variables are assigned to explanatory and response in a comparison of two proportions?

both categorical (explanatory: two groups, response: yes/no)
But what if we have more than two groups? How can we compare three or more proportions? What if the response variable has more than two outcomes? (e.g., NJASK: partially proficient, proficient, advanced proficient) What exploratory data analysis method can we use to examine two categorical variables?

we can analyze the two-way table of counts (cross-tabulation, contingency table)
Let's start with two proportions: 4th grade and 8th grade proportion "passed": p-hat(4th) = .8 and p-hat(8th) = .667. How do we convert these to a two-way table?
- set up table with two columns -- 4th and 8th (explanatory)...add proportions under each heading
What are the categories of the response variable?
- label the "yes" row; add another row on the table for the "no" and fill in with proportions
Let's say we want to work with counts rather than proportions. How can we convert the proportions to counts?
- multiply by n (for the example n₁=100, n₂=120) to convert to a count.
- Convert the table rows to counts as follows

	4th	8th
Pass	80	80
Fail	20	40
	100	120

What are the marginal distributions on a two-way table?

add the row/col marginals
Could we add another group to this table, say 12th graders? Could we add another response category, say borderline pass?

sure, just another column and/or another row (but don't actually change the ex. table)
What comparison do we want to make?

go back to the two group proportions, the passing rate in 4th grade as compared to 8th grade
How can we understand this comparison in our two-way table?

conditional distributions
What conditional distributions are we interested in?
- when setting up the table, put the explanatory variable (if there is one) in the columns, then condition on the columns.
- P(pass|4th) = P(pass and 4th)/P(4th)
- P(pass|8th) = P(pass and 8th)/P(8th)
- add column percents to table on the board
In a crosstabulation, SPSS computes 3 kinds of cell-wise probabilities/percents. In the following picture, which is which?
- display screenprint of SPSS crosstabs
- identify how each percentage is created
What comparison are we interested in testing in this situation?
- difference in pass rate across groups (same as two proportions)
How do we generalize this to the two-way table situation where we may have more categories in explanatory/response variables?
- Ho: there is no association between the explanatory and response variables; the two variables are independent
- Ha: there is an association between the explanatory and response variables; the two variables are dependent
- The null hypothesis is saying that if there's nothing going on, we expect the distributions for each value of the explanatory (each population represented) to be the same
With what can we compare our observed counts to test whether they are different enough across columns?
- Expected counts tell us what the count would be if there's no association between explan and resp variables.
How do we calculate the expected count?
- $expected\ cell\ count = \frac{row\ total * column\ total}{n}$
What does the expected count mean?
- overall row proportion * column n = (nrow/totn) * ncol = nrow * ncol/totn
- display example crosstabs and work it out
What are the expected counts for the status*grade example?
- Have the students work these out and write in the table on the board
- Note only one cell needs to be calculated from formula, the rest can be obtained by subtraction
- Note that expected counts don't need to be integers (whole numbers)
- Display expected counts in crosstabs
- Show how expected counts reflect the situation where P(pass) = P(pass|4th) = P(pass|8th)..calculate proportion using expected counts
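A small sketch that works out the expected counts for the status*grade table from the row and column totals. Variable names are illustrative.

```python
rows = ["Pass", "Fail"]
cols = ["4th", "8th"]
observed = [[80, 80],
            [20, 40]]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

# expected cell count = row total * column total / n
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

for label, exp_row in zip(rows, expected):
    print(label, [round(e, 2) for e in exp_row])

# Under the expected counts, the pass rate is the same in every column.
overall_pass = row_totals[0] / n
expected_pass_by_grade = [expected[0][j] / col_totals[j] for j in range(len(cols))]
print(round(overall_pass, 4), [round(p, 4) for p in expected_pass_by_grade])
```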
What statistic do we use to test whether or not there is an association between the explanatory and response variables...whether they are independent or dependent?
- chi-square (Χ²)
What do we mean by independent?
- in our example, knowing what grade the student is in gives us no additional information about the passing rate beyond what we know about the overall passing rate
How do we compute the X² (chi-square) statistic?
$X^2 = \sum \frac{(observed\ count - expected\ count)^2}{expected\ count}$
We are interested to know if the observed is quite different from the expected. How does the chi-square tell us that?
- as the difference between observed and expected increases, the chi-square value increases
How do we decide if the value is big enough?
- would like a p-value
If there is no association between the row and column variables, how will the chi-square statistic be distributed?
- according to a $\chi^2$ distribution
- the $\chi^2$ distribution is a family of distributions (like the t-distribution), depending on the degrees of freedom
- display image of a few $\chi^2$ distributions
Under what conditions is it safe to use the chi-square test for a two-way table?
- The samples are simple random samples (SRS).
- All individual expected counts are 1 or more (≥1)
- No more than 20% of expected counts are less than 5 (< 5)
  
  For a 2x2 table, this implies that all four expected counts should be 5 or more.
How does the chi-square test for two-way tables work?
- find the $\chi^2$ distribution with the correct degrees of freedom
- df = (r-1)(c-1)
- look for the $P(\chi^2 \ge X^2)$
- this is our p-value
- display the chi-square test for two-way tables image
Can a chi-square test be one-sided or two-sided?
- No. Only interested in upper tail, there is no "less than" as any deviation from null makes the statistic bigger.
What is the chi-square statistic for our status vs. grade example?
- Display SPSS results
- p-value = .027..we'd reject when alpha = .05
- note that p-value can be obtained from Table F...adequate, but chi-square calculators are easy to use
- and conclude that pass/fail status is related to grade
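The SPSS p-value can be reproduced with a few lines of Python. This is a sketch for the 2x2 case only: the closed form used for the upper-tail probability, $P(\chi^2_1 \ge x) = \mathrm{erfc}(\sqrt{x/2})$, holds only when df = 1. For larger tables use Table F or a chi-square calculator.

```python
from math import erfc, sqrt

observed = [[80, 80],   # Pass: 4th, 8th
            [20, 40]]   # Fail: 4th, 8th

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# X^2 = sum over cells of (observed - expected)^2 / expected
chi_sq = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(observed, expected)
             for o, e in zip(obs_row, exp_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability; this closed form is valid only when df == 1.
p_value = erfc(sqrt(chi_sq / 2))

print(f"X^2 = {chi_sq:.3f}, df = {df}, p-value = {p_value:.3f}")
```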
Is independence vs. dependence all we can conclude?
- No. We need to say something about the nature of the relationship. Provide some percents.
- The data show that 67% of 8th graders pass as compared to 80% of 4th graders revealing a significant relationship between grade and pass/fail status.
What can we use to help us interpret what's going on with a larger, more complex two-way table:
- it can be helpful to look at the contribution of each cell to the chi square statistic.
- ask which cells are contributing the most "difference" in example on p. 541 in text (cells with counts 1 and 19)
Can we conclude that grade causes the difference in pass rate?
- No. All we can say right now is that grade is associated with pass rate. And since we can't randomly assign students to grades, we cannot do a more definitive experiment.
Would we have come to the same conclusion if we had calculated a z test comparing the pass proportions in the two groups?
- Yes.
- A chi-square statistic is equal to the square of the z statistic.
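To confirm this on the status*grade example, the sketch below computes the pooled two-proportion z statistic and the chi-square statistic from the same counts and compares z² with X².

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 80, 100   # 4th grade: number passing, sample size
x2, n2 = 80, 120   # 8th grade: number passing, sample size

# Pooled two-proportion z statistic
p1_hat, p2_hat = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)
se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se_pooled
p_value_z = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided

# Chi-square statistic from the same 2x2 table
observed = [[x1, x2], [n1 - x1, n2 - x2]]
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)
chi_sq = sum((observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
             / (row_totals[i] * col_totals[j] / n)
             for i in range(2) for j in range(2))

print(f"z = {z:.3f}, z^2 = {z ** 2:.3f}, X^2 = {chi_sq:.3f}, p = {p_value_z:.3f}")
```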
Review how to run crosstabulation and chi-square in SPSS.

Inference for proportions (8.1/8.2)

In what situations does it make sense to study the population proportion?

when we have a categorical response variable, such that we are counting membership (successes) in each category. Example: what proportion of students bring their lunch to school?

We draw a sample from the population. How is our data recorded?
- draw the population of subjects, and sample of X's that result from sampling.
- record the sample responses as 1 (success) or 0 (failure) for each individual in the sample. (Draw table with response for each student.) Add it up: $X = 1 + 0 + 0 + 0 + 1 + 1 + 0 +...+ 1 + 0$
What is the point estimator for population proportion, p?

the sample proportion of successes

$\hat{p} = \frac{X}{n}$

How is the sample proportion related to the sample mean x-bar?
- $\hat{p} = \frac{count\ of\ successes\ (1s)}{n} = \frac{\sum x_i}{n} = \bar{x}$
- so the sampling distribution for the sample proportion $\hat{p}$ is a special case of the sampling distribution of the mean, $\bar{x}$ .
If we sample from a large population (say 20 times larger than sample size), how will X (the number of successes) be distributed?
- B(n,p); according to a Binomial distribution with parameters n and p
- $P(X = x) = \frac{n!}{x!(n-x)!}\ p^x (1-p)^{n-x}$ , note formula more often seen using k successes, rather than x.
- $\hat{p}$ is related to X, the distribution of $\hat{p}$ is related to the binomial distribution.
But the binomial distribution is messy to work with, what can we use instead?
- normal approximation to the binomial when n is large
- when n is large both $X\ and\ \hat{p}$ are approximately Normal.
The binomial for X (the number of successes) has μ = np and σ = sqrt(np(1-p)). How does this translate to the mean and sd for p-hat?

divide by n....

$\mu_{\hat{p}} = p$ and

$\sigma_{\hat{p}} = \frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}}$

But, the standard deviation uses p and we don't know p. What do we substitute?

$\hat{p}$ , and change the name to standard error...

$SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

confidence interval for single proportion

What is the general form of a confidence interval?

estimate +/- margin of error

What do we need to create the margin of error?
- multiplier? (z*...1.645, 1.960, and 2.576 at the 90%, 95% and 99% confidence levels)
- se of sampling distribution for p-hat
- $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
What are the conditions required for safely using this confidence interval?
- Need to ensure that sample size is large enough to assume that sampling distribution of $\hat{p}$ is Normal.
- number of successes ( $n\hat{p}$ ) and number of failures ( $n(1-\hat{p})$ ) are both 15 or greater.
- population is at least 20 times as large as sample.
What kind of error is included in this calculation?
- sampling error only, errors in data collection (non-response, lack of accuracy) are not included and can be much more serious than sampling error.
If we want to be more confident that the interval contains the population parameter, what do we have to 'give' on?

precision (narrowness of the interval): raising the confidence level makes the interval wider

What else can we do to increase precision, for a fixed level of confidence?

increase sample size

How can we use the formula for margin of error to figure out how large n should be (for a given margin of error)?

solve for n, such that

$n = {\left ( \frac{z^*}{m} \right )}^2 \hat{p}(1-\hat{p})$

What practical problem arises when calculating a desired sample size, given a confidence level and desired margin of error?

formula uses p-hat, but that's what we want to estimate with the sample...

How can we overcome this problem?
- use a value from a pilot study or use a conservative value for p-hat...one that will make the largest standard error...this is always p-hat=.5...have the students confirm that this is true
What is the formula for the conservative estimate of n given m (margin of error)?
- $n = \frac{1}{4} \left ( \frac{z^*}{m} \right )^2$
- note that when z* is 1.96 (95% confidence), the result is approximately n = 1/m^2, which for a 3% margin of error is roughly 1,100 (the exact calculation with 1.96 gives 1068).
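A minimal sketch of the conservative sample-size calculation. The function name and its parameters are made up for this illustration. z* is obtained from the standard Normal distribution in Python's `statistics` module rather than from a table.

```python
from math import ceil
from statistics import NormalDist


def conservative_sample_size(margin, confidence=0.95):
    """Smallest n with margin of error <= margin, using p-hat = .5."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(0.25 * (z_star / margin) ** 2)   # always round up


n_95 = conservative_sample_size(0.03)
n_99 = conservative_sample_size(0.03, confidence=0.99)
print(n_95, n_99)
```

Raising the confidence level with the same margin of error increases z* and therefore the required n.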

significance test for single proportion

What is the null hypothesis for this test?
- Ho: p = p₀
- So now we have a hypothesized value for p, p₀, that we can use rather than $\hat{p}$ in the standard error.
What test statistic can we use to compare p-hat with p₀?

$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$

What are the possible Ha?

Ha: p < p₀, p > p₀, or p ≠ p₀ (show image of P(Z>=z) for each case)

What are the conditions needed to safely use this test?
- expected number of successes, $np_0$ , and the expected number of failures, $n(1-p_0)$ are both at least 10.
- population is at least 20 times as large as sample

comparing two proportions

How do we think about two populations?

Fill out the table below

Population	Pop prop	Sample size	Count of successes	Sample prop
1	$p_1$	$n_1$	$X_1$	$\hat{p}_1=X_1/n_1$
2	$p_2$	$n_2$	$X_2$	$\hat{p}_2=X_2/n_2$

$D = \hat{p}_1 - \hat{p}_2$

when both samples are large, distribution of D is approximately Normal

$\mu_D = \mu_{\hat{p}_1} - \mu_{\hat{p}_2} = p_1 - p_2$

$\sigma_D = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$

$SE_D = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

How do we use all of this to do a confidence interval for a comparison of two proportions?

$D \pm m$ , where

$m = z^* SE_D$
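As a worked illustration, the sketch below builds a 95% confidence interval for the difference in pass rates, borrowing the 4th and 8th grade counts (80 of 100 and 80 of 120) from the two-way table notes. Using 1.96 for z* assumes 95% confidence.

```python
from math import sqrt

Z_STAR = 1.96          # 95% confidence

x1, n1 = 80, 100       # 4th grade: successes, sample size
x2, n2 = 80, 120       # 8th grade: successes, sample size

p1_hat = x1 / n1
p2_hat = x2 / n2
d = p1_hat - p2_hat

# Unpooled standard error of D (no assumption that p1 = p2)
se_d = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
m = Z_STAR * se_d
ci = (d - m, d + m)

print(f"D = {d:.3f}, SE_D = {se_d:.4f}, m = {m:.3f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

An interval that excludes 0 is consistent with rejecting Ho: p₁ = p₂ at the matching α, although the test itself uses the pooled standard error.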

What are the conditions needed to safely use this confidence interval?
- number of successes, $n \hat{p}$ , and the number of failures, $n(1- \hat{p})$ , in both samples, are both at least 10, to assure that distribution of D is Normal
- population is at least 20 times as large as the samples
- samples are independent
What is the null hypothesis used to test the difference in proportions?

Ho:

$p_1 = p_2$

Looking at our SE(D), how can we revise it to reflect our null hypothesis that p₁ = p₂

Devise a pooled estimate of p, which we'll call

$\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$

So

$SE_{Dp} = \sqrt{\hat{p}(1-\hat{p}) ((1/n_1) + (1/n_2))}$

What are the explanatory and response variables in the test for the difference of two proportions?

both are categorical -- explanatory defines the two populations, response is a yes/no on a particular question

create a two way table as a prelude to

$X^2$

Examples^[1]

Insurance companies are interested in knowing the population percent of drivers who always buckle up before riding in a car.

a. When designing a study to determine this population proportion, what is the minimum number you would need to survey to be 95% confident that the population proportion is estimated to within 0.03?

$n=\frac{1}{4} \left (\frac{z^*}{m} \right)^2 = \frac{1}{4} \left (\frac{1.96}{.03} \right)^2 = 1067.11$

Ans:1068

b. If it was later determined that it was important to be more than 95% confident and a new survey was commissioned, how would that affect the minimum number you would need to survey? Why?

Need an even larger sample size. z* increases, but everything else stays the same.

Suppose that the insurance companies did do a survey. They randomly surveyed 400 drivers and found that 320 claimed to always buckle up. We are interested in the population proportion of drivers who claim to always buckle up.
- What is the sample proportion
  
  $\hat{p}=.8$
- Is it safe to construct a confidence interval?
  
  yes,
  1. number of successes and failures are both > 15
  2. population is more than 20 times the sample size
  3. the drivers were randomly sampled
- Construct a 95% confidence interval for the population proportion that claim to always buckle up.
  
  $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .8 \pm 1.96\sqrt{\frac{.8(.2)}{400}} = .8 \pm .04$
  
  (.76, .84)
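The same interval can be checked in Python. This sketch also verifies the successes/failures condition before computing the interval. The variable names are illustrative.

```python
from math import sqrt

Z_STAR = 1.96            # 95% confidence

x, n = 320, 400          # drivers who claim to always buckle up, sample size
p_hat = x / n

# Condition: at least 15 successes and 15 failures
successes, failures = x, n - x
condition_ok = successes >= 15 and failures >= 15

se = sqrt(p_hat * (1 - p_hat) / n)
margin = Z_STAR * se
lower, upper = p_hat - margin, p_hat + margin

print(f"p-hat = {p_hat}, SE = {se:.3f}, margin = {margin:.3f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```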

Two types of medication for hives are being tested to determine if there is a difference in the percentage of adult patient reactions. Twenty out of a random sample of 200 adults given medication A still had hives 30 minutes after taking the medication. Twelve out of another random sample of 200 adults given medication B still had hives 30 minutes after taking the medication. Test at a 1% level of significance.
- What test will we use?
  
  Significance test for comparing two proportions
- What is the random variable?
  
  $\hat{p}_A - \hat{p}_B$
  
  difference in the percentages of adult patients, taking medication A as compared to medication B, who still had hives after 30 minutes.
- What are the hypotheses to be tested?
  
  Ho: $p_A = p_B$ or Ho: $p_A - p_B = 0$
  
  Ha: $p_A \ne p_B$ or Ha: $p_A - p_B \ne 0$
- What are $\hat{p}_A$ and $\hat{p}_B$ ?
  
  $\hat{p}_A = \frac{20}{200} = 0.1$
  
  $\hat{p}_B = \frac{12}{200} = 0.06$
- Are the conditions met such that we can safely use the test?
  
  yes,
  1. samples are independent
  2. population is large
  3. np_A=20, np_B=12, the failures for both are large
- What is $\hat{p}$ , the pooled estimate of p?
  
  $\hat{p} = \frac{X_A + X_B}{n_A + n_B} = \frac{20 + 12}{200 + 200} = 0.08$
  
  $1 - \hat{p} = 0.92$
- What is the SE_Dp?
  
  $SE_{Dp} = \sqrt{0.08 \cdot 0.92 \cdot ((1/200) + (1/200))} = 0.0271$
- What is the z statistic?
  
  $z = \frac{\hat{p}_A - \hat{p}_B}{SE_{Dp}} = \frac{0.1 - 0.06}{0.0271} = 1.476$
- What is the p-value?
  
  display the normal calculator...p-value = .14
- What decision do we make and what is our conclusion?
  
  fail to reject Ho; there is not enough evidence to conclude that the difference between med A and med B is not 0.
  
  draw normal curve and shade an area of .07 in each tail.
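The hives example can be reproduced step by step. This sketch uses the standard Normal distribution from Python's `statistics` module in place of the normal calculator. Carrying full precision for the standard error gives z of about 1.474 rather than 1.476, and the p-value still rounds to .14.

```python
from math import sqrt
from statistics import NormalDist

ALPHA = 0.01

x_a, n_a = 20, 200    # medication A: still had hives, sample size
x_b, n_b = 12, 200    # medication B: still had hives, sample size

p_a_hat = x_a / n_a
p_b_hat = x_b / n_b

# Pooled estimate of p under Ho: p_A = p_B
p_pool = (x_a + x_b) / (n_a + n_b)
se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

z = (p_a_hat - p_b_hat) / se_pooled
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided alternative

decision = "reject Ho" if p_value < ALPHA else "fail to reject Ho"
print(f"pooled p-hat = {p_pool:.2f}, SE = {se_pooled:.4f}")
print(f"z = {z:.3f}, p-value = {p_value:.2f}, decision: {decision}")
```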

References

Dean, S., & Illowsky, B. (2009, February 18). Confidence Intervals: Homework and Comparing Two Independent Population Proportions. Retrieved from the Connexions web site on 5 Oct 2010.

Matched pairs (part of 7.1)

What are the two ways to create a matched pairs design?

observations are paired by subject--two measurements per subject, test-retest

observations are natural pairs--twins, spouses, siblings, matching on ability

What do we mean by a dependent groups design vs. an independent groups design?
- Note that it's not the research question that drives the decision as to which method, it's the study design.
- Pair up with the student next to you. Take a few minutes to come up with an example of a matched pairs design and a corresponding independent groups design. (Don't worry about whether it's actually doable.)
- Have each group share their design.
Why do we treat this design differently than an independent groups design?

the between subjects variation is controlled by using the differences within subjects. Each subject serves as their own control. Eliminates other confounding factors (ability, age, knowledge...) which occur btwn subjects.

Draw the two populations for independent groups leading to the sampling distribution of the difference in sample means,

$\bar{x}_1 - \bar{x}_2$ , compared to one population of differences (matched pairs) leading to the sampling distribution of the mean difference,

$\bar{x}_d$ .

How does a matched pairs sample become a special case of the one-sample t-test?

We can take the difference between the two measures for each individual; this difference is then compared with no difference.

We have one standard deviation, s, and one standard error,

$s / \sqrt{n}$ .

If we are in the one-sample situation, does the matched pairs design have an explanatory and response variable?
- The explanatory variable is the categorical variable that describes the two conditions/"populations".
- The response variable is the quantitative variable that is measured in each of the two conditions.
What do we test in the matched pairs t-test?
- Draw two populations (to represent the two conditions)
- Ho: μ₁ = μ₂ --> μ_d = 0
- Ha: μ₁ >, <, ≠ μ₂ --> μ_d >, <, ≠ 0
- Note: in learning about using a t-test with two groups we have specified a null value, but in fact this value doesn't have to be 0. It can be any expected value. 0 is the usual case.
What is μ_d?

the population mean of the differences between paired observations under condition 1 and condition 2...x(1) - y(1), x(2) - y(2). (display oli picture showing each pair of observations converted to differences)

What conditions must be met in order to use the matched pairs t-test?
- sample of differences is randomly obtained
- sample size is large or population of differences varies normally
For small samples, how do we confirm that population distribution is normal?

check a histogram and/or Normal quantile plot (convert each difference to a percentile, determine z-score for that percentile, plot the difference score against the z-score, should result in a straight line, p. 68 in text)

What test statistic is used for the matched pairs t-test?

$t = \frac{\bar{x}_d - 0}{\frac{s_d}{\sqrt{n}}}$
- Note: this is the one sample t-test, df = n-1
What is the confidence interval for μ₁ - μ₂, μ_d?

$\bar{x}_d \pm t^* \frac{s_d}{\sqrt{n}}$
There are various names for a matched pairs t-test.
- Paired samples t-test or just paired t-test
- Correlated t-test or correlated pairs design
- Dependent sample t-test
Does it matter how the difference is set up?

Example: We want to determine if a relaxation exercise lowers anxiety level. To test the effectiveness of the relaxation exercise, 10 individuals were recruited and their pre-exercise and post-exercise anxiety levels were measured. The differences in scores were analyzed using a matched pairs t-test.
- We think the post anxiety will be lower than pre. How shall we set up the difference? (d = pre - post)
- Hypotheses: Ho: μ_d = 0; Ha: μ_d > 0
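A sketch of the matched pairs calculation for this setup, with d = pre - post. The ten pre/post scores are invented for illustration (no data are given in the example), and 1.833 is the one-sided 5% critical value for t(9) from a t table.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical anxiety scores for 10 individuals (invented for illustration).
pre = [24, 30, 28, 22, 27, 31, 25, 29, 26, 28]
post = [21, 27, 29, 20, 24, 28, 25, 25, 24, 27]

# One difference per individual: d = pre - post
d = [a - b for a, b in zip(pre, post)]
n = len(d)

d_bar = mean(d)
s_d = stdev(d)                      # sample standard deviation of the differences
t_stat = (d_bar - 0) / (s_d / sqrt(n))
df = n - 1

T_CRIT = 1.833                      # one-sided, alpha = .05, t(9), from a t table
reject = t_stat > T_CRIT            # Ha: mu_d > 0

print(f"mean difference = {d_bar}, s_d = {s_d:.3f}")
print(f"t = {t_stat:.3f}, df = {df}, reject Ho: {reject}")
```

Setting up the difference as post - pre instead would flip the sign of t and the direction of Ha, but not the conclusion.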
Two ways to run a matched pairs t-test
- Review instructions on transform data method, one-sample t-test vs paired samples t-test

Additional topics: type I and type II errors and power

What are the two types of errors associated with hypothesis testing?
- Type I and Type II
- Display two-way table (reality vs. decision)
- Reject Ho when it is true (false positive) --> Type I
- Retain Ho when it is false (false negative) --> Type II
What is the probability of a Type I error?
- α = .05 or .01 (whatever we set it at)
How does a Type I error relate to the sampling distribution?
- Normal population, gives rise to sampling distribution of the mean
- Mean of distribution is pop mean, for a two-sided test, determine mean value corresponding to p=.025 in each tail
- We will reject whenever the sample mean falls in either tail, even when Ho is true. (False positive)
Why don't we minimize α to be very small (minimize false positives)?
- Makes it harder to reject Ho.
- This is the other error (Type II, false negative)-- failure to reject (retention) of Ho even when it is false.
What do we call the ability to reject Ho when it is false?
- Power
- Probability of Type II error = 1 - power
How do we get more power?
- everything else being equal, larger sample size
- important to know ahead of time what sample size needed to achieve certain level of power
- If we fail to reject Ho, we want to do so because Ho is true, not for lack of power.
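To show how power grows with sample size, here is a sketch for the simplest case: a one-sided z test of Ho: μ = μ₀ against Ha: μ > μ₀ with known σ. That setting is an assumption made for simplicity (these notes mostly use t procedures), and the function name and the effect size of 0.5 standard deviations are made up for the example.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_one_sided_z(effect_in_sd, n, alpha=0.05):
    """Power of a one-sided z test (Ha: mu > mu0) with known sigma.

    effect_in_sd is (mu_a - mu0) / sigma, the true shift in standard deviations.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    # Reject when z > z_alpha; under the alternative z is shifted by effect * sqrt(n).
    return 1 - STD_NORMAL.cdf(z_alpha - effect_in_sd * sqrt(n))


for n in (10, 20, 40, 80):
    power = power_one_sided_z(0.5, n)
    print(f"n = {n:3d}: power = {power:.3f}, P(Type II) = {1 - power:.3f}")
```

When the true effect is 0, the function returns α, the probability of a Type I error.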
What happens when we use the z distribution when a t distribution is the correct distribution to use?
- display graph including overlay of z and t distributions.
- the α level may be larger than specified.
- unknowingly have a larger Type I error

7.2 Comparing two means

For this new test we are going to compare two means. How do we think about this situation with respect to populations?
- Draw two populations; the mean of a particular variable for each distinct population is represented as μ₁ and μ₂.
- We want to test whether the two population means are different.

When we draw the two samples, one from each of the two populations, what must we be careful to do?

The two samples must be independent

If we are going to compare two means, we need two variables. How do we describe/classify these two variables?

Explanatory variable which is categorical (a grouping variable)
Response variable which is quantitative (provides scores/data which are summarized as a mean)

How many values does the explanatory variable have?

2

What is the null hypothesis for this test? What does it mean in words?

Ho: μ₁ - μ₂ = 0

OR

Ho: μ₁ = μ₂

What are the possible alternative hypotheses? What does each mean?

Ha: μ₁ - μ₂ ≠ 0 ...(Ha: μ₁ ≠ μ₂)

Ha: μ₁ - μ₂ < 0 ...(Ha: μ₁ < μ₂)

Ha: μ₁ - μ₂ > 0 ...(Ha: μ₁ > μ₂)

(discuss which mean is greater for the one-sided alternatives)

What is the population parameter for which we are doing hypothesis testing?

the difference between the means, μ₁ - μ₂

this means that we have a sampling distribution of differences...if both population distributions are normal, then sampling distribution of differences is also normal

draw sampling distribution of differences

What is the null value?

0

What are the conditions (also called assumptions) necessary for use of the independent samples t statistic (t test)?
- each sample is SRS from population
- the two samples are independent...each value is sampled independently from each other value.
- the distribution of the response variable in both populations is normal
- some procedures require an "equal variances" assumption. We will use a more general procedure that doesn't make this assumption.

What is the general structure of the t test?

(put up one-sample t-test formula, if needed:

$\frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ )

$\frac{sample \ estimate - null \ value}{standard \ error}$

What is the formula for the two-sample t statistic?

$t = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{{s_1^2 \over n_1} + {s_2^2 \over n_2}}}$

review why this statistic makes sense....element by element,

$\bar{y}_1$ and $\bar{y}_2$ estimate μ₁ and μ₂, so $\bar{y}_1 - \bar{y}_2$ estimates μ₁ - μ₂
the null value does not appear in the numerator because it is 0.
the denominator is the standard error of $\bar{y}_1 - \bar{y}_2$
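
The pieces of the statistic can be traced in a short computation. This is a minimal sketch using made-up scores for two independent groups; the function name `two_sample_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(sample1, sample2):
    """Two-sample t statistic (no equal-variances assumption)."""
    n1, n2 = len(sample1), len(sample2)
    estimate = mean(sample1) - mean(sample2)   # estimates mu1 - mu2
    null_value = 0                             # Ho: mu1 - mu2 = 0
    # Standard error of the difference between the two sample means
    std_error = sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    return (estimate - null_value) / std_error

# Made-up scores, for illustration only
group1 = [12, 15, 11, 14, 13]
group2 = [10, 9, 12, 8, 11]
t = two_sample_t(group1, group2)
print(f"t = {t:.3f}")
```

Here the sample means differ by 3 and the standard error is 1, so the difference is 3 standard errors away from the null value.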

What does the value of the t statistic tell us?

measures (in standard errors) the difference between what the data tell me about the parameter of interest μ₁ - μ₂ (sample estimate) and what the null hypothesis claims that it is (null value).

What distribution is used to calculate the p-value?

The null distribution of the two-sample t statistic is approximately a t distribution with the appropriate degrees of freedom. It's not exact, but good enough for our purposes. Let statistical software calculate the df.
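
For reference, the sketch below shows one approximation that software commonly uses for these degrees of freedom, the Welch-Satterthwaite formula. The notes do not state which formula is used, so treat this as an assumption; the function name and the summary values are hypothetical.

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom (an assumed formula;
    the notes leave the df calculation to software)."""
    v1 = s1 ** 2 / n1    # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2    # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Hypothetical summary statistics, for illustration only
df_equal = welch_df(s1=2.0, n1=10, s2=2.0, n2=10)
df_unequal = welch_df(s1=1.0, n1=5, s2=3.0, n2=20)
print(f"equal spreads and sizes: df = {df_equal:.2f}")
print(f"unequal spreads and sizes: df = {df_unequal:.2f}")
```

The result is usually not a whole number, and it always falls between the smaller of n₁ - 1 and n₂ - 1 and the total n₁ + n₂ - 2.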

How do we use the p-value in hypothesis testing?

the p-value indicates the amount of evidence against Ho: the smaller the p-value, the stronger the evidence. A p-value less than the alpha threshold leads us to reject Ho in favor of the specified alternative.

Why should Ha be set before doing the study and looking at the data?

it is easier to reject Ho with a one-sided alternative, but it is wrong to choose it after seeing that the data lean in that direction. Doing so contributes to error. Which error--any thoughts?

How often do we falsely accept Ha, when in fact Ho is true?

This is the alpha level...Type I error.

What if we obtained 20 different samples and did 20 t-tests using .05 alpha, when in fact Ho is true? For how many might we reject Ho, according to probability? (1)
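
The arithmetic behind this answer can be checked directly. The sketch below also reports the chance of at least one false rejection, which assumes the 20 tests are independent (an assumption added here for illustration).

```python
alpha = 0.05
n_tests = 20

# Expected number of false rejections when Ho is true in every test
expected_false_rejections = n_tests * alpha

# Chance of at least one false rejection, assuming the tests are independent
p_at_least_one = 1 - (1 - alpha) ** n_tests

print(f"expected false rejections: {expected_false_rejections:.1f}")
print(f"P(at least one false rejection): {p_at_least_one:.3f}")
```

So one false rejection is expected on average, and under independence there is roughly a 64% chance of seeing at least one.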

What does a 95% confidence interval for μ₁ - μ₂, built around $\bar{y}_1 - \bar{y}_2$, tell us?

we are 95% confident that the actual value of μ₁ - μ₂ occurs in this range.

when Ho is rejected, the confidence interval quantifies the supposed effect of the explanatory variable on the response variable.

How is the confidence interval calculated?

$\bar{y}_1 - \bar{y}_2 \pm t^* \sqrt{{s_1^2 \over n_1} + {s_2^2 \over n_2}}$
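
A minimal sketch of this calculation follows. The summary statistics are hypothetical, and t* is passed in as a value read from a table or from software. The value 2.365 used here is the 95% critical value for 7 degrees of freedom, which corresponds to the conservative choice of the smaller of n₁ - 1 and n₂ - 1; software would typically use an approximated, larger df.

```python
from math import sqrt

def two_sample_ci(ybar1, ybar2, s1, s2, n1, n2, t_star):
    """Confidence interval for mu1 - mu2 without assuming equal variances.
    t_star must be supplied for the chosen confidence level and df."""
    estimate = ybar1 - ybar2
    margin = t_star * sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return estimate - margin, estimate + margin

# Hypothetical summaries; t_star = 2.365 is the 95% value for df = 7
low, high = two_sample_ci(ybar1=13.0, ybar2=10.0, s1=2.0, s2=2.0,
                          n1=8, n2=8, t_star=2.365)
print(f"95% CI for mu1 - mu2: ({low:.3f}, {high:.3f})")
```

Because this interval does not contain 0, it is consistent with rejecting Ho: μ₁ - μ₂ = 0 with a two-sided test at the .05 level.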

Some further considerations

The method described here does NOT assume variances are equal. The pooled t test is used when the variances are assumed to be equal; it is described later in the chapter, but we won't be using that method.
The general method described here is robust to violations of Normality when
- sample sizes are large (n₁+n₂ > 40)
- sample size in each group is equal and shape of population distributions for each group are similar
the software routine will ask us to label one sample as "group 1" and the other as "group 2". How do we decide? It doesn't matter, as long as the direction of Ha for a one-sided test matches the labeling.
small samples may be useful when effect size is large. If borderline, study can't say much, not enough power.

7.1 Inference for the population mean

What is the name of the theorem that says that when n is large the sampling distribution of the sample mean is approximately N( $\mu, \ \sigma/\sqrt{n}$ ), regardless of the shape of the starting distribution? (central limit theorem)

Click on the link for Sampling Distribution applet. Create a crazy population distribution -- highly skewed with significant outliers. Set samples to N=5 and N=25, run simulation.
Discuss idea that when we look up the p-value for a z test we are assuming that the distribution of means is shaped like the z distribution.
Draw a normal distribution and shade a possible p-value. Compare this area to the area for the distribution created for N=5.
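
If the applet is not available, the same idea can be sketched in a few lines. This is an illustration only: it assumes an exponential population (strongly right-skewed, with μ = 1 and σ = 1) as a stand-in for the "crazy" population, and the seed and number of repetitions are arbitrary.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(2)  # arbitrary seed so the run is repeatable

def sample_means(n, reps=5000):
    """Means of `reps` samples of size n from a skewed (exponential) population."""
    return [mean([random.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

def skewness(values):
    """Simple skewness measure: average cubed standardized deviation."""
    m, s = mean(values), stdev(values)
    return mean([((v - m) / s) ** 3 for v in values])

results = {}
for n in (5, 25):
    means = sample_means(n)
    # (observed SD of the sample means, sigma / sqrt(n), skewness of the means)
    results[n] = (stdev(means), 1 / sqrt(n), skewness(means))
    print(f"N = {n:2d}: SD of means = {results[n][0]:.3f}, "
          f"sigma/sqrt(n) = {results[n][1]:.3f}, skewness = {results[n][2]:.2f}")
```

The spread of the sample means tracks σ/√n for both sample sizes, but the distribution for N=5 is still noticeably skewed, which is why a p-value read from the Normal curve can be off for small samples.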

What is the z test for the population mean?

$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

note there are two population values μ₀ and σ.
the distribution of this statistic is normal and is derived from the sampling distribution of $\bar{X}$
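
A minimal sketch of the z test, assuming σ is known and using hypothetical numbers, shows how the statistic and a two-sided p-value are obtained from the standard Normal distribution.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n):
    """z statistic and two-sided p-value for Ho: mu = mu0 (sigma known)."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Hypothetical values, for illustration only
z, p = z_test(xbar=103.0, mu0=100.0, sigma=15.0, n=25)
print(f"z = {z:.2f}, two-sided p-value = {p:.4f}")
```

Here the sample mean is one standard deviation of $\bar{x}$ above μ₀, so about 32% of the Normal curve lies at least that far from the center in either direction.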

What do we call the standard deviation of a statistic (e.g., a mean) when it is estimated from the data?

standard error
draw a normal distribution of $\bar{x}$ 's; the standard error is the standard deviation of the distribution of sample means

How is the standard error of a statistic different from the standard deviation of a statistic (e.g., $SE_{\bar{x}}$ vs. $SD_{\bar{x}}$)?

standard deviation of a statistic uses the population value...

$SD_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

standard error of a statistic uses the value calculated from the sample...

$SE_{\bar{x}} = \frac{s}{\sqrt{n}}$

When σ is unknown and we are forced to use $SE_{\bar{x}}$, can we go ahead and use the z test anyway, replacing σ with s?

NO!! When s replaces σ we now have a t statistic

What is the one sample t statistic?

$t= \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$

the t statistic has a t distribution with n-1 degrees of freedom
degrees of freedom can be a difficult concept and difficult to determine; basically it's the number of independent pieces of information that go into an estimate (in this case s, the estimate of σ used in the t statistic)
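
The calculation can be sketched as follows, using made-up scores and a hypothetical null value; the function name `one_sample_t` is an assumption for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """One-sample t statistic and its degrees of freedom (n - 1)."""
    n = len(sample)
    std_error = stdev(sample) / sqrt(n)   # s / sqrt(n), estimated from the sample
    t = (mean(sample) - mu0) / std_error
    return t, n - 1

# Made-up scores and null value, for illustration only
scores = [12, 15, 11, 14, 13]
t, df = one_sample_t(scores, mu0=12)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Note that `stdev` divides by n - 1, which is where the n - 1 degrees of freedom come from.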

How do we denote a particular t distribution?

t(k), where k = degrees of freedom

How is a z distribution similar to a t distribution?

symmetric, centered at 0, covers $-\infty$ to $\infty$
show figure comparing t(2), t(5), and z (note that t(30) ~ z)
show figure comparing a z score and t score -- review differences that result in larger spread

How do we use the t statistic in hypothesis testing?

the t statistic is the standardized score for $\bar{x}$ assuming Ho is true, μ = μ₀.
the t statistic follows the t distribution when Ho is true, so we can calculate the t statistic and then use the distribution to determine the p-value (the probability, assuming Ho is true, of obtaining a value of t at least as extreme as the one observed, in the direction specified by Ha)

What are the conditions (also called assumptions) necessary for use of the t test?

the sample is random
the population distribution is normal (this is hard to know for sure)

show table of sample size vs. normality of population distribution

How do we decide if the population is Normal?

Look at the data for evidence.

Given an SRS of size n, drawn from a population having unknown mean μ, and given Ho: μ = μ₀, how do we use the t statistic to test Ho?

explain that the population with the unknown mean is the one from which the sample is actually drawn, NOT the one that is the usual case, the one with μ₀. We are testing to see if the population from which the sample is drawn is different from the usual (null) population.
review p-value probability formulas and pictures of t-distributions with p-values shaded for each version of Ha. example graphics

What is the confidence interval for a population mean μ, based on the estimate $\bar{x}$, when σ is unknown?

$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$

What part is called the margin of error?

$t^* \frac{s}{\sqrt{n}}$

What is t* for a 95% confidence interval given 15 df?

Table D in textbook lists these values for a selection of t distributions
Review how to use Table D to obtain t*
Note that as df gets larger (n gets larger), the t values approach z.
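
As a cross-check on Table D, t* can also be approximated numerically. The sketch below is not how the table was produced; it is an illustration that integrates the t density with Simpson's rule and finds the critical value by bisection. All function names are hypothetical.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x, df):
    """Density of the t(df) distribution."""
    const = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return const * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps is even)."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_star(df, confidence=0.95):
    """Critical value with central area `confidence`, found by bisection."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

critical = {df: t_star(df) for df in (5, 15, 30, 1000)}
for df, value in critical.items():
    print(f"df = {df:4d}: t* = {value:.3f}")
```

For 15 df the result agrees with the Table D entry of 2.131, and the values shrink toward the z value of 1.960 as df grows.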

What violations to the conditions for use of the t test are of concern?

The t test is fairly robust: with small deviations from normality, the results will not be affected too much. Factors that strongly matter:
1. Random sampling: the data must be a random sample from the population
2. outliers and skewness: strongly influence the mean and therefore the t procedures. However, their impact diminishes as the sample size gets larger because of the Central Limit Theorem.

Sample size rules of thumb:
1. for n ≥ 40, the t-statistic will be valid, even with strong skewness (but you should still look at the data using exploratory data analysis tools)
2. for 15 ≤ n < 40, mild skewness is acceptable but not outliers
3. for n < 15, only use t test if sample distribution is close to Normal and without outliers

[1] Dean, S., & Illowsky, B. (2009, February 18). Confidence Intervals: Homework and Comparing Two Independent Population Proportions. Retrieved from the Connexions web site on 5 Oct 2010.
