GSE Stat Methods II  Review Notes
From WikiEducator
The following review is based on the indicated chapter and section of
 Moore, D. S., McCabe, G. P., & Craig, B. A. (2009). Introduction to the practice of statistics (6th ed.). New York: W. H. Freeman.
The questions and items for display are organized as a slide show. The sub-bullets for each point support discussion and indicate content to be written out on the board.
Two-Way Analysis of Variance (13.1/13.2)
 What are the explanatory variables in a two-way ANOVA?
 two categorical variables
 The categorical variables are called factors (Factor A and Factor B).
 What populations are we making inference about?
 The populations created by crossing Factor A and Factor B classifications
 What is the response variable in a two-way ANOVA?
 Response: one quantitative variable
 Examples of two-way ANOVAs
 students devise an example with a partner, each pair presents their example
 draw at least one of the examples as a two-way table to create cells
 demonstrate how the number of levels in each factor is used to calculate the number of cells.
 What are the advantages of a two-way ANOVA?
 (use one of the examples to show the following)
 studying two factors simultaneously is more efficient.
 (assign n's to marginals in example to show how one set of data can provide information for both factors)
 Including a second factor may reduce the residual variation.
 DATA = FIT + RESIDUAL
 DATA = sum of differences in each score from overall mean
 FIT = sum of differences in each cell mean from overall mean
 RESIDUAL = sum of differences in each score from cell mean.
 including the second factor may result in a better fit of the data in a cell with the cell mean, resulting in less residual variation (a smaller MSE).
 Interaction between the factors can be investigated.
 Introduce idea of main effect: effect on response variable of differing levels of one factor pooled across the levels of the other factor; comparable to oneway ANOVA.
 a main effect for each factor
 Interaction: results are not predicted by knowing main effects. These cannot be studied using two one-way ANOVAs.
 Interaction example:
 Two factors: exercise (regular, program) and diet (regular, special)
 Response: cholesterol level
 What main effects might we predict?
 exercise program lowers cholesterol
 special diet lowers cholesterol
 What interaction might result?
 Cholesterol level receives main effect for exercise, plus main effect for diet, plus interaction effect (an interaction implies the effect of one variable differs depending on the level of another variable)
 It could be that doing both the special diet and exercise program lowers the cholesterol level even more than would be expected given the two main effects.
 If we have a two-way table of means, where are the main effects?
 marginals....marginal means
 How are the number of levels in each factor used to describe the model?
 Factor A: 3 levels, Factor B: 4 levels -> a 3x4 ANOVA that includes 12 cells
 What are the possible outcomes for a two-way ANOVA?
 Display a table similar to the following, along with plots to represent each column:
Sign of Main Effects   Neither  Neither      1 factor  Both factors  Both factors
Sign of Interaction    None     Significant  None      Significant   None
 What is the best course of action if an interaction is significant?
 Study the plot of means for each level of each factor.
 The main effects may or may not be informative.
 What are the hypotheses that we use with a two-way ANOVA?
 Null:
 There is no main effect due to Factor A
 There is no main effect due to Factor B
 There is no interaction
 Alternative:
 The null hypotheses negated
 What are the conditions for safe use of two-way ANOVA?
 same as for one-way ANOVA, given the additional factor
 The samples drawn from each of the Factor A x Factor B populations are independent.
 Each of the populations...
 must be normally distributed (in addition to separate histograms or normal quantile plots for each, we can also look to see if residuals are normally distributed.)
 have the same standard deviation
 Must sample sizes be the same for all of the cells?
 No
 Balanced design (equal sample sizes) has some advantages, but it's not necessary.
 How do we partition the sums of squares and degrees of freedom in a one-way ANOVA?
 SST = SSG + SSE
 DFT = DFG + DFE
 How do we partition the sums of squares and degrees of freedom in a two-way ANOVA?
 SST = (SSA + SSB + SSAB) + SSE
 DFT = (DFA + DFB + DFAB) + DFE
 note that when the n's for each cell are not all the same, some methods will give sums of squares that do not add up
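The sums-of-squares partition above can be checked numerically. A minimal numpy sketch using an invented balanced 2x3 design (the factor levels, cell counts, and data are illustrative assumptions, not from IPS6e); in a balanced design the pieces add up exactly:

```python
import numpy as np

# Hypothetical balanced design: 2 levels of Factor A, 3 levels of Factor B,
# n = 4 observations per cell (invented data for illustration only).
rng = np.random.default_rng(0)
data = rng.normal(10, 2, size=(2, 3, 4))   # shape: (a, b, n)

a, b, n = data.shape
grand = data.mean()
cell  = data.mean(axis=2)                  # cell means (a x b)
amean = data.mean(axis=(1, 2))             # Factor A marginal means
bmean = data.mean(axis=(0, 2))             # Factor B marginal means

SST  = ((data - grand) ** 2).sum()
SSA  = b * n * ((amean - grand) ** 2).sum()
SSB  = a * n * ((bmean - grand) ** 2).sum()
SSAB = n * ((cell - amean[:, None] - bmean[None, :] + grand) ** 2).sum()
SSE  = ((data - cell[:, :, None]) ** 2).sum()

# SST = (SSA + SSB + SSAB) + SSE holds exactly for a balanced design
assert np.isclose(SST, SSA + SSB + SSAB + SSE)

# Degrees of freedom partition the same way: DFT = (DFA + DFB + DFAB) + DFE
DFA, DFB, DFAB, DFE = a - 1, b - 1, (a - 1) * (b - 1), a * b * (n - 1)
assert DFA + DFB + DFAB + DFE == a * b * n - 1
```

With unequal cell n's the simple marginal-means formulas above no longer add up, which is the point made in the note above.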
 ANOVA table
 Display the ANOVA table in IPS6e, p. 695
 Review each element
 Example: Reconstructing Chess Boards (source: Is competence and position of board related to short-term memory for reconstructing chess boards)
 Explanatory Variables:(A) Chess ability: novice, average, good; (B) Pattern on chess board: random, real
 Response Variable: Reconstruction score
 What are the research questions?
 Chess ability: are better players able to better reconstruct boards?
 Chess board layout: are real chess boards easier to reconstruct than random boards?
 Ability x board layout interaction: does the effect of ability differ depending on the chess board layout?
 How do we set up the dataset in SPSS?
 Display two-way matrix with response values for each cell.
 Display SPSS screenshots showing dataset, and category values for ability and layout.
 How do we run a two-way ANOVA in SPSS?
 Select Analyze > General Linear Model > Univariate
 Display screenshot of Univariate dialog box to show allocation of variables
 Display dialog boxes for Model, Plots, Post Hoc and Options and discuss choices
 In SPSS work through example with Chess.sav.
 hypotheses? (three pairs)
 Look at the data...for each cell
 descriptive stats
 histograms
 side-by-side boxplots
 Describe the data.
 Check that conditions are met.
 Run the ANOVA in GLM...create a plot of means, save residuals. Interpret results.
 Create QQ plot of residuals, as a final check.
Comparing means (12.2)
 If the one-way ANOVA F test is significant, what can we conclude when we reject the null hypothesis?
 That not all of the μ's are equal.
 What else would we like to know?
 Which means are different from which other ones.
 Of course we can look at the side-by-side boxplots to gain some insight, but they won't tell us which differences are significant.
 How could we have attended to this issue when we designed the study?
 Include planned (a priori) comparisons in the design.
 IPS6e uses the term "Contrast" to refer to these planned comparisons.
Planned comparisons (contrasts)
 Are planned comparisons dependent on the results of the ANOVA F test?
 No, in fact planned comparisons can be run with or without a preceding F test, and whether or not an F test is significant.
 Example: High School and Beyond study. Previously we've looked at how well the math score predicted the science score. Now let's consider how students in three different programs differ in their science scores: general, academic preparatory, vocational/technical. Before we look at the data, what comparison(s) might be interesting to design ahead of time?
 Is mean science score for general different from vo/tech?
 Is mean science score for academic prep greater than average of general and vo/tech?
 Notice that each comparison is a contrast of two things....we will use the idea of linear combinations to create each contrast. What does the first contrast look like? (Let's write it in terms of the population hypotheses.)
 Ho: μ_{G} = μ_{VT}; Ha: μ_{G} ≠ μ_{VT}
 alternatively we can say, Ho: μ_{G} - μ_{VT} = 0; Ha: μ_{G} - μ_{VT} ≠ 0
 Let's assign coefficients (denoted a_{group}) to each of the means. What coefficients are implicit in our Ho and Ha statements?
 Ho: (1)μ_{G} + (-1)μ_{VT} = 0; Ha: (1)μ_{G} + (-1)μ_{VT} ≠ 0
 What linear combination can we create?
 c_{1} = (1)xbar_{G} + (-1)xbar_{VT} + (0)xbar_{AP}
 How can we write the null and alternative hypotheses for the second contrast?
 Ho: μ_{AP} - 1/2[μ_{VT} + μ_{G}] = 0; Ha: μ_{AP} - 1/2[μ_{VT} + μ_{G}] > 0
 What linear combination can we create?
 c_{2} = (1)xbar_{AP} + (-.5)xbar_{VT} + (-.5)xbar_{G}
 How do we test a contrast? (Note that the linear combination boils down to a difference...between two groups.)
 Using a t test.
 What is the general form of the t test?
 t = (estimate - null value)/SE(estimate)
 In this situation, t = (contrast - 0)/SE_{c}
 What is SE_{c}?
 a measure of the variability due to sampling of c
 We won't concern ourselves with understanding SE_{c}, except to say it is based on MSE (from the ANOVA), the n in each group and the assigned coefficients.
 What are the degrees of freedom for the t test?
 DFE (N - k)
 Specifying a contrast in SPSS....
 Display SPSS contrast dialog box showing coefficients for c_{2}
 Quick look at the data...
 Display sidebyside boxplots for example
 Interpret results of contrast...
 Display SPSS output c_{1} and c_{2}
 Interpret results of ANOVA...
 Display SPSS output for ANOVA
 Can we say anything about causation? (No)
 What would we have had to do to suggest causation? (randomly assign to groups)
 A few more points about contrasts
 collection of coefficients (a's) should sum to 0
 more powerful than multiple comparisons...we will understand that better soon
 can be one or two sided
 can create a confidence interval for the contrast value (c ± t*SE_{c})
 not all software packages include functionality to do a contrast
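For software without contrast functionality, the contrast t test can be computed directly from group summaries. A Python sketch of contrast c_{2} using invented summary statistics for the three program groups (the real numbers come from the HSB data; these are placeholders):

```python
import numpy as np
from scipy import stats

# Invented summary statistics (mean science score) for the three groups:
#                general  acad prep  vo/tech
ns    = np.array([45,      95,       60])
means = np.array([52.4,    55.8,     50.1])
sds   = np.array([9.0,     8.5,      9.4])

k, N = len(ns), ns.sum()
MSE  = ((ns - 1) * sds**2).sum() / (N - k)   # pooled error variance (from the ANOVA)

# Contrast c2: acad prep vs. the average of general and vo/tech;
# coefficients must sum to 0.
coef = np.array([-0.5, 1.0, -0.5])
assert np.isclose(coef.sum(), 0)

c   = (coef * means).sum()                   # the contrast estimate
SEc = np.sqrt(MSE * (coef**2 / ns).sum())    # SE based on MSE, n's, coefficients
t   = c / SEc
df  = N - k                                  # degrees of freedom = DFE
p_one_sided = stats.t.sf(t, df)              # Ha: contrast > 0

# 95% confidence interval: c +/- t* x SE_c
tstar = stats.t.ppf(0.975, df)
ci = (c - tstar * SEc, c + tstar * SEc)
```

The SE formula (MSE scaled by the squared coefficients over the group sizes) is the standard one for a contrast of independent group means.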
Post-hoc analyses & multiple comparisons
 What if you didn't have any idea about comparisons before looking at the data, but now that you have a significant F test, you'd like to better understand the differences in the means. What kind of analyses can you run?
 unplanned comparisons...also called post-hoc and a posteriori analyses
 Often this process involves many pairwise analyses. What's wrong with running multiple t tests on these as we did with planned contrasts?
 the Type I error rate, experimentwise (across all of the analyses), will be larger than α
 What is a Type I error? How often do we make Type I errors?
 rejecting the null hypothesis when in fact it's true.
 Draw normal distribution and shade an area in each tail which together represent the amount α.
 What are the two kinds of Type I error rates that we need to be concerned with when making comparisons? Can both of these be set to alpha?
 per comparison Type I error rate
 experimentwise Type I error rate
 with 2 or more comparisons, there is no way to keep both per comparison and experimentwise Type I error rates equal to α
 If the per comparison Type I error rate = α, why is it that the experimentwise Type I error rate becomes larger than α?
 It goes back to probability...what is the probability that at least one of the comparisons results in a Type I error?
 If there are two comparisons, how many ways are there to have an error?
 c_{1}, c_{2} or both, so probability of at least one is higher.
 Draw two way probability chart
                     (c2) no error, .95   (c2) error, .05
 (c1) no error, .95        .9025               .0475
 (c1) error, .05           .0475               .0025
 Display table of experimentwise probabilities
 There are lots of ways to control the experimentwise Type I error rate. We will discuss only one method...Bonferroni.
 Let's say you had 3 groups, so there are 3 pairwise comparisons that we could make. What's one way we could control the experimentwise Type I error rate?
 use a much smaller alpha for the test of each comparison
 in fact we could use α divided by the number of comparisons
 in our example with 3 comparisons, the per-comparison α = .05/3 ≈ .0167
 If we wanted to keep α = .05, how could we adjust the p-value to account for the multiple comparisons?
 multiply the p-value by the number of comparisons
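The experimentwise error rate and the two equivalent forms of the Bonferroni adjustment can be illustrated in a few lines of Python (the raw p-values below are invented for illustration):

```python
import numpy as np

alpha, m = 0.05, 3                 # three pairwise comparisons among 3 groups

# If each comparison is tested at alpha, the chance of at least one Type I
# error across m independent comparisons is larger than alpha:
experimentwise = 1 - (1 - alpha) ** m
assert round(experimentwise, 4) == 0.1426

# Bonferroni, form 1: test each comparison at alpha / m (~ .0167 here)
per_comparison = alpha / m

# Bonferroni, form 2: multiply each p-value by m (capped at 1) and keep alpha
p_raw  = np.array([0.010, 0.300, 0.048])   # invented p-values
p_bonf = np.minimum(p_raw * m, 1.0)
```

Both forms lead to the same reject/retain decisions; only the scale on which the comparison is made differs.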
 What statistic will we calculate for each pairwise comparison?
 t test
 What are the null and alternative hypotheses?
 Ho: μ_{1} - μ_{2} = 0; Ha: μ_{1} - μ_{2} ≠ 0
 Can we use either a one- or two-sided Ha?
 No, because these are unplanned, it's not reasonable to set a direction based on the results
 How do we run this in SPSS?
 Display statistics dialog box with Bonferroni selected
 How do we interpret the output?
 Display SPSS output for program type and science score
 Interpret all of the comparisons, noting the duplication in the table
 Would it be reasonable to plan for analysis of all pairwise comparisons so you don't have to run the more conservative Bonferroni comparisons?
 No. The more comparisons you test, the more likely you will be to falsely reject Ho, even if they are planned.
 This is a judgment call.
 See multiple comparisons section in onlinestatbook.
Inference for one-way ANOVA (12.1)
 Review. What explanatory and response variables are used in a comparison of means in two groups (populations)?
 explanatory variable is categorical with two values, corresponding to two population groups
 response variable is quantitative, from which means for each group are calculated
 note that when we talk about groups, we are referring to populations
 For what designs do we use this framework?
 independent samples
 matched pairs (paired samples, dependent samples), to some extent
 Now let's look at the situation in which there are more than two groups in the explanatory variable. How is this similar to the two group situation?
 explanatory variable is categorical, but with more than two values, with each value corresponding to a population group
 response variable is quantitative, from which means for each group are calculated
 Display overview of design.
 Note that we will only consider the case of independent samples. What do we call the extension of matched pairs to more than two groups?
 repeated measures
 What test statistic did we use to summarize the difference between means for two independent groups?
 t test
 The structure of the t test only applies to two groups. What framework can we use to study more than two groups?
 analysis of variation...DATA = FIT + RESIDUAL
 note that this discussion does NOT use the notation in IPS6e
 How can we partition this variation?
 DATA = all of the values of the response variable (x_{i}'s) & how they vary in comparison to the overall mean.
 FIT = the k means, & how they vary in comparison to the overall mean.
 RESIDUAL = the variation around the group means...the difference between each observation and its group mean
 What test can we use to see how FIT compares with RESIDUAL...to see if the variation in group means is on average larger than the variation due to RESIDUAL?
 ANOVA Ftest
 Introduce Example: academic frustration and college major
 Why do we call this method a *one-way* ANOVA?
 because there is only one way to classify the observations into groups...in our example the students are classified into groups according to "major".
 What happens if we classify the students based on major and gender....to create 8 groups? We now have two ways to classify the observations.
 How many ways do we classify observations in a crosstabulation? Two ways.
 Before we get into the discussion of ANOVA, we MUST examine the data.
 Display histograms and descriptive statistics of frustration score for each college major
 What are the null and alternative hypotheses for the ANOVA F test?
 Ho: μ_{1} = μ_{2} = ... = μ_{k}
 Ha: not all of the μ's are equal
 What is one way the μ's could be unequal?
 any one or more could be different from others
 Even though we have an inkling of how the ANOVA works, let's start at the beginning. How can we visually understand the difference in means?
 plot of comparison of means (line graph); display plot of frustration scores
 side-by-side boxplots for each major
 What do the boxplots help us visualize?
 withingroup variation
 Which of the following sets of boxplots provides more convincing evidence that the population means differ?
 Display example from IPS6e, p. 640
 Note that in (a), the withingroup variation overlaps one with the next; it could be that these three boxplots represent sample variation from one common population.
 How could boxplots be misleading?
 they display the median and quartiles rather than the mean and sd; but as we'll discuss in the assumptions/conditions, we expect the data in each group to be Normal, so these two measures of center will be reasonably close.
 Let's regroup. What is the question we are trying to answer?
 We want to know whether the differences among the sample means are due to true differences in the population means (Ha) or merely due to sampling variability (Ho).
 What can we use to evaluate the differences among the sample means?
 FIT compared to RESIDUAL
 What is FIT?
 variation due to the k sample means...variation among the sample means
 we called this variation due to the model in regression; we adjust this to variation due to groups for one-way ANOVA
 What is RESIDUAL?
 variation due to the individual observations as compared to their group mean...variation within groups.
 What do we use to summarize the comparison of FIT and RESIDUAL?
 the F statistic: F = MSG/MSE, the variation among group means over the variation within groups
 What do we know about the F statistic?
 a family of distributions
 has two degrees of freedom values for each distribution
 distributed as an F(DFG,DFE) distribution when the null hypothesis is true.
 What are the conditions under which we can safely use the F statistic?
 The samples drawn from each of the k populations are independent; an SRS from each group ensures this.
 Each of the k populations...
 must be normally distributed.
 have the same standard deviation.
 How do we assess that the response variable varies normally in each of the k populations?
 study histograms of the samples for evidence of skewness and outliers.
 Large sample sizes mitigate the need for normal distributions, as a result of the central limit theorem.
 How do we assess that the k populations all have the same standard deviation?
 best we can do is evaluate if sample standard deviations are similar.
 A common rule of thumb is that the F test is approximately correct when the ratio of the largest sample standard deviation to the smallest is less than 2.
 Evaluation of conditions for frustration example.
 the 4 samples were chosen randomly, so observations are independent
 sample size for each group is 35, so we don't need to worry about normality, but we did see that they were approximately normal.
 Display descriptives. Show that largest sd/smallest sd = 3.1/2.1 ≈ 1.5 < 2
 What will SPSS produce when we run the ANOVA?
 ANOVA table
 Display table and review each element, note calculation of DF
 How does our framework DATA = FIT + RESIDUAL work with the sources of variation?
 Total = Between groups + within groups (error)
 How are the sums of squares related?
 SST = SSG + SSE
 How are the degrees of freedom related?
 DFT = DFG + DFE
 What is the coefficient of determination?
 Just another name for R^{2}, which has the same interpretation
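The one-way computations above (SST = SSG + SSE, F = MSG/MSE) can be reproduced by hand in Python; scipy's f_oneway gives the same answer. The scores below are invented stand-ins (the real frustration example has 4 majors with n = 35 each):

```python
import numpy as np
from scipy import stats

# Invented scores for k = 3 groups (illustration only)
groups = [
    np.array([12.0, 14, 11, 13, 15, 12]),
    np.array([16.0, 15, 17, 14, 18, 16]),
    np.array([11.0, 10, 12, 13, 11, 12]),
]
allx  = np.concatenate(groups)
N, k  = len(allx), len(groups)
grand = allx.mean()

SSG = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)   # FIT: between groups
SSE = sum(((g - g.mean()) ** 2).sum() for g in groups)        # RESIDUAL: within groups
SST = ((allx - grand) ** 2).sum()
assert np.isclose(SST, SSG + SSE)                             # SST = SSG + SSE

DFG, DFE = k - 1, N - k                                       # DFT = DFG + DFE
F = (SSG / DFG) / (SSE / DFE)                                 # F = MSG / MSE
p = stats.f.sf(F, DFG, DFE)                                   # F(DFG, DFE) under Ho

# scipy agrees with the hand computation
F2, p2 = stats.f_oneway(*groups)
assert np.isclose(F, F2) and np.isclose(p, p2)
```

The coefficient of determination from the same pieces is R^2 = SSG/SST.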
 Display ANOVA table for frustration example
 Explain each part
 What can we conclude based on the ANOVA F test?
 The F statistic is highly significant, so not all of the population means (μ's) are equal.
 How do we determine what is going on?
 Display boxplot of frustration scores by major
 Note that business is clearly different, but whether the others differ we can't know from the F test alone.
 Why would it be wrong to do all possible pairwise t-tests?
 because each test has a 5% chance of a significant result when the null is in fact true, and this error rate accumulates across the tests.
 the next section of chapter 12 will present methods for comparing the means.
 How to run a one-way ANOVA in SPSS.
 select Analyze > Compare Means > One-Way ANOVA.
 Display screenshot of dialog box and walk through how to allocate variables.
 Display Options dialog.
Multiple regression (11.1/11.2)
 How is multiple regression different from simple linear regression?
 more than one explanatory variable
 many situations in which we can use knowledge of more than one explanatory variable to obtain a better understanding and better prediction of a particular response (e.g., low birth weight babies)
 exp var's generally quantitative, but can be categorical, e.g. dichotomous
 we will use i to denote data observations, from 1 to n, and j to denote number of explanatory variables, from 1 to p...DRAW data matrix
 How do we integrate the additional explanatory variables to create a model for an individual response in the population? i.e., a statistical model for multiple linear regression
 DATA = FIT + RESIDUAL
 y_{i} = β_{0} + β_{1}x_{i1} + β_{2}x_{i2} + ... + β_{p}x_{ip} + ε_{i}
 we use a set of explanatory variables to predict response y.
 How do we predict the mean response, μ_{y}, given a set of explanatory variables?
 μ_{y} = β_{0} + β_{1}x_{1} + β_{2}x_{2} + ... + β_{p}x_{p}
 notice that the error term dropped out. Where did it go? ...we don't need anything more than the linear equation to predict the mean
 What assumptions must we make about the error term, ε_{i}?
 ε_{i} are independent
 ε_{i} are distributed N(0,σ) (note this is a common σ, not dependent on the value of x)
 same as for simple linear regression (also relationship is linear)
 In practice, we have to estimate the population parameters. What do we use to estimate the regression coefficients β?
 least squares estimation
 determine the set of estimates that minimizes the sum of squared differences between the observed and predicted scores, Σ(y_{i} - yhat_{i})^{2}.
 no other set of regression coefficients will give a smaller SSE.
 What is the regression prediction equation that results?
 yhat = b_{0} + b_{1}x_{1} + b_{2}x_{2} + ... + b_{p}x_{p}
 What is the residual that results?
 e_{i} = y_{i} - yhat_{i}
 it is the sum of the squares of these e_{i}'s which is minimized in the least squares method.
 In multiple regression, what is b_{0}?
 estimate of β_{0}
 the response score we would expect when the values of the explanatory variables are all zero (i.e., x_{1} = x_{2} = ... = x_{p} = 0).
 it is still the y-intercept...in a multidimensional plot including all of the explanatory variables.
 What does b_{j} mean? But first, what does j represent?
 estimate of β_{j}
 the increase in the response variable for every unit increase in predictor x_{j} given other variables remain constant.
 Describe example...CSDATA
 But before we go anywhere with this example, we MUST take a look at the data. What should we look at?
 Display descriptive statistics...
 GPA min=.12 looks suspicious
 SATV min=285 doesn't jibe with SAT scores being reported rounded to 10s.
 Extreme values of any variable should be noted and checked for accuracy.
 Display graph for any variable with suspicious values
 Note that distributions do not need to be normal...skewness is OK.
 Review the relationships between pairs of variables using correlations and scatterplots
 Display correlation matrix...note correlations among explanatory variables
 Display scatterplot of HSM vs. GPA.
 It is useful to study scatterplots of all pairs of variables to be included in regression model...it may be that two explanatory variables are related, such that only one is needed in the model.
 Start with a subset of variables (high school grades predict GPA)....display regression equation
 Those who scored zero in HSM, HSS, and HSE are expected to have a GPA of .590. (Not very meaningful)
 Keeping the HSS and HSE scores constant, a onepoint increase in HSM corresponds to a 0.169 increase in GPA.
 Similarly keeping HSM and HSE constant, a onepoint increase in HSS corresponds to a 0.034 increase in GPA.
 Similarly for HSE...
 Be careful, because interpretation of the individual contributions of variables is very complicated. There is a whole course on it.
 How do we know if the regression coefficients are helping to predict the response variable at all?
 We use the F test, and the ANOVA table
 What hypotheses are tested using the F test?
 Ho: β_{1} = β_{2} = ... = β_{p} = 0 (none of the explanatory variables is predictive)
 Ha: at least one of the β_{j} is not 0
 How do we calculate the F value in multiple regression?
 Same as in simple linear regression.
 F=MSM/MSE
 What are the degrees of freedom for F(DFM, DFE)?
 DFM = p (number of explanatory variables)
 DFE = n - p - 1 (compare to n - 2 for one explanatory variable)
 Display CSDATA ANOVA, note that F is significant
 We reject Ho and conclude that at least one of the b values is not 0; it is not clear whether all three are useful.
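The multiple regression F test can be reproduced outside SPSS. A Python sketch using simulated stand-in data for CSDATA (the sample size, coefficients, and noise level are invented, not the published estimates):

```python
import numpy as np
from scipy import stats

# Simulated stand-in: predict GPA from p = 3 high-school grades (HSM, HSS, HSE)
rng = np.random.default_rng(1)
n, p = 80, 3
X = rng.uniform(4, 10, size=(n, p))
y = 0.2 + 0.17 * X[:, 0] + 0.03 * X[:, 1] + 0.05 * X[:, 2] + rng.normal(0, 0.4, n)

Xd = np.column_stack([np.ones(n), X])         # add intercept column
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)    # least squares estimates b0..bp
yhat = Xd @ b

SST = ((y - y.mean()) ** 2).sum()
SSM = ((yhat - y.mean()) ** 2).sum()
SSE = ((y - yhat) ** 2).sum()
assert np.isclose(SST, SSM + SSE)             # the partition holds

DFM, DFE = p, n - p - 1                       # df for model and error
F = (SSM / DFM) / (SSE / DFE)                 # F = MSM / MSE
pval = stats.f.sf(F, DFM, DFE)

R2 = SSM / SST
# sqrt(R^2) equals the correlation of y and yhat
assert np.isclose(np.sqrt(R2), np.corrcoef(y, yhat)[0, 1])
```

A significant F here says only that the predictors are jointly useful; the individual t tests in the coefficient table address whether each one adds explanation beyond the others, as discussed below.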
 A thorough answer to which variables are important requires more advanced study. What can we look at to begin to understand the relationship of the explanatory variables with the response variable?
 Table of coefficients in SPSS output.
 Ask what each of the b parameters means...
 Based on the significant F test, we know that taken together HSM, HSS, and HSE are predictive of college GPA. How can we evaluate if all three are necessary?
 First, what do we mean by "necessary"?.....Necessary can be rephrased for each predictor “does this predictor provide additional information, given that other predictors are already in the model?”
 study the pvalues of the ttests in the coefficient table. (Why do we use a t test?)
 we notice that neither HSS nor HSE is significant.
 Display correlation matrix.
 but how could that be, when both variables correlate with college GPA? The t test indicates whether the coefficient provides statistically significant explanation of the response variable, in addition to the explanation provided by the other variables in the model.
 Let's drop one of the variables. Which do you choose?
 HSS, because it has the largest p-value.
 Display new regression results.
 Note F significant
 Note comparison of regression coefficients for HSM (was .169) and HSE (was .045)
 HSE is again not significant. We could drop it....
 The answer given by the coefficients table is only partial
 Optimal method of determining the best model (i.e., fewest number of predictors with relatively the same predictive power) involves more complex algorithm
 May involve factors other than statistical contribution (e.g., cost of obtaining certain variables)
 What else should we be evaluating as we refine our model?
 Display plot of residuals...note that yhat is plotted on the x-axis as a way to represent both explanatory variables
 errors are evenly scattered around the 0 line.
 we would also study residual plots for each of the explanatory variables
 note that the maximum predicted value is just over 3, while there were lots of 4.0's in the data...the model is less than perfect.
 We know our model is significant (in statistical terms), but we also want to ensure that it is useful. What can we use to study how effective our model is?
 R^{2}
 In simple linear regression we could use both r and R^{2}; in multiple regression we have many correlations (explanatory with response, and among the explanatory variables), so r is no longer helpful for indicating overall effectiveness.
 R^{2} = SSM/SST, the proportion of variance in the response variable accounted for by the predictor variable(s)
 What is the square root of R^{2}?
 correlation of y_{i} and yhat
 How does R^{2} compare for regression including HSM, HSS, and HSE compared to HSM and HSE?
 Display comparison of SPSS model tables (including only HSM)
 If we add a variable that is correlated with the response, we can expect an increase in R^{2}. But is the increase useful or negligible?
 How well do the SAT variables predict college GPA?
 Significant, but not very useful...R^{2}=.063
 Also note that SATV is not significant
 What does the model look like if we enter all of the variables?
 Display output for all variables entered.
 Only HSM is significant
 Note that this is for Computer Science students....
Subtopic: Causation
 Successful prediction does NOT require cause and effect
 Display xkcd.com correlation cartoon
 When are we in danger of erroneously concluding causation?
 Whenever we are doing an observational study.
 Only way to establish direct causal link between two variables is to conduct a carefully designed experiment in which effects of possible lurking variables are controlled (i.e., random assignment to treatments).
 How can we establish causation when we cannot randomly assign subjects to conditions (e.g., studying effects of smoking)?
 Many, many studies, each undertaken under different conditions, and all (or nearly all) telling a similar story.
 How can we model causation?
 Display causation models from chapt 2 (causation, common response and confounding)
 How would we model our prediction of college GPA for computer science students?
 Newsweek article on discussion board
 "All too many put too much credence in observational studies, in which people who happen to behave one way (eating a lot of olive oil, drinking in moderation) have one health outcome, while people who choose to behave the opposite way have a different health outcome."
 Display The Science News Cycle comic
 A recent study suggested that children who have older siblings with autism are more likely to be diagnosed with autism. We wouldn't conclude causation....and yet
 a news story last year had the headline "High-stress jobs increase women's heart attack risk"....which seems to be a causal conclusion from what can only be an observational study.
 Class assignment: Pair up and find an example of a possible or tempting causation statement that does not have a logical basis or rely on adequate evidence.
Simple linear regression (10.2)
Before beginning, draw scatterplot on the board, for ongoing reference. Include a least squares line and a line for ybar.
 We've discussed how the response y, for each individual x value, can vary. We can allocate parts of this variation to different sources. What is the framework for understanding the sources of variation in regression?
 DATA = FIT + RESIDUAL
 Display image of y variation (normal curve) for different x values.
 What other terms can we use to describe Fit?
 model, regression
 What other terms can we use to describe residual?
 error, deviation from the line
 what "error" is included in this residual? (sampling only)
 For a single x y pair...
 pick a single point on the scatterplot drawing and indicate each deviation as a vertical distance
 write each deviation formula under DATA = FIT + RESIDUAL
 What is the total variation in y, represented by the DATA portion of the framework?
 y_{i} - ybar, the y observation minus the mean of y
 What is the variation due to differences in x (FIT), the part that knowing the regression line will determine?
 yhat_{i} - ybar, the fitted value of y minus the mean of y
 What is the variation due to the particulars of the individual observation (RESIDUAL)?
 y_{i} - yhat_{i}, the y observation minus the fitted value of y
 How do we summarize this partitioning of variation across all observations of y?
 calculate the sums of squares
 square each deviation and sum over all observations: Σ(y_{i} - ybar)^{2} = Σ(yhat_{i} - ybar)^{2} + Σ(y_{i} - yhat_{i})^{2}
 What do we call each of these sum of squares elements?
 SST = SSM + SSE (total = model + error)
 Sometimes we use the terms explained variation and unexplained variation. How are these terms aligned with our model?
 explained variation = FIT
 unexplained variation = RESIDUAL (error)
 Remember that r^{2} is the fraction of the variation in the values of y explained by the least squares regression of y on x. How can we use some of these new ideas about sums of squares to define r^{2} mathematically?
 r^{2} = SSM/SST
 How do you think analysis of variance (ANOVA) can be applied to our model?
 analyze these sources of variation, comparing fit with residual
 Remember how s^{2}_{y} = Σ(y_{i} - ybar)^{2}/(n - 1); the denominator can be thought of as the degrees of freedom. What are these the degrees of freedom for?
 Degrees of freedom total (DFT), numerator is SST.
 How can we relate the degrees of freedom for the model (DFM) and for the error (DFE) to the degrees of freedom for the total (DFT)?
 DFT = DFM + DFE
 What does DFM equal? Why?
 1, because one explanatory variable, x.
 What does DFE equal?
 all the rest...DFE = DFT − DFM = (n − 1) − 1 = n − 2
 What is a mean square (MS)?
 a sum of squares divided by its degrees of freedom (an average squared deviation)
 s^{2}_{y} is MST: MST = SST/DFT
 What is MSM?
 mean square model: MSM = SSM/DFM
 What is MSE?
 mean square error: MSE = SSE/DFE
 Using the ANOVA F test, we can test whether y is linearly related to x (Ho: β_{1} = 0). An F test is a ratio of the variation due to the model over the variation due to error. How should we set up this ratio?
 F = MSM/MSE
 The result is a number that says that the variation explained by the model is "F" times bigger than the unexplained "error" variation.
 When Ho is true, how is the F statistic distributed?
 an F distribution
 Like t, F is a family of distributions.
 How do we specify the degrees of freedom?
 using the DF of the numerator and denominator: F(1, n − 2)
 Display image of ANOVA F test from IPS6e
 How does the F test compare with the t test for Ho:β_{1} = 0?
 yield the same p-value: t^{2} = F.
 the t test also allows a one-sided alternative (more powerful when the direction is specified in advance)
 What happens to the MSM, MSE, and F when Ha: β_{1} ≠ 0 is true?
 MSM is large relative to MSE, resulting in large F statistic.
 We organize the elements that contribute to the ANOVA in an ANOVA table.
 Display ANOVA table
 Review each element
 Display ANOVA table for science score regressed on math score
 Review each element
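The partitioning above can be made concrete with a small numeric sketch. This is plain Python with made-up x and y data (the course itself uses SPSS); it fits the least-squares line by hand, verifies SST = SSM + SSE, and forms r^{2} and the ANOVA F statistic.

```python
# Sketch: partition the variation in y (DATA = FIT + RESIDUAL) for a
# least-squares line, then form the ANOVA F statistic. Data are made up.
x = [1, 2, 3, 4, 5, 6]
y = [2, 3, 5, 4, 6, 7]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n

# Least-squares slope and intercept
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

# Sums of squares: SST = SSM + SSE
sst = sum((yi - ybar) ** 2 for yi in y)
ssm = sum((yh - ybar) ** 2 for yh in yhat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))

# Degrees of freedom, mean squares, F ratio, r^2
dfm, dfe = 1, n - 2
msm, mse = ssm / dfm, sse / dfe
F = msm / mse
r2 = ssm / sst  # fraction of variation in y explained by the model
```

The check that sst equals ssm + sse (up to rounding) mirrors the ANOVA table students will read off the SPSS output.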
Simple linear regression (10.1)
Before beginning, draw scatterplot on the board, for ongoing reference
 We have two quantitative (interval) variables. How can we visualize the data for these two variables?
 scatterplot
 What can you say about this scatterplot?
 display HSB math-science scatterplot (national sample of high school seniors in 1980)
 moderate positive relationship, outliers?
 What else would we like to know about this relationship?
 correlation, least squares regression line
 Display scatter plot with fitted line, R^{2}
 Discuss meaning of R^{2}, amount of variation in y explained by least squares regression of y on x, and how correlation, r, relates.
 What would we like to know about the line?
 equation...
 mathscience example
 equation...
 Our scatterplot represents a sample. A different sample → a different plot. What are we estimating with this sample?
 the population regression line, μ_{y} = β_{0} + β_{1}x
 This is simple linear regression. What do simple and linear refer to?
 Simple: only one explanatory variable (x)
 Linear: the underlying relationship between x and y is linear
 In the population regression equation, what does μ_{y} signify?
 For each value of x, there is a distribution of y scores and μ_{y} is the mean of that distribution.
 We can think of each value of x as representing a subpopulation...all of the individuals who scored a particular value on the math test.
 We assume the means, μ_{y}, lie on a straight line when plotted against x.
 Display statistical model for linear regression image...along with μ_{y} = β_{0} + β_{1}x
 What assumptions are made about the observed values of the response variable (y), for a given value of the explanatory variable (x)?
 observed y values are Normally distributed with standard deviation, σ.
 these Normal distributions all have the same standard deviation...equal variance of y
 So, the observed responses y vary about their means. How do we model estimation of the population regression line from sample data?
 Data = (Fit) + (Residual): y_{i} = (β_{0} + β_{1}x_{i}) + ε_{i}
 ε_{i} are independent and Normally distributed N(0,σ).
 a response y is the sum of its mean and a chance deviation, ε.
 What are the unknown parameters of the regression model?
 β_{0}, β_{1}, and σ
 Do we have a method for estimating β_{0} and β_{1}, the "Fit" part of the model?
 Least squares regression
 What is yhat in the population regression model?
 μ_{y}
 Using our data, we calculate our estimates, b_{0} and b_{1}.
 How do we estimate the residual, ε_{i}, in Data = Fit + Residual?
 The observed residuals, e_{i} = y_{i} − yhat_{i}, estimate the ε_{i}.
 the e_{i} sum to 0
 The remaining unknown parameter in our model is σ, the variation of y about the population regression line. We will estimate σ by s, the regression standard error. What do we use to estimate s?
 the residuals, e_{i}: s = sqrt(Σe_{i}^{2}/(n − 2))
 What conditions are required to safely use regression to make inferences about the population?
 The observations are independent.
 The relationship between explanatory and response variables is linear.
 The standard deviation of y, σ, is the same for all values of x.
 The response y varies normally around its mean. (large n will compensate)
 Show graph of university GPA by HS GPA. What condition does this relationship fail to meet?
 How can we decide if the conditions hold?
 Study residuals
 Display example residuals showing normal, nonlinear, σ varies with x
 Display graph of residuals for science score regressed on math score
 If residuals are scattered randomly around 0 with uniform variation, it indicates that the data fit a linear model, have normally distributed residuals for each value of x, and constant standard deviation σ.
 Display normal quantile plot
 We have our regression equation. How can we be sure that the equation is better than using ybar to estimate μ_{y}, for each given x value?
 Estimating β_{0} and β_{1} is a case of onesample inference with unknown population standard deviation.
 We rely on the t distribution, with n – 2 degrees of freedom.
 What is the standard form of the confidence interval?
 estimate ± t*SE(estimate)
 What is the standard form for calculating the t statistic?
 t = (estimate − hypothesized value)/SE(estimate), distributed t(n − 2)
 Let's start with b_{1}. What are the hypotheses for testing significance of b_{1}?
 Ho:β_{1} = 0
 Display picture of Ha
 Conceptually, what are we looking for when we test Ho:β_{1} = 0?
 evidence of a significant relationship between variables x and y in the population from which our data were drawn.
 Remember the formula for b_{1}, what else are we testing when we test Ho:β_{1} = 0?
 also tests the hypothesis of no correlation between x and y in the population
 What about b_{0}? What does it mean, conceptually? Is a test of significance, Ho:β_{0} = 0, meaningful?
 No.
 Review SPSS output to find test values
 Display regression output for science score regressed on math score
 Identify slope/intercept values, t values, p values, and confidence interval
 There are two more population parameters of interest, μ_{y} (mean of y value for a given x value) and y (an individual y value for a given x value). What do we use to estimate the value for each of these?
 μhat_{y} and yhat, both calculated as b_{0} + b_{1}x* (so the two estimates are equal)
 How can we use our knowledge of a person's math score to predict his/her science score?
 display SPSS output showing b_{0} and b_{1} for science regressed on math.
 What is the confidence interval for each (called a prediction interval for yhat)?
 for μhat_{y}: μhat_{y} ± t*·s·sqrt(1/n + (x* − xbar)^{2}/Σ(x − xbar)^{2}), where t* is the critical value in the t(n − 2) distribution
 for yhat: yhat ± t*·s·sqrt(1 + 1/n + (x* − xbar)^{2}/Σ(x − xbar)^{2}), with the same t*
 How does estimation of a confidence interval for each of these differ? Why?
 the confidence interval for μhat_{y} is narrower than the prediction interval for yhat
 we can be more confident when predicting a mean than when predicting an individual value.
 Display combined CI for μhat_{y} and yhat for flu deaths regressed on flu cases.
 The true value of the population mean μ_{y} at a given value of x, will be within our confidence interval in C% of all intervals calculated from many different random samples.
 The prediction interval contains C% of all the individual values taken by y at a particular value of x.
 The prediction interval represents mainly the error from the normal distribution of the residuals ε_{i}.
 In SPSS, you can create (save) the predicted values and CI values for each x in the dataset.
 Doctor's office graph of children's height and weight. What is this?
 99% prediction intervals for the height (above) and weight (below) of US male children ages 3 to 18.
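The difference between estimating a mean response and predicting an individual response can be shown numerically. A minimal sketch in plain Python, using made-up data and an arbitrary x* (the t* multiplier would come from the t(n − 2) table):

```python
# Sketch: standard errors for the mean response (muhat_y) vs. an
# individual prediction (yhat) at x*, for a fitted least-squares line.
from math import sqrt

x = [1, 2, 3, 4, 5, 6]
y = [2, 3, 5, 4, 6, 7]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

# Regression standard error s, estimated from the residuals
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = sqrt(sse / (n - 2))

xstar = 4.0
se_mean = s * sqrt(1 / n + (xstar - xbar) ** 2 / sxx)      # for muhat_y
se_pred = s * sqrt(1 + 1 / n + (xstar - xbar) ** 2 / sxx)  # for yhat

# se_pred > se_mean always: the prediction interval is wider because an
# individual y also varies around its mean.
```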
Analysis of two-way tables (9.1/9.2)
 What kind of variables are assigned to explanatory and response in a comparison of two proportions?
 both categorical (explanatory: two groups, response: yes/no)
 But what if we have more than two groups? How can we compare three or more proportions? What if the response variable has more than two outcomes (e.g., NJASK: partially proficient, proficient, advanced proficient)? What exploratory data analysis method can we use to examine two categorical variables?
 we can analyze the twoway table of counts (crosstabulation, contingency table)
 Let's start with two proportions: 4th grade and 8th grade proportion "passed": phat(4th) = .8 and phat(8th) = .667. How do we convert these to a twoway table?
 set up table with two columns  4th and 8th (explanatory)...add proportions under each heading
 What are the categories of the response variable?
 label the "yes" row; add another row on the table for the "no" and fill in with proportions
 Let's say we want to work with counts rather than proportions. How can we convert the proportions to counts?
 multiply by n (for the example n_{1}=100, n_{2}=120) to convert to a count.
 Convert the table rows to counts as follows
        4th   8th
Pass    80    80
Fail    20    40
Total   100   120
 What are the marginal distributions on a twoway table?
 add the row/col marginals
 Could we add another group to this table, say 12th graders? Could we add another response category, say borderline pass?
 sure, just another column and/or another row (but don't actually change the ex. table)
 What comparison do we want to make?
 go back to the two group proportions, the passing rate in 4th grade as compared to 8th grade
 How can we understand this comparison in our twoway table?
 conditional distributions
 What conditional distributions are we interested in?
 when setting up the table, put the explanatory variable (if there is one) in the columns, then condition on the columns.
 P(pass | 4th) = P(pass and 4th)/P(4th)
 P(pass | 8th) = P(pass and 8th)/P(8th)
 add column percents to table on the board
 In a crosstabulation, SPSS computes 3 kinds of cellwise probabilities/percents. In the following picture, which is which?
 display screenprint of SPSS crosstabs
 identify how each percentage is created
 What comparison are we interested in testing in this situation?
 difference in pass rate across groups (same as two proportions)
 How do we generalize this to the twoway table situation where we may have more categories in explanatory/response variables?
 Ho: there is no association between the explanatory and response variables; the two variables are independent
 Ha: there is an association between the explanatory and response variables; the two variables are dependent
 The null hypothesis is saying that if there's nothing going on, we expect the distributions for each value of the explanatory (each population represented) to be the same
 With what can we compare our observed counts to test whether they are different enough across columns?
 Expected counts tell us what the count would be if there's no association between explan and resp variables.
 How do we calculate the expected count?
 What does the expected count mean?
 expected count = (row total × column total)/table total; equivalently, the overall row proportion applied to the column's n
 display example crosstabs and work it out
 What are the expected counts for the status*grade example?
 Have the students work these out and write in the table on the board
 Note only one cell needs to be calculated from formula, the rest can be obtained by subtraction
 Note that expected counts don't need to be integers (whole numbers)
 Display expected counts in crosstabs
 Show how expected counts reflect the situation where P(pass) = P(pass | 4th) = P(pass | 8th)...calculate the proportion using expected counts
 What statistic do we use to test whether or not there is an association between the explanatory and response variables...whether they are independent or dependent?
 chisquare (Χ^{2})
 What do we mean by independent?
 in our example, knowing what grade the student is in gives us no additional information about the passing rate beyond what we know about the overall passing rate
 How do we compute the X^{2} (chi-square) statistic?
 X^{2} = Σ (observed count − expected count)^{2}/expected count, summed over all cells
 We are interested to know if the observed is quite different from the expected. How does the chisquare tell us that?
 as the difference between observed and expected increases, the chisquare value increases
 How do we decide if the value is big enough?
 would like a pvalue
 If there is no association between the row and column variables, how will the chisquare statistic be distributed?
 according to a χ^{2} distribution
 the χ^{2} distribution is a family of distributions (like the tdistribution), depending on the degrees of freedom
 display image of a few χ^{2} distributions
 Under what conditions is it safe to use the chisquare test for a twoway table?
 The samples are simple random samples (SRS).
 All individual expected counts are 1 or more (≥1)
 No more than 20% of expected counts are less than 5 (< 5)
 For a 2x2 table, this implies that all four expected counts should be 5 or more.
 How does the chisquare test for twoway tables work?
 find the χ^{2} distribution with the correct degrees of freedom
 df = (r1)(c1)
 look for the area under the curve to the right of our X^{2} value
 this is our p-value
 display the chisquare test for twoway tables image
 Can a chisquare test be onesided or twosided?
 No. Only interested in upper tail, there is no "less than" as any deviation from null makes the statistic bigger.
 What is the chisquare statistic for our status vs. grade example?
 Display SPSS results
 p-value = .027...we'd reject when alpha = .05
 note that pvalue can be obtained from Table F...adequate, but chisquare calculators are easy to use
 and conclude that pass/fail status is related to grade
 Is independence vs. dependence all we can conclude?
 No. We need to say something about the nature of the relationship. Provide some percents.
 The data show that 67% of 8th graders pass as compared to 80% of 4th graders revealing a significant relationship between grade and pass/fail status.
 What can we use to help us interpret what's going on with a larger, more complex twoway table:
 it can be helpful to look at the contribution of each cell to the chi square statistic.
 ask which cells are contributing the most "difference" in example on p. 541 in text (cells with counts 1 and 19)
 Can we conclude that grade causes the difference in pass rate?
 No. All we can say right now is that grade explains it. And since we can't randomly assign students to grades, we cannot do a more definitive experiment.
 Would we have come to the same conclusion if we had calculated a z test comparing the pass proportions in the two groups?
 Yes.
 The chi-square statistic is equal to the square of the z statistic: X^{2} = z^{2}.
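The whole chi-square calculation for the pass/fail by grade example can be done by hand in a few lines. A plain-Python sketch (for a 2x2 table, df = 1, so the p-value can be read from the Normal distribution):

```python
# Sketch: chi-square test for the pass/fail by grade table.
# Observed counts from the example: 4th (80 pass, 20 fail), 8th (80 pass, 40 fail).
from math import sqrt, erf

observed = [[80, 80],   # pass: 4th, 8th
            [20, 40]]   # fail: 4th, 8th

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count = (row total * column total) / table total
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(2) for j in range(2))

# df = (2 - 1)(2 - 1) = 1, so P(X^2 > x) = 2 * (1 - Phi(sqrt(x)))
def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

p_value = 2 * (1 - phi(sqrt(chi2)))
# chi2 comes out near 4.89 and the p-value near .027,
# matching the SPSS result quoted above.
```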
 Review how to run crosstabulation and chisquare in SPSS.
Inference for proportions (8.1/8.2)
 In what situations does it make sense to study the population proportion?
 when we have a categorical response variable, such that we are counting membership (successes) in each category. Example: what proportion of students bring their lunch to school?
 We draw a sample from the population. How is our data recorded?
 draw the population of subjects, and sample of X's that result from sampling.
 record the sample responses as 1 (success) or 0 (failure) for each individual in the sample. (Draw table with response for each student.) Add it up: X = 1 + 0 + 0 + 0 + 1 + 1 + 0 + ... + 1 + 0
 What is the point estimator for population proportion, p?
 the sample proportion of successes, phat = X/n
 How is the sample proportion related to the sample mean xbar?
 each observation is a 0 or 1, so phat = xbar; the sampling distribution of phat is a special case of the sampling distribution of the mean.
 If we sample from a large population (say 20 times larger than sample size), how will X (the number of successes) be distributed?
 B(n,p); according to a Binomial distribution with parameters n and p
 P(X = k) = (n choose k)p^{k}(1 − p)^{n − k}; note the formula is usually written with k successes rather than x.
 since phat = X/n, the distribution of phat is related to the binomial distribution.
 But the binomial distribution is messy to work with, what can we use instead?
 normal approximation to the binomial when n is large
 when n is large both are approximately Normal.
 The binomial for X (the number of successes) has μ = np and σ = sqrt(np(1 − p)). How does this translate to the mean and sd for phat?
 divide by n: μ_{phat} = p and σ_{phat} = sqrt(p(1 − p)/n)
 But, the standard deviation uses p and we don't know p. What do we substitute?
 substitute phat: SE_{phat} = sqrt(phat(1 − phat)/n), and change the name to standard error
confidence interval for single proportion
 What is the general form of a confidence interval?
 estimate ± margin of error
 What do we need to create the margin of error?
 multiplier, z* (1.645, 1.960, and 2.576 at the 90%, 95%, and 99% confidence levels)
 se of sampling distribution for phat
 What are the conditions required for safely using this confidence interval?
 Need to ensure that sample size is large enough to assume that sampling distribution of is Normal.
 number of successes (n·phat) and number of failures (n(1 − phat)) are both 15 or greater.
 population is at least 20 times as large as sample.
 What kind of error is included in this calculation?
 sampling error only, errors in data collection (nonresponse, lack of accuracy) are not included and can be much more serious than sampling error.
 If we want to be more confident that the interval contains the population parameter, what do we have to 'give' on?
 precision: the interval must get wider (less narrow) as we increase our percent confidence
 What else can we do to increase precision, for a fixed level of confidence?
 increase sample size
 How can we use the formula for margin of error to figure out how large n should be (for a given margin of error)?
 solve for n: m = z*·sqrt(p*(1 − p*)/n), so n = (z*/m)^{2}·p*(1 − p*), where p* is a guessed value of phat
 What practical problem arises when calculating a desired sample size, given a confidence level and desired margin of error?
 formula uses phat, but that's what we want to estimate with the sample...
 How can we overcome this problem?
 use a value from a pilot study or use a conservative value for phat...one that will make the largest standard error...this is always phat=.5...have the students confirm that this is true
 What is the formula for the conservative estimate of n given m (margin of error)?
 n = 0.25(z*/m)^{2}, since p(1 − p) is largest (0.25) at p = .5
 note that when z* is 1.96 (95% confidence), the result is n ≈ 1/m^{2}, which for a 3% margin of error is about 1000.
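The conservative sample-size rule is a one-liner. A sketch in plain Python (it reproduces the 1068 answer in the insurance example later in these notes):

```python
# Sketch: conservative sample size for a desired margin of error m.
# Uses p = .5, which maximizes p(1 - p), so n = 0.25 * (z*/m)^2,
# rounded up to the next whole person.
from math import ceil

def conservative_n(zstar, m):
    return ceil(0.25 * (zstar / m) ** 2)

n = conservative_n(1.96, 0.03)   # 95% confidence, 3% margin of error
# n comes out to 1068
```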
significance test for single proportion
 What is the null hypothesis for this test?
 Ho: p = p_{0}
 So now we have a hypothesized value for p (p_{0}) that we can use in the standard error rather than phat.
 What test statistic can we use to compare phat with p_{0}?
 z = (phat − p_{0})/sqrt(p_{0}(1 − p_{0})/n)
 What are the possible Ha?
 Ha: p < > ≠ p_{0} (show image of P(Z>=z) for each case)
 What are the conditions needed to safely use this test?
 expected number of successes, np_{0}, and the expected number of failures, n(1 − p_{0}) are both at least 10.
 population is at least 20 times as large as sample
comparing two proportions
 How do we think about two populations?
 Fill out the table below
Population  Pop prop  Sample size  Count of successes  Sample prop
1  p_{1}  n_{1}  X_{1}  phat_{1} = X_{1}/n_{1}
2  p_{2}  n_{2}  X_{2}  phat_{2} = X_{2}/n_{2}
 the difference D = phat_{1} − phat_{2} estimates p_{1} − p_{2}; when both samples are large, the distribution of D is approximately Normal
 How do use all of this to do a confidence interval for a comparison of two proportions?
 D ± m, where m = z*·SE_{D} and SE_{D} = sqrt(phat_{1}(1 − phat_{1})/n_{1} + phat_{2}(1 − phat_{2})/n_{2})
 What are the conditions needed to safely use this confidence interval?
 the number of successes and the number of failures, in both samples, are all at least 10, to assure that the distribution of D is Normal
 population is at least 20 times as large as the samples
 samples are independent
 What is the null hypothesis used to test the difference in proportions?
 Ho: p_{1} = p_{2}
 Looking at our SE_{D}, how can we revise it to reflect our null hypothesis that p_{1} = p_{2}?
 Devise a pooled estimate of p, which we'll call phat: phat = (X_{1} + X_{2})/(n_{1} + n_{2})
 So SE_{Dp} = sqrt(phat(1 − phat)(1/n_{1} + 1/n_{2}))
 What are the explanatory and response variables in the test for the difference of two proportions?
 both are categorical  explanatory defines the two populations, response is a yes/no on a particular question
 create a two way table as a prelude to X^{2}
Examples^{[1]}
 Insurance companies are interested in knowing the population percent of drivers who always buckle up before riding in a car.
 a. When designing a study to determine this population proportion, what is the minimum number you would need to survey to be 95% confident that the population proportion is estimated to within 0.03?
 Ans:1068
 b. If it was later determined that it was important to be more than 95% confident and a new survey was commissioned, how would that affect the minimum number you would need to survey? Why?
 Need an even larger sample size. z* increases, but everything else stays the same.
 Suppose that the insurance companies did do a survey. They randomly surveyed 400 drivers and found that 320 claimed to always buckle up. We are interested in the population proportion of drivers who claim to always buckle up.
 What is the sample proportion? phat = 320/400 = .80
 Is it safe to construct a confidence interval?
 yes,
 number of successes and failures are both > 15
 population is more than 20 times the sample size
 samples are independent
 Construct a 95% confidence interval for the population proportion that claim to always buckle up.
 (.76, .84)
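The arithmetic behind this interval is worth tracing once by hand. A plain-Python sketch for the buckle-up data (320 successes out of 400):

```python
# Sketch: 95% confidence interval for a single proportion,
# using the buckle-up survey counts.
from math import sqrt

n, X = 400, 320
phat = X / n                      # 0.80
se = sqrt(phat * (1 - phat) / n)  # 0.02
m = 1.96 * se                     # margin of error at 95% confidence
ci = (phat - m, phat + m)
# ci is about (.76, .84), matching the answer above
```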
 Two types of medication for hives are being tested to determine if there is a difference in the percentage of adult patient reactions. Twenty out of a random sample of 200 adults given medication A still had hives 30 minutes after taking the medication. Twelve out of another random sample of 200 adults given medication B still had hives 30 minutes after taking the medication. Test at a 1% level of significance.
 What test will we use?
 Significance test for comparing two proportions
 What is the random variable?
 difference in the percentages of adult patients, taking medication A as compared to medication B, who still had hives after 30 minutes.
 What are the hypotheses to be tested?
 Ho: p_{A} = p_{B} or Ho: p_{A} − p_{B} = 0
 Ha: p_{A} ≠ p_{B} or Ha: p_{A} − p_{B} ≠ 0
 What are phat_{A} and phat_{B}?
 phat_{A} = 20/200 = .10, phat_{B} = 12/200 = .06
 Are the conditions met such that we can safely use the test?
 yes,
 samples are independent
 population is large
 successes np_{A} = 20 and np_{B} = 12; failures are 180 and 188; all are large enough
 What is phat, the pooled estimate of p?
 phat = (20 + 12)/(200 + 200) = 32/400 = .08
 What is the SE_{Dp}?
 SE_{Dp} = sqrt(.08 × .92 × (1/200 + 1/200)) ≈ .027
 What is the z statistic?
 z = (.10 − .06)/.027 ≈ 1.47
 What is the p-value?
 display the normal calculator...p-value = .14
 What decision do we make and what is our conclusion?
 fail to reject Ho; not enough evidence that the difference between med A and med B is different from 0.
 draw the normal curve and color in the two .07 tail areas at each end of the curve.
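The hives example can be checked end to end with a short plain-Python sketch of the pooled two-proportion z test:

```python
# Sketch: pooled two-proportion z test for the hives example.
# Medication A: 20 of 200 still had hives; medication B: 12 of 200.
from math import sqrt, erf

nA, XA = 200, 20
nB, XB = 200, 12
pA, pB = XA / nA, XB / nB

# Pooled estimate of p under Ho: pA = pB
pooled = (XA + XB) / (nA + nB)
se = sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB))
z = (pA - pB) / se

def phi(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

p_value = 2 * (1 - phi(z))   # two-sided alternative
# z is about 1.47 and the p-value about .14: fail to reject Ho at alpha = .01
```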
References
 ↑ Dean, S., & Illowsky, B. (2009, February 18). Confidence Intervals: Homework and Comparing Two Independent Population Proportions. Retrieved from the Connexions web site on 5 Oct 2010.
Matched pairs (part of 7.1)
 What are the two ways to create a matched pairs design?
 observations are paired by subject: two measurements per subject (e.g., test-retest)
 observations are natural pairs: twins, spouses, siblings, or subjects matched on ability
 What do we mean by a dependent groups design vs. an independent groups design?
 dependent groups: each observation in one group is linked to a particular observation in the other; independent groups: no such linkage
 Note that it's not the research question that drives the decision as to which method, it's the study design.
 Pair up with the student next to you. Take a few minutes to come up with an example of a matched pairs design and a corresponding independent groups design. (Don't worry about whether it's actually doable.)
 Have each group share their design.
 Why do we treat this design differently than an independent groups design?
 the between subjects variation is controlled by using the differences within subjects. Each subject serves as their own control. Eliminates other confounding factors (ability, age, knowledge...) which occur btwn subjects.
 Draw the two populations for independent groups leading to the sampling distribution of the difference in means, xbar_{1} − xbar_{2}, compared to one population of differences (matched pairs) leading to the sampling distribution of dbar.
 How does a matched pairs sample become a special case of the onesample ttest?
 We can take the difference between the two measures for each individual; this difference is then compared with no difference.
 We have one standard deviation, s_{d}, and one standard error, s_{d}/sqrt(n).
 If we are in the onesample situation, does the matched pairs design have an explanatory and response variable?
 The explanatory variable is the categorical variable that describes the two conditions/"populations".
 The response variable is the quantitative variable that is measured in each of the two conditions.
 What do we test in the matched pairs ttest?
 Draw two populations (to represent the two conditions)
 Ho: μ_{1} = μ_{2} → μ_{d} = 0
 Ha: μ_{1} >, <, ≠ μ_{2} → μ_{d} >, <, ≠ 0
 Note: in learning about using a ttest with two groups we have specified a null value, but in fact this value doesn't have to be 0. It can be any expected value. 0 is the usual case.
 What is μ_{d}?
 the mean of the differences between paired observations in sample 1 and sample 2...x_{1} − y_{1}, x_{2} − y_{2}, etc. (display OLI picture showing each pair of observations converted to differences)
 What conditions must be met in order to use the matched pairs ttest?
 sample of differences is randomly obtained
 sample size is large or population of differences varies normally
 For small samples, how do we confirm that population distribution is normal?
 check a histogram and/or Normal quantile plot (convert each difference to a percentile, determine zscore for that percentile, plot the difference score against the zscore, should result in a straight line, p. 68 in text)
 What test statistic is used for the matched pairs t-test?
 t = dbar/(s_{d}/sqrt(n))
 Note: this is the one-sample t-test, df = n − 1
 What is the confidence interval for μ_{1} − μ_{2}, i.e., μ_{d}?
 dbar ± t*·s_{d}/sqrt(n)
 There are various names for a matched pairs ttest.
 Paired samples ttest or just paired ttest
 Correlated ttest or correlated pairs design
 Dependent sample ttest
 Does it matter how the difference is set up?
 Example: We want to determine if a relaxation exercise lowers anxiety level. To test the effectiveness of the relaxation exercise, 10 individuals were recruited and their pre-exercise and post-exercise anxiety levels were measured. The differences in scores were analyzed using a matched pairs t-test.
 We think the post anxiety will be lower than pre. How shall we set up the difference? (d = pre  post)
 Hypotheses: Ho: μ_{d} = 0; Ha: μ_{d} > 0
 Two ways to run a matched pairs ttest
 Review instructions on transform data method, onesample ttest vs paired samples ttest
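The transform-data method can be sketched in plain Python (the course uses SPSS; the anxiety scores below are made up for illustration): compute d = pre − post for each subject, then run a one-sample t on the differences.

```python
# Sketch: matched pairs t test via the transform-data method.
# d = pre - post; then a one-sample t test on d against 0.
from math import sqrt
from statistics import mean, stdev

pre  = [5, 6, 7, 8, 9]   # made-up pre-exercise anxiety scores
post = [4, 5, 5, 7, 8]   # made-up post-exercise anxiety scores
d = [a - b for a, b in zip(pre, post)]

n = len(d)
dbar = mean(d)
sd = stdev(d)               # sample standard deviation of the differences
t = dbar / (sd / sqrt(n))   # compare to t(n - 1); here df = 4
```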
Additional topics: type I and type II errors and power
 What are the two types of errors associated with hypothesis testing?
 Type I and Type II
 Display twoway table (reality vs. decision)
 Reject Ho when it is true (false positive) → Type I
 Retain Ho when it is false (false negative) → Type II
 What is the probability of a Type I error?
 α = .05 or .01 (whatever we set it at)
 How does a Type I error relate to the sampling distribution?
 Normal population, gives rise to sampling distribution of the mean
 Mean of distribution is pop mean, for a twosided test, determine mean value corresponding to p=.025 in each tail
 We will reject whenever the mean is in this range, even when Ho is true. (False positive)
 Why don't we minimize α to be very small (minimize false positives)?
 Makes it harder to reject Ho.
 This is the other error (Type II, false negative) failure to reject (retention) of Ho even when it is false.
 What do we call the ability to reject Ho when it is false.
 Power
 Probability of Type II error = 1 − power
 How do we get more power?
 everything else being equal, larger sample size
 important to know ahead of time what sample size needed to achieve certain level of power
 If we fail to reject Ho, we want to do so because Ho is true, not for lack of power.
 What happens when we use the z distribution when a t distribution is the correct distribution to use?
 display graph including overlay of z and t distributions.
 the α level may be larger than specified.
 unknowingly have a larger Type I error
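The sample-size-to-power relationship can be shown with a small calculation. A plain-Python sketch for a one-sided z test (Ho: μ = 0 vs Ha: μ > 0, with a made-up true effect and known σ):

```python
# Sketch: power of a one-sided z test grows with sample size.
# Power = P(reject Ho | true mean is delta), alpha = .05 (z* = 1.645).
from math import sqrt, erf

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def power(delta, sigma, n, zstar=1.645):
    # Reject when the standardized mean exceeds z*; power is the chance
    # of that happening when the true mean is delta rather than 0.
    return 1 - phi(zstar - delta / (sigma / sqrt(n)))

p25 = power(delta=0.5, sigma=2.0, n=25)
p100 = power(delta=0.5, sigma=2.0, n=100)
# p100 > p25: everything else equal, a larger sample gives more power.
```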
7.2 Comparing two means
 For this new test we are going to compare two means. How do we think about this situation with respect to populations?
 Draw two populations; the mean of a particular variable for each distinct population is represented as μ_{1} and μ_{2}.
 We want to test whether the two population means are different.
 When we draw the two samples, one from each of the two populations, what must we be careful to do?
 The two samples must be independent
 If we are going to compare two means, we need two variables. How do we describe/classify these two variables?
 Explanatory variable which is categorical (a grouping variable)
 Response variable which is quantitative (provides scores/data which are summarized as a mean)
 How many values does the explanatory variable have?
 2
 What is the null hypothesis for this test? What does it mean in words?
 Ho: μ_{1}  μ_{2} = 0
 OR
 Ho: μ_{1} = μ_{2}
 What are the possible alternative hypotheses? What does each mean?
 Ha: μ_{1}  μ_{2} ≠ 0 ...(Ha: μ_{1} ≠ μ_{2})
 Ha: μ_{1}  μ_{2} < 0 ...(Ha: μ_{1} < μ_{2})
 Ha: μ_{1}  μ_{2} > 0 ...(Ha: μ_{1} > μ_{2})
 (discuss which mean is greater for the onesided alternatives)
 What is the population parameter for which we are doing hypothesis testing?
 the difference between the means, μ_{1}  μ_{2}
 this means that we have a sampling distribution of differences...if both population distributions are normal, then sampling distribution of differences is also normal
 draw sampling distribution of differences
 What is the null value?
 0
 What are the conditions (also called assumptions) necessary for use of the independent samples t statistic (t test)?
 each sample is SRS from population
 the two samples are independent...each value is sampled independently from each other value.
 the distribution of the response variable in both populations is normal
 some procedures require an "equal variances" assumption. We will use a more general procedure that doesn't make this assumption.
 What is the general structure of the t test?
 (put up the one-sample t-test formula, if needed: t = (xbar − μ_{0})/(s/sqrt(n)))
 What is the formula for the two-sample t statistic?
 t = (ybar_{1} − ybar_{2})/sqrt(s_{1}^{2}/n_{1} + s_{2}^{2}/n_{2})
 review why this statistic makes sense...element by element:
 ybar_{1} and ybar_{2} estimate μ_{1} and μ_{2}, so ybar_{1} − ybar_{2} estimates μ_{1} − μ_{2}
 the null value (0) is missing from the equation because subtracting it changes nothing.
 the denominator is the standard error of ybar_{1} − ybar_{2}
 What does the value of the t statistic tell us?
 measures (in standard errors) the difference between what the data tell me about the parameter of interest μ_{1}  μ_{2} (sample estimate) and what the null hypothesis claims that it is (null value).
 What distribution is used to calculate the pvalue?
 The null distribution approximates the t distribution with the appropriate degrees of freedom. It's not exact, but good enough for our purposes. Let statistical software calculate the df.
 How do we use the pvalue in hypothesis testing?
 the pvalue indicates amount of evidence against Ho; pvalues less than the alpha threshold provide strong evidence against Ho and in favor of the specified alternative.
 Why should Ha be set before doing the study and looking at the data?
 it is easier to reject Ho with a one-sided alternative, but it would be wrong to set it after seeing which direction the data lean. Doing so contributes to error. Which error? Any thoughts?
 How often do we falsely accept Ha, when in fact Ho is true?
 This is the alpha level...Type I error.
 What if obtained 20 different samples and did 20 ttests using .05 alpha, when in fact Ho is true. For how many might we reject Ho, according to probability? (1)
 What does a 95% confidence interval for μ_{1} − μ_{2} tell us?
 we are 95% confident that the actual value of μ_{1}  μ_{2} occurs in this range.
 when Ho is rejected, the confidence interval quantifies the supposed effect of the explanatory variable on the response variable.
 How is the confidence interval calculated?
 (ybar_{1} − ybar_{2}) ± t*·sqrt(s_{1}^{2}/n_{1} + s_{2}^{2}/n_{2})
 Some further considerations
 The method described here does NOT assume the variances are equal. The pooled t test, which does assume equal variances, is described later in the chapter; we won't be using that method.
 The general method described here is robust to violations of Normality when
 sample sizes are large (n_{1}+n_{2} > 40)
 sample size in each group is equal and shape of population distributions for each group are similar
 the routine will ask us to label one sample as "group 1" and the other as "group 2"; how do we decide? It doesn't matter, as long as Ha for a one-sided test matches the labeling.
 small samples may be useful when the effect size is large. If results are borderline, the study can't say much; there isn't enough power.
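As a sketch of the computation software performs for this method, here is the two-sample t statistic without pooling, with df from the Welch–Satterthwaite formula (the data are made up for illustration):

```python
import statistics

def welch_t(x, y):
    """Two-sample t statistic and Welch-Satterthwaite df (no pooling).

    Minimal sketch: the variances are NOT assumed equal, and the
    (fractional) degrees of freedom are what software reports.
    """
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)
    se = (v1 / n1 + v2 / n2) ** 0.5
    t = (statistics.mean(x) - statistics.mean(y)) / se  # null value 0 omitted
    df = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )
    return t, df

# Hypothetical data for "group 1" and "group 2"
t, df = welch_t([12, 15, 14, 10, 13, 16], [9, 11, 10, 12, 8, 10])
print(round(t, 2), round(df, 1))   # t = 3.16 with about 8.6 df
```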
7.1 Inference for the population mean
 What is the name of the theorem that says that when n is large the sampling distribution of the mean is N(μ, σ/√n), regardless of the shape of the starting distribution? (central limit theorem)
 Click on the link for Sampling Distribution applet. Create a crazy population distribution  highly skewed with significant outliers. Set samples to N=5 and N=25, run simulation.
 Discuss idea that when we look up the p-value for a z test we are assuming that the distribution of means is shaped like the z distribution.
 Draw a normal distribution and shade a possible p-value. Compare this area to the area for the distribution created for N=5.
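In place of the applet, the same demonstration can be sketched in Python, drawing from a strongly skewed exponential population (the population choice, seed, and simulation sizes are arbitrary):

```python
import random
import statistics

random.seed(3)

def sample_means(n, sims=5000):
    """Simulate the sampling distribution of the mean for samples of size n."""
    return [statistics.mean(random.expovariate(1.0) for _ in range(n))
            for _ in range(sims)]

means5, means25 = sample_means(5), sample_means(25)
# Both center near the population mean (1.0), but the n = 25
# distribution is much tighter and closer to Normal in shape.
print(statistics.mean(means5), statistics.stdev(means5))
print(statistics.mean(means25), statistics.stdev(means25))
```

Plotting histograms of `means5` and `means25` reproduces what the applet shows: skewness fades and spread shrinks as n grows.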
 What is the z test for the population mean?
 z = (x̄ − μ_{0}) / (σ/√n)
 note there are two population values μ_{0} and σ.
 the distribution of this statistic is Normal and is derived from the sampling distribution of x̄.
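A minimal worked example of the z statistic, using made-up data and assumed values for μ_{0} and σ:

```python
from statistics import NormalDist, mean

# Both mu_0 and sigma are known/claimed values; only x-bar comes
# from the data.  All numbers here are hypothetical.
MU0, SIGMA = 100, 15
sample = [104, 110, 98, 113, 106, 108, 102, 111]
n = len(sample)
z = (mean(sample) - MU0) / (SIGMA / n ** 0.5)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2), round(p_two_sided, 4))   # z = 1.23
```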
 What do we call the standard deviation of a statistic (e.g., a mean) when it is estimated from the data?
 standard error
 draw a normal distribution of x̄'s; the standard error is the standard deviation of the distribution of sample means
 How is the standard error of a statistic different from the standard deviation of a statistic (e.g., SE_{Xbar} vs. SD_{Xbar})?
 standard deviation of a statistic uses the population value σ (SD_{Xbar} = σ/√n)
 standard error of a statistic uses the value s calculated from the sample (SE_{Xbar} = s/√n)
 When σ is unknown and we are forced to use s, can we go ahead and use the z test anyway, replacing σ with s?
 NO!! When s replaces σ we have a t statistic
 What is the one sample t statistic?
 t = (x̄ − μ_{0}) / (s/√n)
 the t statistic has a t distribution with n − 1 degrees of freedom
 degrees of freedom can be a difficult concept and difficult to determine; basically it's the number of independent pieces of information that go into the estimate of a parameter (here, n − 1 pieces go into estimating s)
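The definition can be checked with a small worked example (data and null value are made up):

```python
import statistics

# One-sample t statistic, straight from the definition:
# t = (x-bar - mu_0) / (s / sqrt(n)), with df = n - 1.
sample = [5, 7, 8, 9, 6]   # hypothetical data
MU0 = 5                    # hypothetical null value
n = len(sample)
se = statistics.stdev(sample) / n ** 0.5   # s replaces sigma -> t, not z
t = (statistics.mean(sample) - MU0) / se
df = n - 1
print(round(t, 3), df)   # 2.828 with 4 degrees of freedom
```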
 How do we denote a particular t distribution?
 t(k), where k = degrees of freedom
 How is a z distribution similar to a t distribution?
 symmetric, centered at 0, covers −∞ to ∞
 show figure comparing t(2), t(5), and z (note that t(30) ~ z)
 show figure comparing a z score and a t score; review the differences that result in larger spread
 How do we use the t statistic in hypothesis testing?
 the t statistic is the standardized score for x̄, assuming Ho is true, i.e., μ = μ_{0}.
 the t statistic follows the t distribution, so we can calculate the t statistic and then use the distribution to determine the p-value (the likelihood of obtaining that value, or a larger one, of t)
 What are the conditions (also called assumptions) necessary for use of the t test?
 the sample is random
 population distribution is Normal (though this is hard to know for sure)
 show table of sample size vs. normality of population distribution
 How do we decide if the population is Normal?
 Look at the data for evidence.
 Given an SRS of size n drawn from a population having unknown mean μ, and given Ho: μ = μ_{0}, how do we use the t statistic to test Ho?
 explain that the population with the unknown mean is the one from which the sample is actually drawn, NOT the one that is the usual case, the one with μ_{0}. We are testing to see if the population from which the sample is drawn is different from the usual (null) population.
 review p-value probability formulas and pictures of t distributions with p-values shaded for each version of Ha. example graphics
 What is the confidence interval for an estimate of a population mean, x̄, when σ is unknown?
 x̄ ± t* × s/√n, where t* is the critical value from t(n − 1)
 What part is called the margin of error? (t* × s/√n)
 What is t* for a 95% confidence interval given 15 df? (t* = 2.131)
 Table D in textbook lists these values for a selection of t distributions
 Review how to use Table D to obtain t*
 Note that as df gets larger (n gets larger), the t values approach z.
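This convergence can be seen by listing a few 95% critical values from a standard t table (Table D lists the same quantities) alongside z* = 1.960:

```python
# Selected 95% critical values t* from a standard t table.
T_STAR = {10: 2.228, 30: 2.042, 100: 1.984}
Z_STAR = 1.960
for df in sorted(T_STAR):
    t_star = T_STAR[df]
    print(f"df={df:>3}  t*={t_star:.3f}  excess over z*={t_star - Z_STAR:.3f}")
```

The excess over z* shrinks steadily, which is why the bottom row of Table D is the z* row.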
 What violations to the conditions for use of the t test are of concern?
 The t test is fairly robust: small deviations from Normality will not affect the results too much. Factors that strongly matter:
 Random sampling: the data must be a random sample from the population
 outliers and skewness: strongly influence the mean and therefore the t procedures. However, their impact diminishes as the sample size gets larger because of the Central Limit Theorem.
 Sample size rules of thumb:
 for n > 40, the t statistic will be valid, even with strong skewness (but you should still look at the data using exploratory data analysis tools)
 for 15 < n < 40, mild skewness is acceptable, but outliers are not
 for n < 15, only use the t test if the sample distribution is close to Normal and free of outliers
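These rules of thumb can be probed by simulation: check how often the nominal 95% t interval actually covers the true mean when sampling from a strongly skewed population, at a small versus a moderate n. The exponential(1) population, seed, and approximate table values for t* are assumptions for illustration:

```python
import random
import statistics

random.seed(4)
T_STAR = {10: 2.262, 40: 2.023}   # approximate t table values (df = 9, 39)
TRUE_MEAN = 1.0                   # mean of the exponential(1) population

def coverage(n, sims=4000):
    """Fraction of nominal 95% t intervals that capture the true mean."""
    hits = 0
    for _ in range(sims):
        x = [random.expovariate(1.0) for _ in range(n)]
        m = statistics.mean(x)
        moe = T_STAR[n] * statistics.stdev(x) / n ** 0.5
        if m - moe <= TRUE_MEAN <= m + moe:
            hits += 1
    return hits / sims

print(coverage(10), coverage(40))   # the larger n is closer to 0.95
```

With strong skewness the small-n interval falls noticeably short of its advertised 95%, while the larger sample recovers most of the shortfall, which is the Central Limit Theorem at work.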