Sampling/Self-check assessment

Use the following quiz questions to check your understanding of sampling. As soon as you indicate your response, the question is scored and feedback is provided. Because feedback is provided for each option, you may find it useful to try all of the responses (both correct and incorrect) and read the feedback as a way to better understand the concept.

Sampling design
• Which of the following is a random sample of a college student body?[1] (check all that apply)
• Every fifth person coming out of the Campus Center between 8:30am and 10:00am
• That's not quite right. Selecting every fifth person encountered is convenience sampling, which does not use randomization to control for biases. Try again.
• Lisa Meyer, Todd Jones, and Maria Rivera, whose ID numbers were the first three on a list sorted by an assigned random number
• That's correct. Even a sample of only 3 individuals can be a random sample if it is obtained using a randomization procedure, as each individual in the population has an equal chance of being selected for the sample.
• Every 20th person in the student directory.
• That's not quite right. In a random sample, each individual in the population has an equal chance of being selected for the sample. In this sample, the 1-19 listings, for example, have no chance of being selected. Try again.
• The set of students who respond to an online questionnaire, advertised at the top of the weekly email news bulletin
• That's not quite right. This is a voluntary response sample, consisting of people who choose themselves by responding to the emailed request; it does not use randomization to control for biases. People with strong opinions are more likely to respond. Try again.
• A large midwestern university with 30 different departments is considering eliminating standardized scores from its admission requirements. The university wants to find out whether the students agree with this plan. It decides to randomly select 100 students from each department, send them a survey, and follow up with a phone call if they do not return the survey within a week. What kind of sampling plan did the university use?[2]
• Simple random sampling
• That's not quite right. Note that the population was divided into groups (departments). Try again.
• Stratified random sampling
• That's correct. The population was divided into groups (departments) and then 100 individuals (students) were selected from each group. As we suspect that each department has students who are particular to that department in some way, one way to ensure that we have a representative sample of students is to select a set of students from each department.
• Cluster sampling
• That's not quite right. In cluster sampling whole groups are chosen randomly and then all individuals in the group are included in the sample. In this research, 100 students were randomly sampled from each group (department). Try again.
• Multi-stage sampling
• That's not quite right. Multi-stage sampling combines a number of sampling approaches into a multi-step process. The sampling plan for this research only involved dividing the population into groups (departments), from which the sample was selected. Try again.
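
The sampling plans named in the question above can be contrasted in a short Python sketch. The population here is hypothetical (30 departments of 500 students each, a size the question does not state), and the function names are illustrative only.

```python
import random

def build_population(n_departments=30, students_per_department=500):
    """Hypothetical student body as (department, student_id) pairs."""
    return [(d, s) for d in range(n_departments)
            for s in range(students_per_department)]

def simple_random_sample(population, n, rng):
    # Every subset of size n is equally likely; departments are ignored.
    return rng.sample(population, n)

def stratified_sample(population, per_stratum, rng):
    # Group students by department, then draw randomly within each group.
    strata = {}
    for dept, student in population:
        strata.setdefault(dept, []).append((dept, student))
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

def cluster_sample(population, n_clusters, rng):
    # Randomly choose whole departments and keep everyone in them.
    departments = sorted({dept for dept, _ in population})
    chosen = set(rng.sample(departments, n_clusters))
    return [unit for unit in population if unit[0] in chosen]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = build_population()
srs = simple_random_sample(population, 3000, rng)
stratified = stratified_sample(population, 100, rng)
clustered = cluster_sample(population, 6, rng)
```

All three samples contain 3000 students, but only the stratified sample is guaranteed to include exactly 100 students from every department, while the cluster sample covers just 6 departments in full.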

Bias
• A biased sample is one that...[3] (check all that apply)
• is too small.
• That's not quite right. Bias has nothing to do with sample size. A small sample may, by chance, be non-representative of the population. However, bias refers only to systematic differences. Try again.
• will always lead to a wrong conclusion.
• That's not quite right. There are times when the bias is small and the correct conclusion is likely to be reached despite the bias. Try again.
• will likely have certain groups from the population over-represented or under-represented due only to chance factors.
• That's not quite right. If the over- or under-representation is due to chance then the sample is not biased. Unbiased samples can be non-representative, especially if the sample size is small. Try again.
• will likely have groups from the population over-represented or under-represented due to systematic sampling factors.
• That's correct. Only when the sampling is systematically favoring one group or another is the sample biased. Random samples, although they can be different from the population, are not biased. Bias is defined by the procedure for drawing the sample, not by the result.
• is likely to be a good and useful sample.
• That's not quite right. Even an unbiased sample may not be good or useful. If the sample size is small then it is likely that even an unbiased sample will, by chance, over-represent some groups and under-represent others. Try again.
• A researcher does a survey randomly calling phone numbers which are land lines. People who only have cell phones are not sampled.[4] Which of the following best accounts for the potential bias in the resulting sample?
• convenience sampling
• That's not quite right. Convenience sampling makes no attempt to randomly sample the population. Consider how this researcher included only part of the population in his sampling plan. Try again.
• non-response
• That's not quite right. Non-response affects a sample when a selected individual is not included because they refuse to cooperate or cannot be contacted. Consider how this researcher included only part of the population in his sampling plan. Try again.
• voluntary response sample
• That's not quite right. Voluntary response is when the respondents themselves choose to be in the sample. Consider how this researcher included only part of the population in his sampling plan. Try again.
• undercoverage
• That's correct. This is undercoverage bias because those with only cell phones are part of the population but have no chance of being included in the sample.
• A radio station asks listeners to phone in their choice in a daily poll.[5] Which of the following best accounts for the potential bias in the resulting sample?
• response bias
• That's not quite right. Response bias results from unintended behaviors of the respondent (e.g., lying) or the interviewer (e.g., physical appearance). Consider how this survey invites respondents to participate. Try again.
• non-response
• That's not quite right. Non-response affects a sample when a selected individual is not included because they refuse to cooperate or cannot be contacted. Consider how this survey invites respondents to participate. Try again.
• voluntary response sample
• That's correct. Voluntary response is when the respondents themselves choose to be in the sample. Those with strong feelings are much more likely to respond.
• undercoverage
• That's not quite right. Undercoverage bias occurs when part of the population has no chance of being included in the sample. Consider how this survey invites respondents to participate. Try again.
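
The effect of undercoverage in the land-line question can be illustrated with a small simulation. All figures below are invented for illustration: a population of 10,000 people in which 40% of land-line owners and 70% of cell-only people support some proposal. The source gives no such numbers.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of (has_landline, supports_proposal) records.
population = (
    [(True, True)] * 1600 + [(True, False)] * 2400      # land line: 40% support
    + [(False, True)] * 4200 + [(False, False)] * 1800  # cell only: 70% support
)

def support_rate(people):
    """Proportion of the given people who support the proposal."""
    return sum(1 for _, supports in people if supports) / len(people)

true_rate = support_rate(population)  # population parameter

# Sampling frame that omits everyone without a land line (undercoverage).
landline_frame = [person for person in population if person[0]]
undercovered_estimate = support_rate(rng.sample(landline_frame, 1000))

# Random sample of the same size drawn from the whole population.
full_coverage_estimate = support_rate(rng.sample(population, 1000))

print(f"true rate:            {true_rate:.2f}")
print(f"land-line frame only: {undercovered_estimate:.2f}")
print(f"full coverage:        {full_coverage_estimate:.2f}")
```

Under these assumptions the land-line sample is drawn randomly, yet its estimate centers on the land-line owners' rate rather than the population's, because cell-only people had no chance of selection. Drawing a larger sample from the same frame would not remove the gap.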

Statistical inference
• Using a simple random sample, a researcher may correctly calculate the population parameter.
• True
• That's not quite right. A sample is used to estimate the population parameter. A simple random sample reduces the threat of bias; a large sample size increases the likelihood that the value of the sample statistic is close to the value of the true population parameter.
• False
• That's correct. A sample is used to estimate the population parameter. A simple random sample reduces the threat of bias; a large sample size increases the likelihood that the value of the sample statistic is close to the value of the true population parameter.
• Which of the following is (are) true? (check all that apply) Using a random sample...[6]
• ...is to accept some uncertainty about the conclusions.
• That's correct. A random sample is one of many possible samples which could have been chosen (but weren't) and may, by chance, over- or under-represent the population in some way.
• That's not quite right. Random sampling does not produce bias, which means systematic rather than random error.
• ...enables you to calculate statistics.
• That's correct. A statistic is a number which describes a sample.
• ...is to risk drawing the wrong conclusions about the population.
• That's correct. A random sample is one of many possible samples which could have been chosen (but weren't) and may, by chance, over- or under-represent the population in some way, resulting in an incorrect conclusion.
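
The idea that a random sample is only one of many possible samples, and that larger samples tend to land closer to the population parameter, can be sketched as follows. The population (20,000 values drawn around a mean of 50), the sample sizes, and the number of repetitions are all arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed so the sketch is repeatable

# Hypothetical population of 20,000 measurements.
population = [rng.gauss(50, 10) for _ in range(20_000)]
parameter = statistics.mean(population)  # the population mean

def sample_means(n, repetitions=200):
    """Means of repeated simple random samples of size n."""
    return [statistics.mean(rng.sample(population, n))
            for _ in range(repetitions)]

means_small = sample_means(10)
means_large = sample_means(1000)

# Spread of the sample statistic across the repeated samples.
spread_small = statistics.pstdev(means_small)
spread_large = statistics.pstdev(means_large)

print(f"population mean:        {parameter:.2f}")
print(f"spread of means, n=10:   {spread_small:.2f}")
print(f"spread of means, n=1000: {spread_large:.2f}")
```

Each sample mean is an estimate of the population mean rather than the parameter itself. The estimates from samples of 1000 cluster far more tightly around it than those from samples of 10, but some uncertainty remains at any sample size.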

Statistical inference

A research scientist is interested in studying the experiences of twins raised together versus those raised apart. She obtains a list of twins from the National Twin Registry, and selects two subsets of individuals for her study. First, she chooses all those in the registry whose last name begins with Z. Then she turns to all those whose last name begins with B. Because there are so many names that start with B, however, the researcher decides to incorporate only every other name into her sample. Finally, she mails out a survey and compares characteristics of twins raised apart versus together.[7]

• What is the population?
• The population consists of all twins recorded in the National Twin Registry. It is important that the researcher only make statistical generalizations to the twins on this list, not to all twins in the nation or world. That is, the National Twin Registry may not be representative of all twins.
• What is the sample?
• The twins in the National Twin Registry whose last names begin with Z and every other listing of twins whose last name begins with B.
• Was the sample picked by simple random sampling? Explain.
• No. The researcher used a systematic approach to obtaining a sample, rather than randomization.
• Is the sample biased? Why or why not?
• Very likely. Choosing only twins whose last names begin with Z does not give every individual an equal chance of being selected into the sample. Moreover, such a procedure risks over-representing ethnic groups with many surnames that begin with Z. There are other reasons why choosing just the Z's may bias the sample. Perhaps such people are more patient than average because they often find themselves at the end of the line! The same problem occurs with choosing twins whose last name begins with B. An additional problem for the B's is that the “every-other-one” procedure prevents two adjacent names in the B part of the list from both being selected. This defect alone means the sample was not formed through simple random sampling.
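
The point about adjacent names can be checked by brute force on a small list. The sketch below assumes a hypothetical list of only 10 "B" listings, far shorter than any real registry, and compares the every-other-one procedure with all the samples of the same size that simple random sampling could produce.

```python
from itertools import combinations

# Hypothetical alphabetized "B" listings, identified by position.
names = [f"B{i:02d}" for i in range(10)]

# The researcher's procedure: keep every other name on the list.
every_other = names[::2]

def has_adjacent_pair(sorted_indices):
    """True if two selected positions are next to each other on the list."""
    return any(b - a == 1 for a, b in zip(sorted_indices, sorted_indices[1:]))

# Simple random sampling makes every 5-name subset equally likely.
all_samples = list(combinations(range(len(names)), 5))
with_adjacent = sum(has_adjacent_pair(s) for s in all_samples)

print(every_other)       # positions 0, 2, 4, 6, 8: never two neighbors
print(len(all_samples))  # 252 possible samples of size 5
print(with_adjacent)     # 246 of them contain at least one adjacent pair
```

Under simple random sampling, most of the equally likely samples contain neighboring names, whereas the every-other-one procedure can never produce any of them.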

Notes

1. Adapted from Inferential Statistics at Online Statistics Education: An Interactive Multimedia Course of Study. Project Leader: David M. Lane, Rice University. Retrieved 21 October 2012.
2. Question adapted from Ebook Problems Prob Basics, Problem 1 in Probability and Statistics EBook, from UCLA Statistics Online Computational Resource (SOCR). Retrieved 14 October 2012.
3. Adapted from Inferential Statistics at Online Statistics Education: An Interactive Multimedia Course of Study. Project Leader: David M. Lane, Rice University. Retrieved 21 October 2012.
4. Adapted from Sampling Bias at Online Statistics Education: An Interactive Multimedia Course of Study. Project Leader: David M. Lane, Rice University. Retrieved 21 October 2012.
5. Adapted from Sampling Bias at Online Statistics Education: An Interactive Multimedia Course of Study. Project Leader: David M. Lane, Rice University. Retrieved 21 October 2012.
6. Adapted from Inferential Statistics at Online Statistics Education: An Interactive Multimedia Course of Study. Project Leader: David M. Lane, Rice University. Retrieved 21 October 2012.
7. Adapted from Inferential Statistics at Online Statistics Education: An Interactive Multimedia Course of Study. Project Leader: David M. Lane, Rice University. Retrieved 21 October 2012.