Sampling distributions/Self-check assessment

From WikiEducator

Use the following quiz questions to check your understanding of sampling distributions. As soon as you indicate your response, the question is scored and feedback is provided. Because feedback is provided for each option, you may find it useful to try all of the responses (both correct and incorrect) and read the feedback as a way to better understand the concept.

Sampling distributions of a sample mean

The mean and sd of [math]\bar{x}[/math]

In an article in the Journal of American Pediatric Health researchers claim that the weights of healthy babies born in the United States form a distribution that is nearly Normal with an average weight of 7.25 pounds and standard deviation of 1.75 pounds.[1] Suppose a researcher selects 50 random samples with 30 newborns in each sample.

  • What is the best estimate for the mean of the sample means?
    • 7.25
      • That's correct. The mean of the sample means is an unbiased estimate of the population mean; therefore, the best estimate of the mean of the sample means (i.e., the mean of the sampling distribution of the mean) is 7.25 pounds.
    • 1.75
      • That's not quite right. This is the population standard deviation. Consider that the mean of the sample means is an unbiased estimate of the population mean. Try again.
    • 50
      • That's not quite right. This is the number of samples. Consider that the mean of the sample means is an unbiased estimate of the population mean. Try again.
    • 30
      • That's not quite right. This is the size of each of the samples. Consider that the mean of the sample means is an unbiased estimate of the population mean. Try again.
    • unable to determine from the information provided
      • That's not quite right. There is enough information to estimate the mean of the sampling distribution of means. Try again.
  • What is the best estimate of the standard deviation of the sample means?
    • 1.75
      • That's not quite right. This is the population standard deviation. What is the standard deviation of the sampling distribution of the means? Recall that it varies depending on the sample size. Try again.
    • .32
      • That's correct. The standard deviation of the sample means is calculated by dividing the population standard deviation by the square root of the sample size: σ/sqrt(n) = 1.75/sqrt(30) = 1.75/5.48 = .32.
    • .25
      • That's not quite right. In the denominator of your calculation, you may have used the number of samples rather than the sample size. Try again.
    • unable to determine from the information provided
      • That's not quite right. There is enough information to determine the standard deviation of the sampling distribution of means. Try again.
  • If we randomly selected 30 newborns from the full population of US newborns, would you be surprised if their mean weight was 8.30 pounds?
    • Yes, a mean of 8.30 pounds would be surprising as this sample result is more than 3 standard deviations above the overall mean weight of 7.25.
      • That's correct. As the population is Normally distributed, we know that the sampling distribution will be Normally distributed regardless of sample size, so we can use the Standard Deviation Rule to evaluate the particular sample result. Three standard deviations above 7.25 is 7.25 + 3(1.75/sqrt(30)) = 8.21, so 8.30 pounds is more than 3 standard deviations above the mean. A sample mean in this region occurs only about .15% of the time, a rare event indeed.
    • No, a mean of 8.30 pounds would not be surprising as this sample result is within 2 standard deviations of the overall mean weight of 7.25.
      • That's not quite right. Although 8.30 is only 1.05 pounds greater than 7.25, and random samples do result in some variability in sample means, to determine if 8.30 is surprising, you will need to calculate the standard deviation of the sampling distribution of means, and if appropriate, use the Standard Deviation rule to assess how likely it would be to observe a mean of 8.30 in this sampling distribution. Try again.
    • Yes, a mean of 8.30 pounds would be surprising because this sample result is 1.05 pounds greater than the overall mean weight of 7.25.
      • That's not quite right. It is true that 8.30 pounds is 1.05 pounds greater than 7.25 pounds, but we have to assess this difference relative to the standard deviation of the sampling distribution of sample means to determine if this sample mean is surprising. You will need to calculate the standard deviation of the sampling distribution of means, and if appropriate, use the Standard Deviation rule to assess how likely it would be to observe a mean of 8.30 in this sampling distribution. Try again.
    • No, a mean of 8.30 pounds would not be surprising because this sample result is only 1.05 pounds greater than the overall mean weight of 7.25, and random samples do result in some variability in sample means.
      • That's not quite right. It is true that 8.30 pounds is 1.05 pounds greater than 7.25 pounds, and we do expect some variability, but we have to assess this difference relative to the standard deviation of the sampling distribution of sample means to determine if this sample mean is surprising. You will need to calculate the standard deviation of the sampling distribution of means, and if appropriate, use the Standard Deviation rule to assess how likely it would be to observe a mean of 8.30 in this sampling distribution. Try again.
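
The three answers above can be checked numerically. The following is a minimal Python sketch that uses only the values given in the question (mean 7.25, standard deviation 1.75, sample size 30); the variable names are illustrative and not part of the original exercise.

```python
import math

# Values taken from the question
pop_mean = 7.25   # population mean weight (pounds)
pop_sd = 1.75     # population standard deviation (pounds)
n = 30            # newborns in each sample

# Mean and standard deviation of the sampling distribution of the sample mean
mean_of_sample_means = pop_mean
sd_of_sample_means = pop_sd / math.sqrt(n)

# Upper end of the "within 3 standard deviations" range
upper_3sd = pop_mean + 3 * sd_of_sample_means

# How many standard deviations above the mean is a sample mean of 8.30?
z_830 = (8.30 - pop_mean) / sd_of_sample_means

print(round(sd_of_sample_means, 2))  # 0.32
print(round(upper_3sd, 2))           # 8.21
print(round(z_830, 2))               # 3.29
```

Note that the number of samples (50) never enters the calculation; only the size of each sample (30) does.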



Central Limit Theorem
  • Regardless of the shape of the parent population, the sampling distribution of the mean approaches a normal distribution as sample size increases.[2]
    • True
      • That's correct. Although counter-intuitive, the sampling distribution of the mean approaches a normal distribution as sample size increases. This is an important part of the "Central Limit Theorem."
    • False
      • That's not quite right. Although counter-intuitive, the sampling distribution of the mean approaches a normal distribution as sample size increases. This is an important part of the "Central Limit Theorem."
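
A small simulation can make this statement less counter-intuitive. The sketch below is an illustration only: it assumes an exponential parent population as a stand-in for "strongly skewed", and the sample sizes 5 and 65, the number of repetitions, and the function name are arbitrary choices rather than part of the quiz. It measures the skewness of the simulated sample means, which should move toward 0 (the value for a symmetric distribution such as the Normal) as the sample size grows.

```python
import random
import statistics

def skewness_of_sample_means(n, reps=5000, seed=0):
    """Simulate `reps` sample means of size n and return their skewness."""
    rng = random.Random(seed)
    # Each sample is drawn from a right-skewed exponential population (assumed)
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]
    center = statistics.fmean(means)
    spread = statistics.pstdev(means)
    # Third standardized moment: 0 for a symmetric distribution
    return statistics.fmean(((m - center) / spread) ** 3 for m in means)

skew_small = skewness_of_sample_means(5)
skew_large = skewness_of_sample_means(65)
print(skew_small, skew_large)  # the second value should be much closer to 0
```

For this particular parent population the theoretical skewness of the sample mean is 2/sqrt(n), so the simulated values should land near 0.89 and 0.25 respectively.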



Central Limit Theorem

In 2009 the mean annual salary for teachers in the US was $49,720 with a standard deviation of $7200. The distribution is strongly skewed to the right.

  • Consider the question: what is the probability that the mean annual salary of a random sample of 5 US teachers is more than $60,000? What should be considered before calculating this probability? Are there any serious concerns with calculating the probability? If not, calculate the probability.
    • Given that the population distribution is known to be skewed, before calculating the probability we should consider whether the central limit theorem will guarantee that the distribution of sample means will be approximately Normal. However, to provide such a guarantee the central limit theorem requires a sample size larger than 5 (at least of size 30 for a strongly skewed distribution). We cannot calculate the probability using the methods based on a Normal distribution of the sample mean.



Central Limit Theorem

In 2009 the mean annual salary for teachers in the US was $49,720 with a standard deviation of $7200. The distribution is strongly skewed to the right. What is the probability that the mean annual salary of a random sample of 65 US teachers is less than $48,000? Let's walk through the computations required to calculate this probability.

  • Is it safe to use the Normal distribution to determine this probability?
    • Yes
      • That's correct. According to the central limit theorem, the mean has approximately a Normal distribution when the sample size is large enough, and a sample size of 65 is large enough. It is safe to use the Normal distribution to calculate this probability.
    • No
      • That's not quite right. According to the central limit theorem, the mean has approximately a Normal distribution when the sample size is large enough, and a sample size of 65 is large enough. It is safe to use the Normal distribution to calculate this probability.
  • The mean of the sampling distribution of sample means is
    • $49,720.
      • That's correct. According to the central limit theorem, the mean has approximately a Normal distribution with the same mean as the population; therefore, $49,720 is the mean of the distribution of the sample means.
    • $48,000.
      • That's not quite right. This is the value which delineates the probability to be calculated, rather than the mean of the sample means. Recall that the central limit theorem says that the mean has approximately a Normal distribution with the same mean as the population; therefore, $49,720 is the mean of the distribution of the sample means.
  • The standard deviation of the sampling distribution of sample means is
    • $893.
      • That's correct. According to the central limit theorem, the mean has approximately a Normal distribution with the same mean as the population and a standard deviation of σ/sqrt(n) = 7200/sqrt(65)= 7200/8.06= 893.
    • $7200.
      • That's not quite right. It looks like you selected the population standard deviation. The standard deviation of the sampling distribution of the mean is calculated using the population standard deviation and the sample size. Try again.
  • Use the information provided to calculate the z-score. z =
    • .24
      • That's not quite right. Recall that the z-score is (value - mean)/standard deviation. When calculating the z-score be sure 1) to subtract the population mean from the sample value of interest, and 2) to use the calculated standard deviation for the sampling distribution of the sample mean in the denominator. Try again.
    • -.24
      • That's not quite right. Recall that the z-score is (value - mean)/standard deviation. When calculating the z-score be sure to use the calculated standard deviation for the sampling distribution of the sample mean in the denominator. Try again.
    • 1.93
      • That's not quite right. Recall that the z-score is (value - mean)/standard deviation. When calculating the z-score be sure to subtract the population mean from the sample value of interest. Try again.
    • -1.93
      • That's correct. The z-score for 48000 is z = (48000 - 49720)/(7200/sqrt(65)) = -1720/893 = -1.93.
  • Use the z-score and a Normal Distribution calculator to determine the probability that the mean annual salary of a random sample of 65 US teachers is less than $48,000.
    • .0268
      • That's correct. P(X-bar<48,000) = P(Z < -1.93) = .0268. While many teachers likely have an annual salary less than $48000, it would be unlikely for the mean salary of a sample of 65 teachers to be less than $48,000.
    • .4052
      • That's not quite right. You may have found the probability for the z score of -.24. Try again.
    • .5948
      • That's not quite right. You may have found the probability for the z score of .24. Try again.
    • .9732
      • That's not quite right. You may have found the probability for the z score of 1.93, instead of -1.93. Try again.
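
The walk-through above can be reproduced in a few lines. This is a minimal sketch, assuming Python 3.8 or later so that `statistics.NormalDist` can stand in for the Normal Distribution calculator; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Values taken from the question
pop_mean = 49720   # mean annual salary (dollars)
pop_sd = 7200      # population standard deviation (dollars)
n = 65             # sample size
value = 48000      # sample mean of interest

# Standard deviation of the sampling distribution of the sample mean
se = pop_sd / math.sqrt(n)

# z-score: subtract the population mean from the value, divide by se
z = (value - pop_mean) / se

# Lower-tail probability, once with z rounded as in the quiz, once unrounded
p_rounded_z = NormalDist().cdf(round(z, 2))
p_exact_z = NormalDist().cdf(z)

print(round(se))              # 893
print(round(z, 2))            # -1.93
print(round(p_rounded_z, 4))  # 0.0268
print(p_exact_z)              # slightly larger, about 0.027
```

The small gap between the two probabilities comes only from rounding z to two decimals before looking it up.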



Central Limit Theorem

In 2011, scores on the critical reading portion of the SAT (SAT-CR) were approximately Normally distributed with mean μ = 496 and standard deviation σ = 114.

  • Consider the question: What is the probability that the mean SAT-CR score of a random sample of 3 test-takers from 2011 is more than 600? What should be considered before calculating this probability? Are there any serious concerns with calculating the probability? If not, calculate the probability.
    • Given that the population distribution is approximately Normal, the distribution of sample means will also be approximately Normal, for any sample size. There are no concerns with calculating this probability. The mean of the sampling distribution of sample means is the same as the population mean, μ = 496. The standard deviation is σ/sqrt(n) = 114/sqrt(3) = 65.82. The z-score for 600 is z = (600 - 496)/65.82 = 1.58. P(X-bar > 600) = P(Z > 1.58) = P(Z < -1.58) = .0571. While it is common for individual students to score above 600 on the SAT-CR, it is rather unlikely (less than 6% chance) for the mean score of a sample of 3 students to be above 600.
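
The contrast between an individual score and a sample mean can be checked directly. The sketch below is illustrative, assuming Python 3.8 or later for `statistics.NormalDist`; the comparison with a single test-taker is added here for illustration and uses the same population values.

```python
import math
from statistics import NormalDist

# Values taken from the question
mu = 496      # population mean SAT-CR score
sigma = 114   # population standard deviation
n = 3         # sample size
cutoff = 600

std_normal = NormalDist()

# Upper-tail probability for the mean of 3 test-takers
se = sigma / math.sqrt(n)
z_mean = round((cutoff - mu) / se, 2)
p_mean = 1 - std_normal.cdf(z_mean)

# Upper-tail probability for a single test-taker
z_single = (cutoff - mu) / sigma
p_single = 1 - std_normal.cdf(z_single)

print(round(se, 2))      # 65.82
print(z_mean)            # 1.58
print(round(p_mean, 4))  # 0.0571
print(p_single)          # roughly 0.18, about three times as likely
```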



Sampling distributions for counts and proportions

The mean and sd of [math]\hat{p}[/math]
  • Out of 300 students in the school, 225 passed an exam. What would be the mean of the sampling distribution of the proportion of students who passed the exam in the school?[3] Answer: .75 (to two decimal places)
    • That's correct. The mean of the sampling distribution of [math]\hat{p}[/math] is equal to the population proportion. It is 225/300 = .75.
    • That's not quite right. Note that the population in this scenario is the full student body of 300 students. The samples which make up the sampling distribution would be drawn from this population. Consider how the mean of the sampling distribution relates to the mean of the population. Try again.
  • Out of 300 students in the school, 225 passed an exam. You take a sample of 10 of these students. What is the standard deviation of the distribution of sample proportions?[4] Answer: .137 (to three decimal places)
    • That's correct. The standard deviation of the distribution of sample proportions = [math]\sqrt{ \frac{p(1-p)}{n}} = \sqrt{ \frac{(.75)(.25)}{10}} = .137[/math]
    • That's not quite right. The standard deviation of the distribution of sample proportions can be calculated from the population proportion and the sample size. Try again.
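
As a quick check of both answers, the following minimal sketch computes the mean and standard deviation of the sampling distribution of the sample proportion. The helper name `sd_of_sample_proportion` is hypothetical, introduced only for this illustration.

```python
import math

def sd_of_sample_proportion(p, n):
    """Standard deviation of p-hat for population proportion p and sample size n."""
    return math.sqrt(p * (1 - p) / n)

# Values taken from the question
passed = 225
students = 300
n = 10

p = passed / students            # population proportion, also the mean of p-hat
sd = sd_of_sample_proportion(p, n)

print(p)             # 0.75
print(round(sd, 3))  # 0.137
```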



The mean and sd of [math]\hat{p}[/math]

According to the National Student Clearinghouse Research Center, 45 percent of all students who finished a four-year degree in 2010-11 had previously enrolled at a two-year college.[5]

  • We wish to randomly sample students who finished a four-year degree in 2010-2011 to determine the proportion who had previously enrolled at a two-year college. For which of the following sample sizes is the Normal model a good fit for the sampling distributions of the sample proportions?
    • 10
      • That's not quite right. Neither np nor n(1-p) is greater than or equal to 10: np = (10)(0.45) = 4.5 and n(1 - p) = (10)(0.55) = 5.5. A Normal distribution is not a good fit for the sampling distribution of sample proportions. Try again.
    • 20
      • That's not quite right. One of the two conditions is not met. Both np and n(1-p) must be greater than or equal to 10: n(1 - p) = (20)(0.55) = 11 is greater than 10, but np = (20)(0.45) = 9 is less than 10. A Normal distribution is not a good fit for the sampling distribution of sample proportions. Try again.
    • 30
      • That's correct. Both np and n(1-p) are greater than 10: np = (30)(0.45) = 13.5 and n(1 - p) = (30)(0.55) = 16.5. A Normal distribution is a good fit for the sampling distribution of sample proportions.
    • 20 and 30
      • That's not quite right. If the smaller sample size met the conditions for p=.45, then this would be correct. Be sure to check that both of the following are true: np ≥ 10 and n(1 - p) ≥ 10. Try again.
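
The rule of thumb used in the feedback above (both np and n(1 - p) at least 10) is easy to check for each candidate sample size. This is a minimal sketch; the function name and the `threshold` parameter are illustrative choices, and other textbooks use slightly different thresholds.

```python
def normal_model_ok(n, p, threshold=10):
    """Return True when both np and n(1 - p) reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

p = 0.45  # population proportion from the question
results = {n: normal_model_ok(n, p) for n in (10, 20, 30)}

print(results)  # {10: False, 20: False, 30: True}
```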



The mean and sd of [math]\hat{p}[/math]

According to the National Student Clearinghouse Research Center, 45 percent of all students who finished a four-year degree in 2010-11 had previously enrolled at a two-year college.[6]

  • We decide to randomly sample 50 students who finished a four-year degree in 2010-2011 to determine the proportion who had previously enrolled at a two-year college. What are the mean and standard deviation of the sampling distribution of sample proportions?
    • The mean of the sampling distribution is 0.45. The standard deviation is [math]\sqrt{ \frac{p(1-p)}{n}} = \sqrt{ \frac{(.45)(.55)}{50}} = .07[/math]



Understanding the sampling distribution of [math]\hat{p}[/math]

The test specifications for a math test require that 20% of the test questions relate to geometry.

  • Two tests are assembled by randomly choosing test questions from a pool of over 1000 questions in which geometry questions make up 20%. The first test has 50 questions (long test) and the second test has 10 questions (short test). Which test is more likely to have more than 40% geometry questions?
    • The long test because there are more test questions on this test, so there will be more geometry questions.
      • That's not quite right. While it is true that a longer test will have more geometry questions, there are overall more test questions on the test. We are interested in the proportion of questions which relate to geometry, not the overall count. Try again.
    • The long test because there is more variability in the proportion of geometry questions among larger samples.
      • That's not quite right. Recall that the larger the sample size the less variable the sampling distribution. Try again.
    • The short test because there is more variability in the proportion of geometry questions among smaller samples.
      • That's correct. When samples are small, there is more variability among the samples, so it is more likely to get sample results further from p=.20 in the short test.
    • As both tests are based on random samples, they have the same chance of having 40% geometry questions.
      • That's not quite right. In random sampling, the variability of the sample results depends on the size of the sample: smaller samples are more variable. Try again.
  • From a pool of 1000 questions, in which geometry questions make up 20%, the test developers randomly sample 50 questions 5 times to make 5 long tests (50 questions each). (The same question may be included on more than 1 test.) The following sequences show the percent of geometry questions on each of the 5 tests. Which sequence is the most plausible?
    • 22%, 20%, 32%, 18%, 25%
      • That's correct. We can assume the sampling distribution is Normally distributed as both np and n(1-p) are greater than or equal to 10. Using the standard deviation rule, we would expect that about 2/3rds of the p-hats would be within 1 standard deviation of the mean p=.20. The standard deviation is about .06. Of the 5 samples (tests), 4 are within .06 of p=.20.
    • 5%, 73%, 22%, 88%, 56%
      • That's not quite right. If it is safe to assume that the sampling distribution is Normally distributed, then we can use the standard deviation rule to establish the bounds for the most likely sample results: we would expect about 2/3rds of the sample proportions to be within 1 standard deviation of the mean. In these results, 3 of the samples are over 3 standard deviations from the mean. Try again.
    • 20%, 20%, 20%, 20%, 20%
      • That's not quite right. When randomly sampling from a population with mean proportion p=.20, we expect that many of the samples drawn will be the same or close to the population proportion, however it is rather unlikely that all 5 of the tests would have exactly 20% geometry questions. It is more likely that the percent of geometry questions in the 5 tests is mostly close to 20%. Try again.
    • It is not safe to use the standard deviation rule to evaluate these results.
      • That's not quite right. In fact, we can assume the sampling distribution is Normally distributed as both np and n(1-p) are greater than or equal to 10: np = (50)(.20) = 10 and n(1-p) = (50)(.8) = 40. Using the standard deviation rule, we would expect that about 2/3rds of the p-hats would be within 1 standard deviation of the mean p=.20. Try again.
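
Both geometry questions can be explored numerically. The sketch below is an illustration under a stated assumption: it treats each question as an independent draw with probability .20 (a binomial model), which only approximates sampling without replacement from a large pool. The function name is hypothetical.

```python
import math

def binom_tail_above(n, p, k):
    """P(X > k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(k + 1, n + 1)
    )

p = 0.20  # proportion of geometry questions in the pool

# More than 40% geometry: more than 4 of 10, or more than 20 of 50
p_short = binom_tail_above(10, p, 4)
p_long = binom_tail_above(50, p, 20)
print(round(p_short, 4))  # 0.0328
print(p_long)             # far smaller than the short-test value

# How many of the five long tests fall within 1 standard deviation of p?
sd_long = math.sqrt(p * (1 - p) / 50)   # about 0.057
sequence = [0.22, 0.20, 0.32, 0.18, 0.25]
within_one_sd = sum(abs(x - p) <= sd_long for x in sequence)
print(within_one_sd)  # 4
```

The short test is many times more likely to exceed 40% geometry questions, which matches the reasoning about variability in smaller samples.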



Using Normal distribution calculations with the sampling distribution of [math]\hat{p}[/math]

The National Institute of Mental Health reports that approximately 10 percent of American adults suffer from depression or a depressive illness.[7] A random sample of 210 American adults is obtained.

  • What can we assume about the sampling distribution of the sample proportion, [math]\hat{p}[/math]?
    • Both np and n(1-p) are greater than or equal to 10: np = (210)(.1) = 21 and n(1-p) = (210)(.9) = 189. We can therefore assume that the sampling distribution is approximately Normal with mean [math]\mu_{\hat{p}}[/math] = .10 and standard deviation [math]\sigma_{\hat{p}}[/math] = sqrt(p(1-p)/n) = sqrt(.1(1-.1)/210) = .02.
  • If the sampling distribution has a Normal distribution, we can use the standard deviation rule to better understand the distribution. What interval is almost certain (probability .997) to contain the sample proportion of adults who suffer from depression or depressive illness?
    • As we can assume that the sampling distribution is Normally distributed, the standard deviation rule says that 99.7% of observations fall within 3 standard deviations above and below the mean. For a sampling distribution with mean [math]\mu_{\hat{p}}[/math] = .1 and standard deviation [math]\sigma_{\hat{p}}[/math] = .02: .1 + 3*.02 = .16 and .1 - 3*.02 = .04. There is roughly a 99.7% chance that the sample proportion falls in the interval (.04, .16).
  • (This question needs reworking) For what percent of samples of 210 adults from the population, would we expect to find 35 or more adults with depression or a depressive illness? Answer: .15% (to two decimal places)
    • That's correct. In the last question we established that in the sampling distribution 99.7% of samples will have a sample proportion between .04 and .16. As 35 out of the 210 adults in the sample is about .17, just above .16, the area above this proportion lies in the upper tail beyond 3 standard deviations above the mean, and we use that tail as an approximation. According to the standard deviation rule, sample proportions greater than 0.16 will occur 0.15% of the time: (100% - 99.7%) / 2 = 0.15%.
    • That's correct. In the last question we established that in the sampling distribution 99.7% of samples will have a sample proportion between .04 and .16. As 35 out of the 210 adults in the sample is .17, we conclude that the area above the proportion in this sample is in the upper tail beyond 3 standard deviations above the mean. In particular it is the area above z=(0.167-0.10)/ 0.02 = 3.35. The area under the Normal curve above 3.35 is .04%
    • That's not quite right. In the last question we established that in the sampling distribution 99.7% of samples will have a sample proportion between .04 and .16. How does the sample proportion in this sample compare with this range? Use the standard deviation rule to determine the area under the curve which corresponds to the area in the problem. Try again.
    • That's not quite right. In the last question we established that in the sampling distribution 99.7% of samples will have a sample proportion between .04 and .16. How does the sample proportion in this sample compare with this range? Calculate the z-score and determine the area under the curve (in percents). Try again.
  • What is the probability that at least 25 in 210 (proportion .12) adults suffer from depression or a depressive illness: P(p-hat > .12)? (Note that p-hat = .12 is 1 standard deviation (.02) above the mean (.1), which means you can use the standard deviation rule to estimate this probability.) Answer: .16 (to two decimal places)
    • That's correct. The standard deviation rule tells us that there is a 68% chance that the sample proportion will be within 1 standard deviation of the mean: between .08 and .12. P(p-hat > .12) = (1 -.68)/2 = .16.
    • That's not quite right. The standard deviation rule tells us that there is a 68% chance that the sample proportion will be within 1 standard deviation of the mean: between .08 and .12. The probability that we are interested in is represented by the area under the curve above .12. Try again.
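
The calculations in this set of questions can be compared against a Normal calculation. The sketch below is illustrative only, assuming Python 3.8 or later for `statistics.NormalDist`. It follows the quiz in rounding the standard deviation to .02, which is an approximation; the unrounded value is closer to .0207, so the tail probabilities shift slightly if it is used instead.

```python
import math
from statistics import NormalDist

# Values taken from the question
p = 0.10   # population proportion
n = 210    # sample size

sd_p_hat = math.sqrt(p * (1 - p) / n)   # about 0.0207
sd_rounded = round(sd_p_hat, 2)         # 0.02, as used in the quiz

# Interval covering about 99.7% of sample proportions (3 standard deviations)
lower = round(p - 3 * sd_rounded, 2)
upper = round(p + 3 * sd_rounded, 2)
print(lower, upper)  # 0.04 0.16

# P(p-hat > .12): standard deviation rule versus Normal calculation
tail_rule = (1 - 0.68) / 2
tail_normal = 1 - NormalDist().cdf(1)
print(round(tail_rule, 2), round(tail_normal, 2))  # 0.16 0.16

# 35 or more of 210 adults: z-score and upper-tail area with the rounded sd
p_hat_35 = 35 / n
z_35 = (p_hat_35 - p) / sd_rounded
tail_35 = 1 - NormalDist().cdf(z_35)
print(round(z_35, 2))  # 3.33
print(tail_35)         # about 0.0004, below the 0.15% from the rule
```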



Notes

  1. Question adapted from Ebook Problem Set - Normal Std, Problem 20 in Probability and Statistics EBook, from UCLA Statistics Online Computational Resource (SOCR), Retrieved 12 November 2012.
  2. Adapted from Central Limit Theorem Demonstration at Online Statistics Education: An Interactive Multimedia Course of Study. Project Leader: David M. Lane, Rice University. Retrieved 25 November 2012.
  3. Adapted from Sampling distribution of p at Online Statistics Education: An Interactive Multimedia Course of Study. Project Leader: David M. Lane, Rice University. Retrieved 1 December 2012.
  4. Adapted from Sampling distribution of p at Online Statistics Education: An Interactive Multimedia Course of Study. Project Leader: David M. Lane, Rice University. Retrieved 1 December 2012.
  5. http://www.studentclearinghouse.info/snapshot/docs/SnapshotReport6-TwoYearContributions.pdf
  6. http://www.studentclearinghouse.info/snapshot/docs/SnapshotReport6-TwoYearContributions.pdf
  7. Obtained from Susan Dean and Barbara Illowsky, Hypothesis Testing of Single Mean and Single Proportion: Practice 3 in Collaborative Statistics at Connexions. Retrieved 3 December 2012.