User:ASnieckus/StatisticsContent/Learn by doing/Sampling

Description of activity and Excel instructions: Sampling

Dataset: student_survey.xls

Create a random sample
The first step is to assign a random number to each student in the dataset.
 * Label the five columns to the right of the data "RandomNumber", "HandedRandom", "SexRandom", "VerbalRandom", and "AgeRandom".
 * In the RandomNumber column, type this formula in row 2: "=RAND".
 * Use the cell's fill handle to copy the formula to all the cells in the column from row 3 down to row 1130.

A random number, between 0 and 1, displays in each cell of the column. In order to use the value, we need to convert the cell contents from a formula to a value.


 * Click on the column heading to select the entire column of random numbers that you just created.
 * Right-click on the column header and choose Copy.
 * Right-click on the column header again and choose Paste Special....

The Paste Special dialog displays.
 * Under Selection check the box for Numbers.
 * Click OK.

Our sample will be the 192 students with the lowest random numbers.
 * * Type the key combination Ctrl+A to select all of the data in the worksheet.
 * Select Data > Sort....

The Sort dialog displays.
 * Click the Options tab.
 * Check Range contains column labels.
 * Click the Sort Criteria tab.
 * Under Sort by, choose RandomNumber in the drop down menu.
 * Click OK.

The data is now sorted from lowest to highest random number.
 * Select all of the data in rows 2-193 under the columns that contain the original variables (Handed, Sex, Verbal, Age), and copy and paste those data values to the new columns you created (HandedRandom, SexRandom, VerbalRandom, and AgeRandom).

You now have a random sample of 192 students from a population of 1129 students.

Verify that the random sample is comparable to the population
Although this confirmation process can be tedious, let's check that each of the variables in the random sample is similar to the corresponding variable in the population.

Handedness
To compare the proportion of right-handed students in the sample to those in the population, create two pie-charts, one for handedness in the population and one for handedness in the random sample.

We need to start by creating the counts in each category of Handed: The Select Source dialog displays and Calc automatically highlights column A. The DataPilot dialog displays.
 * Select the column labeled Handed.
 * Click Data > DataPilot > Start....
 * Click Current selection.
 * Click OK.


 * Drag and drop the Handed variable into the Column Fields window.
 * Drag and drop the Handed variable into the Data Fields window.
 * Set the Handed variable in the Data Fields window to display Count.
 * Expand the dialog and specify where to put the table.
 * Click OK to exit the DataPilot dialog.

Now, we will use the counts to create a pie chart:
 * Select Insert > Chart....

The Chart Wizard displays.
 * In 1. Chart Type, select a normal pie chart
 * Click Next>>.
 * In 2. Data Range:
 * Include the category labels and the counts in the Data Range window.
 * Click Data series in rows.
 * Check First row as label.
 * Uncheck First column as label.
 * Click Next>>.

No changes are needed in ''3. Data Series, but check that the Categories'' window correctly points to the category labels in the table.
 * Click Next>>.
 * In 4. Chart Elements, en
 * Enter a title for the chart.
 * Note that the Display legend box is checked.
 * Click Finish.

To see percentages as well as values for each slice,
 * In edit mode, right-click on one of the pie pieces.
 * Select Object Properties....

The Data Series dialog displays.
 * Select the Data Labels tab.
 * Check Show value as number.
 * Check Show value as percentage.
 * In the Separator drop down menu, choose Comma.
 * Click OK.

Your pie chart is complete. Move it under the data table for easy viewing.

Now repeat this process to create a table of counts and pie-chart for the variable "HandedRandom".

Consider the distributions to be comparable if the sample proportion comes within about 5% of the population proportion.

Sex
To compare the proportion of female students in the sample to the proportion in the population, create two pie-charts, one for sex in the population and one for sex in the random sample.

Follow the instructions for Handedness, changing the variables to Sex and SexRandom.

Consider the distributions to be comparable if the sample proportion comes within about 5% of the population proportion.

SAT Verbal
Create two 5-number summary tables - one for Verbal SAT score in the population and one for Verbal SAT score in the sample.

Starting with the "Verbal" variable first, create a table with the following statistics.
 * Label the table "Verbal".
 * Next to a label of "Min", type =min(D2:D1130).
 * Next to a label of "Q1", type =quartile(D2:D1130,1). (Note that for some operating systems, the comma between the range and parameter needs to be a semi-colon.)
 * Next to a label of "Median", type =median(D2:D1130).
 * Next to a label of "Mean", type =average(D2:D1130).
 * Next to a label of "Q3", type =quartile(D2:D1130,3). (Note that for some operating systems, the comma between the range and parameter needs to be a semi-colon.)
 * Next to a label of "Max", type =max(D2:D1130).

Create a second 5-number summary table for VerbalRandom using the same instructions, but change the data range as appropriate.

Since SAT scores tend to follow a normal (symmetric) distribution, you can focus on means to make a comparison. Consider the distributions to be comparable if the sample mean SAT comes within about 10 points of the population mean.

Age
Create two more 5-number summary tables - one for "Age" in the population and one for "AgeRandom" in the sample.

Use the instructions above for "SAT Verbal" but with the variables "Age" and "AgeRandom".

Since Age tends to follow a right-skewed distribution, you should focus on medians to make a comparison. Consider the distributions to be comparable if the sample median age comes within about .5 years of the population median.

Return to the OLI's Sampling page to complete the first two interpretive exercises. Be sure to compare the sample with the population for all 4 of the variables, using the guidelines for comparison provided above (and on the OLI page).

Determine whether the non-random sample of students is representative of the population of students
In order to address this issue, we will need to extract this group from the population.

We'll first sort the original data using the variable "Course", and then copy the data for "Business" into four new columns.

Sorting the data in the worksheet containing the random sample, could mess things up. The best first step is the copy the data columns for the original data, columns A-E, to a new worksheet.

Preparing the data
To find all of the business students, sort the data by "Course".
 * Type the key combination Ctrl+A to select all of the data in the worksheet.
 * Select Data > Sort....

The Sort dialog displays.
 * Click the Options tab.
 * Check Range contains column labels.
 * Click the Sort Criteria tab.
 * Under Sort by, choose Course in the drop down menu.
 * Click OK.

You now have the business students organized in a block.
 * In the columns to the right of the random variables (or on a separate sheet), type "Handed_Business", "Sex_Business", "Verbal_Business", and "Age_Business".
 * Copy the data for the four original variables (Handed, Sex, Verbal, and Age) from the rows where the course is Business and paste them in the new columns.

Comparing the sample to the population
Next, we explore whether the four variables' behavior for the (non-random) sample of business statistics students is comparable to their behavior for the population:

Handedness
To compare the proportion of right-handed students in the sample to those in the population, create two pie-charts, one for handedness in the population and one for handedness in the sample of business statistics students.
 * Copy the table of counts and the pie chart for the population from the original worksheet and paste in this new worksheet.
 * Use the above instructions to create the table of counts and the pie chart of "Handedness" for the business students.

Consider the distributions to be comparable if the sample proportion comes within about 5% of the population proportion.

Sex
To compare the proportion of female students in the sample to the proportion in the population, create two pie-charts, one for sex in the population and one for sex in the sample of business statistics students.
 * Copy the table of counts and the pie chart for the population from the original worksheet and paste in this new worksheet.
 * Use the above instructions to create the table of counts and the pie chart of "Sex" for the business students.

Consider the distributions to be comparable if the sample proportion comes within about 5% of the population proportion.

SAT Verbal
Create two tables of descriptive statistics, one for SAT Verbal score in the population and one for SAT Verbal score in the sample of business statistics students.
 * Copy the table of descriptive statistics for the population from the original worksheet and paste in this new worksheet.
 * Use the above instructions to create the table of descriptive statistics for "SAT Verbal" for the business students.

Since SAT scores tend to follow a normal (symmetric) distribution, you can focus on means to make a comparison. Consider the distributions to be comparable if sample mean SAT comes within about 10 points of population mean.

Age
Create two tables of descriptive statistics, one for age in the population and one for age in the sample of business statistics students.
 * Copy the table of descriptive statistics for the population from the original worksheet and paste in this new worksheet.
 * Use the above instructions to create the table of descriptive statistics for "Age" for the business students.

Since Age tends to follow a right-skewed distribution, you should focus on medians to make a comparison. Consider the distributions to be comparable if sample median age comes within about .5 years of population median.

Return to the OLI's Sampling page to complete the final interpretive exercise. Be sure to compare the business students with the population for all 4 of the variables, using the guidelines for comparison provided above (and on the OLI page).