# Sampling

Description of activity and Excel instructions: Sampling

Dataset: student_survey.xls

## Contents

## Create a random sample

The first step is to assign a random number to each student in the dataset.

- Label the five columns to the right of the data "RandomNumber", "HandedRandom", "SexRandom", "VerbalRandom", and "AgeRandom".
- In the
*RandomNumber*column, type this formula in row 2: "=RAND()". - Use the cell's fill handle to copy the formula to all the cells in the column from row 3 down to row 1130.

A random number, between 0 and 1, displays in each cell of the column. In order to use the value, we need to convert the cell contents from a formula to a value.

- Click on the column heading to select the entire column of random numbers that you just created.
- Right-click on the column header and choose
**Copy**. - Right-click on the column header again and choose
**Paste Special...**.

The *Paste Special* dialog displays.

- Under
*Selection*check the box for**Numbers**. - Click
**OK**.

Our sample will be the 192 students with the lowest random numbers.

- * Type the key combination
**Ctrl+A**to select all of the data in the worksheet. - Select
**Data**>**Sort...**.

The Sort dialog displays.

- Click the
**Options**tab. - Check
**Range contains column labels**. - Click the
**Sort Criteria**tab. - Under
*Sort by*, choose**RandomNumber**in the drop down menu. - Click
**OK**.

The data is now sorted from lowest to highest random number.

- Select all of the data in rows 2-193 under the columns that contain the original variables (Handed, Sex, Verbal, Age), and copy and paste those data values to the new columns you created (HandedRandom, SexRandom, VerbalRandom, and AgeRandom).

You now have a random sample of 192 students from a population of 1129 students.

## Verify that the random sample is comparable to the population

Although this confirmation process can be tedious, let's check that each of the variables in the random sample is similar to the corresponding variable in the population.

### Handedness

To compare the proportion of right-handed students in the sample to those in the population, create two pie-charts, one for handedness in the population and one for handedness in the random sample.

We need to start by creating the counts in each category of Handed:

- Select the column labeled
*Handed*. - Click
**Data**>**DataPilot**>**Start...**.

The *Select Source* dialog displays and Calc automatically highlights column A.

- Click
**Current selection**. - Click
**OK**.

The DataPilot dialog displays.

- Drag and drop the
**Handed**variable into the**Column Fields**window. - Drag and drop the
**Handed**variable into the**Data Fields**window. - Set the
**Handed**variable in the**Data Fields**window to display**Count**. - Expand the dialog and specify where to put the table.
- Click
**OK**to exit the*DataPilot*dialog.

Now, we will use the counts to create a pie chart:

- Select
**Insert**>**Chart...**.

The *Chart Wizard* displays.

- In
*1. Chart Type*, select a normal pie chart - Click
**Next>>**. - In
*2. Data Range*:- Include the category labels and the counts in the
*Data Range*window. - Click
**Data series in rows**. - Check
**First row as label**. - Uncheck
**First column as label**. - Click
**Next>>**.

- Include the category labels and the counts in the

No changes are needed in *3. Data Series*, but check that the *Categories* window correctly points to the category labels in the table.

- Click
**Next>>**. - In
*4. Chart Elements*, en- Enter a title for the chart.
- Note that the
*Display legend*box is checked.

- Click
**Finish**.

To see percentages as well as values for each slice,

- In edit mode, right-click on one of the pie pieces.
- Select
**Object Properties...**.

The *Data Series* dialog displays.

- Select the
*Data Labels*tab. - Check
**Show value as number**. - Check
**Show value as percentage**. - In the
*Separator*drop down menu, choose**Comma**. - Click
**OK**.

Your pie chart is complete. Move it under the data table for easy viewing.

Now repeat this process to create a table of counts and pie-chart for the variable "HandedRandom".

Consider the distributions to be comparable if the sample proportion comes within **about 5%** of the population proportion.

### Sex

To compare the proportion of female students in the sample to the proportion in the population, create two pie-charts, one for sex in the population and one for sex in the random sample.

Follow the instructions for Handedness, changing the variables to Sex and SexRandom.

Consider the distributions to be comparable if the sample proportion comes within **about 5%** of the population proportion.

### SAT Verbal

Create two 5-number summary tables - one for Verbal SAT score in the population and one for Verbal SAT score in the sample.

Starting with the "Verbal" variable first, create a table with the following statistics.

- Label the table "Verbal".
- Next to a label of "Min", type =min(D2:D1130).
- Next to a label of "Q1", type =quartile(D2:D1130,1). (Note that for some operating systems, the comma between the range and parameter needs to be a semi-colon.)
- Next to a label of "Median", type =median(D2:D1130).
- Next to a label of "Mean", type =average(D2:D1130).
- Next to a label of "Q3", type =quartile(D2:D1130,3). (Note that for some operating systems, the comma between the range and parameter needs to be a semi-colon.)
- Next to a label of "Max", type =max(D2:D1130).

Create a second 5-number summary table for VerbalRandom using the same instructions, but change the data range as appropriate.

Since SAT scores tend to follow a normal (symmetric) distribution, you can focus on **means** to make a comparison. Consider the distributions to be comparable if the sample mean SAT comes within **about 10 points** of the population mean.

### Age

Create two more 5-number summary tables - one for "Age" in the population and one for "AgeRandom" in the sample.

Use the instructions above for "SAT Verbal" but with the variables "Age" and "AgeRandom".

Since Age tends to follow a right-skewed distribution, you should focus on **medians** to make a comparison. Consider the distributions to be comparable if the sample median age comes within **about .5 years** of the population median.

Return to the OLI's Sampling page to complete the first two interpretive exercises. Be sure to compare the sample with the population for all 4 of the variables, using the guidelines for comparison provided above (and on the OLI page).

## Determine whether the non-random sample of students is representative of the population of students

In order to address this issue, we will need to extract this group from the population.

We'll first sort the original data using the variable "Course", and then copy the data for "Business" into four new columns.

Sorting the data in the worksheet containing the random sample, could mess things up. The best first step is the copy the data columns for the original data, columns A-E, to a new worksheet.

### Preparing the data

To find all of the business students, sort the data by "Course".

- Type the key combination
**Ctrl+A**to select all of the data in the worksheet. - Select
**Data**>**Sort...**.

The Sort dialog displays.

- Click the
**Options**tab. - Check
**Range contains column labels**. - Click the
**Sort Criteria**tab. - Under
*Sort by*, choose**Course**in the drop down menu. - Click
**OK**.

You now have the business students organized in a block.

- In the columns to the right of the random variables (or on a separate sheet), type "Handed_Business", "Sex_Business", "Verbal_Business", and "Age_Business".
- Copy the data for the four original variables (Handed, Sex, Verbal, and Age) from the rows where the course is Business and paste them in the new columns.

### Comparing the sample to the population

Next, we explore whether the four variables' behavior for the (non-random) sample of business statistics students is comparable to their behavior for the population:

### Handedness

To compare the proportion of right-handed students in the sample to those in the population, create two pie-charts, one for handedness in the population and one for handedness in the sample of business statistics students.

- Copy the table of counts and the pie chart for the population from the original worksheet and paste in this new worksheet.
- Use the above instructions to create the table of counts and the pie chart of "Handedness" for the business students.

Consider the distributions to be comparable if the sample proportion comes within **about 5%** of the population proportion.

### Sex

To compare the proportion of female students in the sample to the proportion in the population, create two pie-charts, one for sex in the population and one for sex in the sample of business statistics students.

- Copy the table of counts and the pie chart for the population from the original worksheet and paste in this new worksheet.
- Use the above instructions to create the table of counts and the pie chart of "Sex" for the business students.

**about 5%** of the population proportion.

### SAT Verbal

Create two tables of descriptive statistics, one for SAT Verbal score in the population and one for SAT Verbal score in the sample of business statistics students.

- Copy the table of descriptive statistics for the population from the original worksheet and paste in this new worksheet.
- Use the above instructions to create the table of descriptive statistics for "SAT Verbal" for the business students.

Since SAT scores tend to follow a normal (symmetric) distribution, you can focus on **means** to make a comparison. Consider the distributions to be comparable if sample mean SAT comes within **about 10 points** of population mean.

### Age

Create two tables of descriptive statistics, one for age in the population and one for age in the sample of business statistics students.

- Copy the table of descriptive statistics for the population from the original worksheet and paste in this new worksheet.
- Use the above instructions to create the table of descriptive statistics for "Age" for the business students.

Since Age tends to follow a right-skewed distribution, you should focus on **medians** to make a comparison. Consider the distributions to be comparable if sample median age comes within **about .5 years** of population median.

Return to the OLI's Sampling page to complete the final interpretive exercise. Be sure to compare the business students with the population for all 4 of the variables, using the guidelines for comparison provided above (and on the OLI page).