User:ASnieckus/StatisticsContent/Learn by doing/Causation and experiments

Description of activity and Excel instructions: Causation and experiments

Dataset: computers.xls

Create a random sample
To create a random sample of 450 individuals, assign a random number to each subject in the dataset.
 * Label the four columns to the right of the data "RandomNumber", "Age_SRS", "Gender_SRS", and "Comp_SRS".
 * In the RandomNumber column, type this formula in row 2: "=RAND".
 * Use the cell's fill handle to copy the formula to all the cells in the column from row 3 down to row 20784.

A random number, between 0 and 1, displays in each cell of the column. In order to use the value, we need to convert the cell contents from a formula to a value.


 * Click on the column heading to select the entire column of random numbers that you just created.
 * Right-click on the column header and choose Copy.
 * Right-click on the column header again and choose Paste Special....

The Paste Special dialog displays.
 * Under Selection check the box for Numbers.
 * Click OK.

Our sample will be the 450 subjects with the lowest random numbers.
 * Type the key combination Ctrl+A to select all of the data in the worksheet.
 * Select Data > Sort....

The Sort dialog displays.
 * Click the Options tab.
 * Check Range contains column labels.
 * Click the Sort Criteria tab.
 * Under Sort by, choose RandomNumber in the drop down menu.
 * Click OK.

The data is now sorted from lowest to highest random number.
 * Select all of the data in rows 2-451 under the columns that contain the original variables (Age, Gender, Comp), and copy and paste those data values to the new columns you created (Age_SRS, Gender_SRS, and Comp_SRS).

You now have a simple random sample of 450 individuals from a population of 20784.

Assign subjects to treatment groups
Now we will randomly assign our SRS (simple random sample) of 450 subjects to treatment groups, one for each of the three versions of the ISP's software. Let us denote the versions "1," "2," and "3," and create a categorical variable to identify the treatment for each subject.


 * Label the next blank column to the right of your data "Treatment".
 * In the Treatment column, type this formula in row 2: =RANDBETWEEN(1,3).
 * Use the cell's fill handle to copy the formula to all of the cells in the column, from row 3 down to row 451.

This generates a random integer from 1 to 3 in each cell of the column.
 * Convert the cell contents for the Treatment column from Formula to Numbers (as you did for RandomNumber).

Use sort to group the data by treatment number.
 * Type the key combination Ctrl+A to select all of the data in the worksheet.
 * Select Data > Sort....

The Sort dialog displays.
 * Click the Options tab.
 * Check Range contains column labels.
 * Click the Sort Criteria tab.
 * Under Sort by, choose Treatment in the drop down menu.
 * Click OK.

The data is now sorted from lowest to highest Treatment number.

We are finally reaching the goal of this activity.

Verify that the random sample is comparable to the population
Let us now examine whether the randomization was successful in making our three treatment groups similar with respect to the variables age, gender, and comp. In other words, we will now examine whether the distributions of these variables in the three groups are similar or not.

Age
To compare the distribution of age among the three treatment groups, we'll create side-by-side boxplots of age by treatment.

Create descriptive statistics
The first step is to create a table of descriptive statistics, the 5-number summary.

To create this table, first make the row and column labels:
 * In an empty section of your worksheet, pick a cell and type "Statistic".
 * Move one cell to the right and type "Treatment 1".
 * Move one cell to the right, again and type "Treatment 2"
 * And another cell to the right and type "Treatment 3".
 * In the cells underneath Statistic, type "Min", "Q1", "Median", "Q3", and finally "Max".

Now for each statistic, enter the formula to calculate that statistic using the range of data that corresponds to each treatment group:

Starting with Treatment 1 cells, enter the formulas for the following statistics, where [range] is the range of data for Age_SRS for Treatment 1.
 * Min: type =min([range]).
 * Q1: type =quartile([range],1). (Note that for some operating systems, the comma between the range and parameter needs to be a semi-colon.)
 * Median: type =median([range]).
 * Q3: type =quartile([range],3). (Note that for some operating systems, the comma between the range and parameter needs to be a semi-colon.)
 * Max: type =max([range]).

Repeat for Treatments 2 and 3, adjusting the data range appropriately.

Create side-by-side boxplots
First you need to create the differences from which the stacked bars are created.
 * In each successive cell under Max, type "25th Percentile", "50th Percentile", "75th Percentile", "Lower limit", and finally "Upper limit".
 * The values in these new rows, for each treatment group, are calculated based on the 5-number summary values. Enter the necessary formulas to obtain:
 * 25th Percentile = Q1 (Remember to copy the numbers from Q1 without copying the formulas.)
 * 50th Percentile = Median - Q1
 * 75th Percentile = Q3 - Median
 * Lower limit = Q1 - min
 * Upper limit values = Max - Q3

Now you are ready to create the boxplots.
 * Highlight the values in the 25th Percentile, 50th Percentile, and 75th Percentile rows
 * Choose Insert > Chart...

The Chart Wizard displays.
 * In 1. Chart Type', choose the stacked column option.
 * In 2. Data Range, ensure that the data is correctly specified.
 * In 3. Data Series, use the Categories window to specify the category labels.
 * In 4. Chart Elements, label and select appropriate options for your chart.

A stacked column graph displays. To create the boxplots, add error bars (to the min and max values) and adjust the color and outlines as needed.
 * Use the Insert Y Error Bars... option to add the Upper Limit error bar.
 * Use the Insert Y Error Bars... option to add the Lower Limit error bar. (Remember to add this error bar select the "lowest portion of the bar.")
 * For each section of the bars, adjust the color and border as necessary.
 * If needed, adjust the y-axis scale.

Since we will continue working with the data in the worksheet, it is a good idea to save a copy of your boxplot graph so that when you re-sort the data in the worksheet it does not disappear:


 * Right-click on the graph and choose Copy.
 * Right-click on an empty cell in the worksheet and choose Paste Special....

The Paste Special dialog displays.
 * Select the XML source.
 * Click OK.

This saves your graph as an image so that it will not be disturbed by re-sorting the data. You can now delete your original graph.

Gender
To compare the distribution of gender among the three treatment groups, we'll look at a two-way table of conditional percents:

The Select Source dialog displays.
 * Highlight the block of columns containing the variables Gender_SRS and Treatment.
 * Click Data > DataPilot > Start....
 * Click Current selection.
 * Click OK.

The DataPilot dialog displays. You now have a two-way table of counts.
 * Drag and drop Treatment into the Row Fields window.
 * Drag and drop Gender into the Column Fields window.
 * Drag and drop Treatment into the Data Fields window.
 * Revise the Data Fields variable so it will display Counts.
 * Expand the dialog box and specify where to put the two-way table.
 * Check the box next to Ignore empty rows, so the table doesn't count missing data.
 * Click OK to exit the DataPilot dialog.

Create a second table in which the entries are percent of row, the percent of females and males in each treatment group. (Remember that to save a second version, select a location for the revised table on the same sheet).

Now you have the data to compare the three treatment groups.

Comp (hours per week of computer use)
To compare the distribution of Comp among the three treatment groups, create side by side boxplots of comp for each treatment. Follow the instructions above for Age, with the obvious changes.

Return to the OLI's Causation and experiments page to complete the interpretive exercises. Be sure to discuss your analysis of all three variables.