StatTutor: Class effect (Question 3)
Description of activity and Excel instructions: StatTutor Class effect (Question 3)
Dataset: Direct link for gradebook.xls not available.
This exercise begins by taking a look at the data. To retain the correct analysis calculations for Questions 1 & 2, copy and paste the raw data to a new sheet in your spreadsheet file. Label the sheet "Q3".
Calculate descriptive statistics for each value of the categorical variable
Before beginning on the table of descriptive statistics, let's sort the data to make it easier to check the tabulation.
- Type Ctrl+A to select all of the data.
- Select Data > Sort....
The Sort dialog displays.
- Click the Options tab.
- Check Range contains column labels.
- Cleck the Sort Criteria tab
- Under Sort by, in the drop down menu, select the relevant categorical variable.
- Click OK.
Now that the data are sorted, pick an empty area in your worksheet and create a table where the row headers are "Min", "Q1", "Median", "Q3", "Max", "n", "Mean", "St Dev" and the column headers are the different values of the categorical variable.
For each cell in the table, enter the formula to calculate the statistic listed in the row header. The data range to include in each formula is the range of data in the quantitative variable that corresponds to the categorical variable value listed in the column header.
The formulas you will need are:
- Min: "=min([range])"
- Q1: "=quartile([range],1)" [Note that in some operating systems the formula uses a semi-colon instead of a colon between the range and the numeral "1".]
- Median: "=median([range])"
- Q3: "=quartile([range],3)" [Note that in some operating systems the formula uses a semi-colon instead of a colon, between the range and the number "3".]
- Max: "=max([range])"
- n: "=count([range])"
- Mean: "=average([range])"
- St Dev: "=stdev([range])"
Create side-by-side boxplots
Unfortunately, Calc can't create boxplots directly, so we have to create a column graph and alter it to display as side-by-side boxplots. We start by using the calculated Descriptive Statistics to generate new values that will be the basis for our column graph.
Create the necessary statistics
- In each successive cell under St Dev, type "25th Percentile", "50th Percentile", "75th Percentile", "Lower limit", and finally "Upper limit".
The values in these new rows, for each value of the categorical variable, are calculated based on the 5-number summary values. Enter the necessary formulas, for each eye color, to obtain:
- 25th Percentile = Q1 (Remember to copy the numbers from Q1 without copying the formulas.)
- 50th Percentile = Median - Q1
- 75th Percentile = Q3 - Median
- Lower limit = Q1 - min
- Upper limit values = Max - Q3
Create the boxplots
Now you are ready to create the boxplots.
- Highlight the values in the 25th Percentile, 50th Percentile, and 75th Percentile rows
- Choose Insert > Chart...
The Chart Wizard displays.
- In 1. Chart Type', choose the stacked column option.
- In 2. Data Range, ensure that the data is correctly specified. (Select the 25th, 50th, and 75th values for the values of the categories. The data series is in rows, and has no headers.)
- In 3. Data Series, use the Categories window to specify the category labels.
- In 4. Chart Elements, label and select appropriate options for your chart.
A stacked column graph displays. To create the boxplots, add error bars (to the min and max values) and adjust the color and outlines as needed.
- Use the Insert Y Error Bars... option to add the Upper Limit error bar.
- Use the Insert Y Error Bars... option to add the Lower Limit error bar. (Remember that to add this error bar you have to select the "lowest portion of the bar.")
- For each section of the bars, adjust the color and border as necessary.
- If needed, adjust the y-axis scale.
Return to the StatTutor exercise for Question 3 to provide a comparison of the data in the two groups: describe the key features of the data display and support your description with numerical measures. (Be sure to include the numerical results in your description.)
Calc does not include a data analysis package or formulas that can be used for calculating the F-statistic. You can use Excel for this activity, or the web-based statistics calculator for a one-way ANOVA at danielsoper.com. The statistics calculator uses the number, mean and standard deviation for each group to calculate the F-statistic and corresponding p-value. Use 3 decimal places when entering the mean and sd.
Return to the StatTutor exercise for Question 3 to report results and draw conclusions.