StatTutor: Checking randomization

From WikiEducator
Jump to: navigation, search


Description of activity and Excel instructions: StatTutor -- Consider study design: Checking randomization

Dataset: Direct link for depression.xls not available.

Comparing counts in each treatment group

Using the COUNTIf function, you can count the values for each of the three treatment groups, without sorting the data.

  • In an empty area of the worksheet, create 3 column headers: Lithium (ttt 0), Imipramine (ttt 1), Placebo (ttt 2)
  • Enter the formula "=COUNTIF([range],0)" in the cell directly under Lithium, where [range] is the full range of data for the treatment variable, and "0" is the value for the lithium treatment group.
  • Repeat, revising the treatment group value, for each of the other groups.

Comparing distributions of AcuteT for the different treatment groups

AcuteT is a quantitative variable (measured in number of days). We will compare 5-number summaries and boxplots for each of the three treatment groups.

Sort the data by treatment group

Use sort to group the data by treatment number.

  • Type the key combination Ctrl+A to select all of the data in the worksheet.
  • Select Data > Sort....

The Sort dialog displays.

  • Click the Options tab.
  • Check Range contains column labels.
  • Click the Sort Criteria tab.
  • Under Sort by, choose Treat in the drop down menu.
  • Click OK.

The data is now sorted from lowest to highest Treatment number.

Create descriptive statistics

The first step is to create a table of descriptive statistics, the 5-number summary.

To create this table, first make the row and column labels:

  • In an empty section of your worksheet, pick a cell and type "Statistic".
  • Move one cell to the right and type "Lithium (ttt 0)".
  • Move one cell to the right, again and type "Imipramine (ttt 1)"
  • And another cell to the right and type "Placebo (ttt 2)".
  • In the cells underneath Statistic, type "Min", "Q1", "Median", "Q3", and finally "Max".

Now for each statistic, enter the formula to calculate that statistic using the range of data that corresponds to each treatment group:

Starting with the Lithium cells, enter the formulas for the following statistics, where [range] is the range of data for AcuteT for Treat=0.

  • Min: type =min([range]).
  • Q1: type =quartile([range],1). (Note that for some operating systems, the comma between the range and parameter needs to be a semi-colon.)
  • Median: type =median([range]).
  • Q3: type =quartile([range],3). (Note that for some operating systems, the comma between the range and parameter needs to be a semi-colon.)
  • Max: type =max([range]).

Repeat for Treatments 1 and 2, adjusting the data range appropriately.

Create side-by-side boxplots

First you need to create the differences from which the stacked bars are created.

  • In each successive cell under Max, type "25th Percentile", "50th Percentile", "75th Percentile", "Lower limit", and finally "Upper limit".
  • The values in these new rows, for each treatment group, are calculated based on the 5-number summary values. Enter the necessary formulas to obtain:
    • 25th Percentile = Q1 (Remember to copy the numbers from Q1 without copying the formulas.)
    • 50th Percentile = Median - Q1
    • 75th Percentile = Q3 - Median
    • Lower limit = Q1 - min
    • Upper limit values = Max - Q3

Now you are ready to create the boxplots.

  • Highlight the values in the 25th Percentile, 50th Percentile, and 75th Percentile rows
  • Choose Insert > Chart...

The Chart Wizard displays.

  • In 1. Chart Type', choose the stacked column option.
  • In 2. Data Range, ensure that the data is correctly specified.
  • In 3. Data Series, use the Categories window to specify the category labels.
  • In 4. Chart Elements, label and select appropriate options for your chart.

A stacked column graph displays. To create the boxplots, add error bars (to the min and max values) and adjust the color and outlines as needed.

  • Use the Insert Y Error Bars... option to add the Upper Limit error bar.
  • Use the Insert Y Error Bars... option to add the Lower Limit error bar. (Remember, to add this error bar select the "lowest portion of the bar.")
  • For each section of the bars, adjust the color and border as necessary.
  • If needed, adjust the y-axis scale.

Return to the Consider Study Design StatTutor page to complete the interpretive exercises. (Be sure to include the data in your description of the results.)