User:ASnieckus/StatisticsContent/Learn by doing/Boxplot2

Description of activity and Excel instructions: Boxplot, 2 of 2

Dataset: graduation.xls

Calculate descriptive statistics by college
To start, we'll create a table that will list each statistical measure and the value for each college. The table that will result is found on the OLI Statistics Learn by Doing page. Begin by making the row and column labels:
 * Select a cell to be the upper left corner of the table; choose a cell in an empty section of your worksheet (to the right of the data columns)
 * Type "Statistic".
 * Moving one cell to the right, type "College A".
 * Moving one cell to the right, type "College B".
 * Repeat for colleges C, D, E, and F.
 * In the each successive cell under Statistic, type "Min", "Q1", "Median", "Q3", "Max".

Now let's add the formulas, noting that each column represents a different college corresponding to a different range of data:

Min

 * In the cell to the right of Min and underneath College A, type "=min(B2:B9)" because B2 through B9 represents the data for College A.
 * Moving one cell to the right (so that you're underneath College B), type "=min(B10:B17)" because B10 through B17 represents the data for College B.
 * Repeat this calculation for colleges C, D, E, and F, adjusting the data range appropriately for each college.

Q1

 * In the cell to the right of Q1 and underneath College A, type "=quartile(B2:B9,1)". [Note that in some operating systems the formula uses a semi-colon instead of a colon, "=quartile(B2:A9;1)".]
 * Repeat this calculation for colleges B, C, D, E, and F, adjusting the data range appropriately for each college.

Median

 * In the cell to the right of Median and underneath College A, type "=median(B2:B9)".
 * Repeat this calculation for colleges B, C, D, E, and F, adjusting the data range appropriately for each college.

Q3

 * In the cell to the right of Q3 and underneath College A, type "=quartile(B2:B9,3)". [Note that in some operating systems the formula uses a semi-colon instead of a colon, "=quartile(B2:B9;3)".]
 * Repeat this calculation for colleges B, C, D, E, and F, adjusting the data range appropriately for each college.

Max

 * In the cell to the right of Max and underneath College A, type "=Max(B2:B9)".
 * Repeat this calculation for colleges B, C, D, E, and F, adjusting the data range appropriately for each college.

Create side-by-side boxplots
Unfortunately, Calc can't create boxplots directly, so we have to create a column graph and alter it to display as side-by-side boxplots. We start by using the calculated Descriptive Statistics to generate new values that will be the basis for our column graph. The table that will result is found on the OLI Statistics Learn by Doing page (the table shown under ''Side-by-side boxplots.

Create the necessary statistics

 * In the Statistic column, in each successive cell under Max, type "25th Percentile", "50th Percentile", "75th Percentile", "Lower limit", and finally "Upper limit". Widen the column if needed.

The 25th Percentile values are the same as those for Q1. We need to copy the numbers from Q1 without copying the formulas.
 * Highlight the set of values for Q1.
 * Click Copy on the toolbar.
 * Highlight the empty cells in the 25th Percentile row.
 * Right-click and select Paste Special.
 * In the Paste Special dialog, check Numbers (only).
 * Click OK.

The 50th Percentile values are equal to Median - Q1. Enter the necessary formulas to calculate these values from the values in the Descriptive Statistics table. For example, if the value for College A Median is in cell M4, and the value for College A Q1 is in cell M3, the correct formula for calculating the 50th Percentile for College A would be =M4-M3.

The 75th Percentile values are equal to Q3 - Median. Enter the necessary formulas to calculate these.

The Lower limit values are equal to Q1 - min. Enter the necessary formulas to calculate these.

The Upper limit values are equal to Max - Q3. Enter the necessary formulas to calculate these.

Create the boxplots
We will use a stacked column chart to create side-by-side boxplots.


 * Highlight the values in the 25th Percentile, 50th Percentile, and 75th Percentile rows
 * Choose Insert > Chart...

The Chart Wizard displays.

In ''1. Chart Type''': In ''2. Data Range'':
 * Select Column.
 * Select the second version. labeled Stacked when you hover your mouse over it.
 * Click Next>>.
 * Click 'Data series in rows.
 * Check that First row as label and First column as label are unchecked (because the data range, highlighted before starting the chart wizard, does not include labels).

In ''3. Data Series'':
 * Click the Shrink button, [[Image:OpenOfficeShrink.png]], next to the Categories window.
 * Highlight the range of cells that display the category labels: College A, College B, etc.
 * click the Maximize button, [[Image:OpenOfficeMaximize.png]].
 * Click Next>>.

In ''4. Chart Elements'':
 * Enter a title for the chart.
 * Enter a title for the Y axis.
 * Uncheck Display grids for both the x axis and y axis.
 * Uncheck Display legend.
 * Click Finish.

Calc creates a stacked column graph. The idea here is that the top two boxes in each column will make up our boxplots. We'll add error bars for the Min and Max values, and simply remove the bottom box.
 * Check that the graph is in edit mode (a gray border surrounds the chart); if not, double click on the graph until the gray border displays.
 * Click on the top box in one of the columns to select it.
 * Right click on one of the top boxes and select Insert Y Error Bars....

The Error Indicator dialog displays.
 * Click on the Y Error Bars tab.
 * Under Error Category, select Cell Range.
 * Under Error Indicator, select Positive.
 * In the Parameters section, click the Shrink button, [[Image:OpenOfficeShrink.png]], next to the Positive(+) window.
 * Highlight the range of values in the Upper Limit row in the table.
 * Click the Maximize button, [[Image:OpenOfficeMaximize.png]].
 * Click OK.

Now repeat this Y error bar exercise for the bottom box in the columns, except in the Error Indicator section, select Negative and in the Parameters section, enter the range for the Lower Limit values in the Negative (-) window.

The last steps create the traditional boxplot display.
 * Right-click on the top box in one of the columns and select Object Properties....

The Data Series dialog displays.
 * Click on the Area tab.
 * In the drop-down box under Fill, select None.
 * Click OK.
 * Repeat the above steps for the middle box in the columns.
 * Repeat for the bottom box in the columns, but before closing the Data Series dialog box, click the Borders tab.
 * In the Style drop down box, select Invisible.
 * Click OK.

The bottom box simply disappears.

One last step.
 * Check that the graph is in edit mode (a gray border surrounds the chart); if not, double click on the graph until the gray border displays.
 * Right-click on the y-axis
 * Select Object Properties....

The Y Axis dialog displays. Click OK.
 * Select the Scale tab.
 * Under Axis scale, next to the Minimum window, uncheck the Automatic box.
 * In the Minimum window, enter "30".

The boxplots can be used to visually compare the distribution of graduation rates among the colleges.

Return to the OLI's second Boxplot page to complete the interpretive exercises.