# Boxplot2, 2 of 2

Description of activity and Excel instructions: Boxplot, 2 of 2

Dataset: graduation.xls

## Contents

## Calculate descriptive statistics by college

To start, we'll create a table that will list each statistical measure and the value for each college. The table that will result is found on the OLI Statistics Learn by Doing page.

Begin by making the row and column labels:

- Select a cell to be the upper left corner of the table; choose a cell in an empty section of your worksheet (to the right of the data columns)
- Type "Statistic".
- Moving one cell to the right, type "College A".
- Moving one cell to the right, type "College B".
- Repeat for colleges C, D, E, and F.
- In the each successive cell under
*Statistic*, type "Min", "Q1", "Median", "Q3", "Max".

Now let's add the formulas, noting that each column represents a different college corresponding to a different range of data:

### Min

- In the cell to the right of
*Min*and underneath*College A*, type "=min(B2:B9)" because B2 through B9 represents the data for College A. - Moving one cell to the right (so that you're underneath College B), type "=min(B10:B17)" because B10 through B17 represents the data for College B.
- Repeat this calculation for colleges C, D, E, and F, adjusting the data range appropriately for each college.

### Q1

- In the cell to the right of
*Q1*and underneath*College A*, type "=quartile(B2:B9,1)". [Note that in some operating systems the formula uses a semi-colon instead of a colon, "=quartile(B2:A9;1)".] - Repeat this calculation for colleges B, C, D, E, and F, adjusting the data range appropriately for each college.

### Median

- In the cell to the right of
*Median*and underneath*College A*, type "=median(B2:B9)". - Repeat this calculation for colleges B, C, D, E, and F, adjusting the data range appropriately for each college.

### Q3

- In the cell to the right of
*Q3*and underneath*College A*, type "=quartile(B2:B9,3)". [Note that in some operating systems the formula uses a semi-colon instead of a colon, "=quartile(B2:B9;3)".] - Repeat this calculation for colleges B, C, D, E, and F, adjusting the data range appropriately for each college.

### Max

- In the cell to the right of
*Max*and underneath*College A*, type "=Max(B2:B9)".

## Create side-by-side boxplots

Unfortunately, Calc can't create boxplots directly, so we have to create a column graph and alter it to display as side-by-side boxplots. We start by using the calculated *Descriptive Statistics* to generate new values that will be the basis for our column graph. The table that will result is found on the OLI Statistics Learn by Doing page (the table shown under *Side-by-side boxplots.*

### Create the necessary statistics

- In the
*Statistic*column, in each successive cell under*Max*, type "25th Percentile", "50th Percentile", "75th Percentile", "Lower limit", and finally "Upper limit". Widen the column if needed.

The 25th Percentile values are the same as those for Q1. We need to copy the numbers from Q1 without copying the formulas.

- Highlight the set of values for Q1.
- Click
**Copy**on the toolbar. - Highlight the empty cells in the
*25th Percentile*row. - Right-click and select
**Paste Special**. - In the
*Paste Special*dialog, check**Numbers**(only). - Click
**OK**.

The 50th Percentile values are equal to Median - Q1. Enter the necessary formulas to calculate these values from the values in the *Descriptive Statistics* table. For example, if the value for College A Median is in cell M4, and the value for College A Q1 is in cell M3, the correct formula for calculating the 50th Percentile for College A would be =M4-M3.

The 75th Percentile values are equal to Q3 - Median. Enter the necessary formulas to calculate these.

The Lower limit values are equal to Q1 - min. Enter the necessary formulas to calculate these.

The Upper limit values are equal to Max - Q3. Enter the necessary formulas to calculate these.

### Create the boxplots

We will use a stacked column chart to create side-by-side boxplots.

- Highlight the values in the
*25th Percentile*,*50th Percentile*, and*75th Percentile*rows - Choose
**Insert**>**Chart...**

The *Chart Wizard* displays.

In *1. Chart Type'*:

- Select
**Column**. - Select the second version. labeled
**Stacked**when you hover your mouse over it. - Click
**Next>>**.

In *2. Data Range*:

- Click '
*Data series in rows*. - Check that
**First row as label**and**First column as label**are unchecked (because the data range, highlighted before starting the chart wizard, does not include labels).

In *3. Data Series*:

- Click the Shrink button, , next to the
*Categories*window. - Highlight the range of cells that display the category labels: College A, College B, etc.
- click the Maximize button, .
- Click
**Next>>**.

In *4. Chart Elements*:

- Enter a title for the chart.
- Enter a title for the Y axis.
- Uncheck
**Display grids**for both the x axis and y axis. - Uncheck
**Display legend**. - Click
**Finish**.

Calc creates a stacked column graph. The idea here is that the top two boxes in each column will make up our boxplots. We'll add error bars for the Min and Max values, and simply remove the bottom box.

- Check that the graph is in edit mode (a gray border surrounds the chart); if not, double click on the graph until the gray border displays.
- Click on the top box in one of the columns to select it.
- Right click on one of the top boxes and select
**Insert Y Error Bars...**.

The *Error Indicator* dialog displays.

- Click on the
*Y Error Bars*tab. - Under
*Error Category*, select**Cell Range**. - Under
*Error Indicator*, select**Positive**. - In the
*Parameters*section, click the Shrink button, , next to the*Positive(+)*window . - Highlight the range of values in the
*Upper Limit*row in the table. - Click the Maximize button, .
- Click
**OK**.

Now repeat this Y error bar exercise for the bottom box in the columns, except in the *Error Indicator* section, select **Negative** and in the *Parameters* section, enter the range for the *Lower Limit* values in the *Negative (-)* window.

The last steps create the traditional boxplot display.

- Right-click on the top box in one of the columns and select
**Object Properties...**.

The *Data Series* dialog displays.

- Click on the
**Area**tab. - In the drop-down box under
*Fill*, select**None**. - Click
*OK*. - Repeat the above steps for the middle box in the columns.
- Repeat for the bottom box in the columns, but before closing the
*Data Series*dialog box, click the**Borders**tab. - In the
*Style*drop down box, select**Invisible**. - Click
**OK**.

The bottom box simply disappears.

One last step.

- Check that the graph is in edit mode (a gray border surrounds the chart); if not, double click on the graph until the gray border displays.
- Right-click on the y-axis
- Select
**Object Properties...**.

The Y Axis dialog displays.

- Select the
**Scale**tab. - Under
*Axis scale*, next to the*Minimum*window, uncheck the**Automatic**box. - In the
*Minimum*window, enter "30".

Click **OK**.

The boxplots can be used to visually compare the distribution of graduation rates among the colleges.

Return to the OLI's second Boxplot page to complete the interpretive exercises.