Creating a histogram in OpenOffice Calc

From WikiEducator

Jump to:navigation, search


A histogram is a graphical tool used to visually display a collection of data. A histogram is appropriate for use with data values on a continuous scale (also called interval or quantitative scale). The following instructions are written as a "follow along" exercise, using a dataset available for download.

Contents

Example scenario: Old Faithful geyser eruptions

Old Faithful is a geyser in Yellowstone National Park in Wyoming, USA. To better understand the timing of the eruptions, the duration of the eruptions and the time since the previous eruption were measured over 23 consecutive days. The Old Faithful dataset contains 222 observations, including date of observation, duration of the eruption and the time since the previous eruption. A histogram of the duration of eruption will provide a graphical display showing the distribution (shape) of the data. Use the following instructions to create such a histogram.

Prepare the data

Download the Old Faithful dataset by choosing the "red dot" (Excel) version. Open the dataset with scalc.exe (the OpenOffice spreadsheet application).

Define the histogram's class interval

The class interval specifies the range of values to include in each bar of the histogram. The interval must be the same for all bars.

In unused cells to the right of the data, enter the formulas for min, max and count. These values are needed to determine the number and size of the classes.

The Rice Rule with 222 observations yields 12 classes. Choose a starting value that is equal to or slightly less than the lowest value.[1] In this example set the starting value just below the minimum duration, 1.6. Calculate the size of the class interval by dividing the difference between the max and min by the number of class intervals. In this case, , which rounds to .

With a starting value of 1.6, a max of 5.2, and a class size of .3, the upper threshold for each class will be 1.9, 2.2, 2.5, ... 5.2. Lower limits are not needed when using the OpenOffice FREQUENCY function. Adding an additional class above and below the extreme values will visually suggest that all of the sample data is included.

Calculate the frequencies in each class interval

The frequency count in each class can be automatically counted using the FREQUENCY function in OpenOffice. The FREQUENCY function is an array function, returning values to a range of cells.

The Function Wizard will display. The FREQUENCY function requires two arguments: data and classes.

The completed formula will display in the formula bar.

The frequencies for each class interval display in the cells adjacent to the class limits.

Create the histogram

The Chart Wizard dialog box opens.

No revisions are necessary on 3. Data Series.

The chart is inserted into the open sheet.

The histogram displays.

The histogram clearly indicates that the duration data are bimodal with one mode near 1.9 minutes and a second mode near 4.6 minutes.

Notes

  1. An alternative approach is offered by the author's of Collaborative Statistics (2. Descriptive Statistics, Histograms) who recommend a starting value carried out to one more decimal place than the data point with the most decimal values.
Navigation
Community
Create a book
Toolbox