User:ASnieckus/StatisticsContent/Learn by doing/Comparison of midterm scores (Question 2)

Description of activity and Excel instructions: StatTutor Comparison of midterm scores (Question 2)

Dataset: Direct link for gradebook.xls not available.

Exploratory Analyses
This exercise begins by taking a look at the data. To retain the correct analysis calculations for Question 1, copy and paste the raw data to a new sheet in your spreadsheet file. Label the sheet "Q2".

Calculate descriptive statistics
Before beginning on the table of descriptive statistics, let's sort the data to make it easier to check the tabulation.


 * Type Ctrl+A to select all of the data.
 * Select Data > Sort....

The Sort dialog displays.


 * Click the Options tab.
 * Check Range contains column labels.
 * Cleck the Sort Criteria tab
 * Under Sort by, in the drop down menu, select the relevant variable.
 * Click OK.

Now that the data are sorted, pick an empty area in your worksheet and create a table where the row headers are "Min", "Q1", "Median", "Q3", "Max", "Mean", "St Dev", "n".

For each cell in the table, enter the formula to calculate the statistic listed in the row header. The data range to include in each formula is the range of data in the quantitative variable. The formulas you will need are:
 * Min: "=min([range])"
 * Q1: "=quartile([range],1)" [Note that in some operating systems the formula uses a semi-colon instead of a colon between the range and the numeral "1".]
 * Median: "=median([range])"
 * Q3: "=quartile([range],3)" [Note that in some operating systems the formula uses a semi-colon instead of a colon, between the range and the number "3".]
 * Max: "=max([range])"
 * n: "=count([range])"
 * Mean: "=average([range])"
 * St Dev: "=stdev([range])"

Create the frequency table
First we need to define the bins (also called intervals or classes) in which to group the data. In an effort to create better x-axis labels for the histogram, let's create data labels in the column to the right of the bins.
 * Decide on an appropriate bin width, and then in one of the columns to the right of the data, make a list of the upper limit in each bin, from lowest down to highest. For example, if the data range from -10 to 10, you might choose a bin width of 2. (Your bins would start at -10, then -8, -6,..., 10). In general, choose a bin width so that there are between 5 and 15 bins.
 * Highlight the block of cells.
 * Select Format > Cells....

The Format Cells dialog displays.
 * Select the Numbers tab.
 * Select Text in the dropdown box under Category.
 * Click OK.
 * Enter the range in each cell (for example, '-11, -10' instead of just '-10').

Now add the frequencies in the next column to the right.
 * Use the FREQUENCY function to calculate the number of observations in each bin. Display the frequency data in the cells to the right of the bin labels. Remember that the FREQUENCY function requires two arguments: data and classes. Indicate the correct range for each of these.

Create the histogram

 * Choose Insert > Chart....

The Chart Wizard dialog box opens. No revisions are necessary on ''3. Data Series''.
 * On 1. Chart Type: select Column.
 * On 2. Data Range: insert the data range (labels and frequencies, but not the upper limit in each bin), and indicate that the data series is in a column, and the first column is a label.
 * On 4. Chart Elements: enter a title, axis labels, and adjust other elements on the chart.
 * Click Finish.

The chart will be inserted into the open sheet. Widen the chart as needed to correctly display the x-axis labels.
 * Move the chart to be situated next to the frequency data.
 * Remove the space between the bars, using the Object Properties... option.
 * Click OK.

Return to the StatTutor exercise for Question 2 to describe the key features of the data display and support your description with numerical measures. (Be sure to include the numerical results in your description.)

Calculate p-value for t-test
Calc provides a formula for calculating the p-value (note that the t-value is not calculated) which can be used with a few types of t-tests: TTEST([range y1],[range y2],mode,type). Type=1 is for use in this situation: matched pairs. Enter the range for each of the variables included in the difference. Note that Mode=1 indicates a one-sided test:
 * To calculate the p-value, enter the formula "=TTEST([range y1],[range y2],1,1)".

For a p-value of "0", convert the cell format to scientific notation to display the actual value.

Return to the StatTutor exercise for Question 2 to report results and draw conclusions.