Examining the effectiveness of randomization using SPSS

From WikiEducator

A local internet service provider (ISP) has developed two new versions of its software, each with a different tool to access a new feature. The ISP wants to identify which of the three software products (one of the two new versions or the existing software) would result in (cause) the highest customer satisfaction. To investigate this, the ISP has designed an experiment to compare users' preferences for the two new versions and the existing software.

The marketing department has identified three major potential lurking variables that might impact user satisfaction: gender, age, and hours per week of computer use.

The purpose of this activity is to explore the effectiveness of randomization in creating similar treatment groups (in this case groups assigned to test the different software versions), such that the groups are balanced with respect to other variables not controlled for in the experiment (gender, age, and hours per week of computer use).

The local area of interest has a population of 20,784. Because it is not feasible to include the full population in the study, the ISP decides to take a simple random sample of 450 individuals.
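As an illustration outside SPSS, the idea of a simple random sample can be sketched in Python. This is a minimal sketch, not the procedure from the Causation and Experiments activity: the record IDs 1 to 20,784 and the seed are assumptions made only for the example.

```python
import random

POPULATION_SIZE = 20784  # population size stated in the activity
SAMPLE_SIZE = 450        # sample size stated in the activity

# Assumption: records are identified by the integers 1..20784.
record_ids = range(1, POPULATION_SIZE + 1)

rng = random.Random(42)  # arbitrary seed, fixed only so the run is repeatable

# Draw 450 distinct IDs; every subset of size 450 is equally likely.
sample_ids = sorted(rng.sample(record_ids, SAMPLE_SIZE))

print(len(sample_ids), "records selected; first five IDs:", sample_ids[:5])
```

Sampling is done without replacement, so no individual can appear in the sample twice.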

In this activity you will:

  • use a randomization function to create a new variable which assigns individuals to one of the three treatment groups
  • verify that the three treatment groups created are similar with respect to the most obvious lurking variables: gender, age, and hours per week of computer use.[1]

The dataset contains the values of the three possible lurking variables:

  • age (in years)
  • gender (female, male)
  • computer (hours per week of computer use)


There are two versions of the dataset:

  • computers.xls, which includes 20,784 records (the full population). Instructions for obtaining a simple random sample of 450 observations are included in the Causation and Experiments activity.
  • an SPSS version of the dataset, available on your class website: computers-srs.sav, which includes only the 450 observations in the sample.

Open the dataset in the SPSS data editor.

The following instructions are based on the student version of PASW (SPSS) version 18.

Assign observations to treatment groups

We will assign the 450 observations to the three treatment groups by creating a new categorical variable which implements a random function whose outcome is one of three values: 1, 2, or 3.

  • Select Transform > Compute Variable....

The Compute Variable dialog opens.

  • In the Target Variable: field, type "Treatment" (or another suitable variable name).
  • In the Numeric Expression: field, copy and paste in the following: TRUNC(RV.UNIFORM(1.0000,3.9999))
  • Click OK. The new variable is added as the last column in the data editor.

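(Note: RV.UNIFORM(min, max) generates a uniformly distributed random number between the two values listed; TRUNC drops the fractional part of that number. The result is an integer: 1, 2, or 3, each with approximately equal probability. Because the assignment is random, the three groups will usually not be exactly the same size.)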
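The logic of the expression can be mirrored outside SPSS. The following Python sketch is only an analogy under stated assumptions: `assign_treatment` is a hypothetical helper name, `random.uniform` stands in for RV.UNIFORM, `math.trunc` stands in for TRUNC, and the seed is arbitrary.

```python
import math
import random
from collections import Counter

def assign_treatment(rng: random.Random) -> int:
    """Mimic TRUNC(RV.UNIFORM(1.0000, 3.9999)): draw a uniform value, drop the fraction."""
    return math.trunc(rng.uniform(1.0, 3.9999))

rng = random.Random(2024)  # arbitrary seed, fixed only so the run is repeatable

# One random assignment for each of the 450 sampled individuals.
treatments = [assign_treatment(rng) for _ in range(450)]

# Count how many individuals landed in each treatment group.
group_sizes = Counter(treatments)
for group in sorted(group_sizes):
    print(f"Treatment {group}: {group_sizes[group]} individuals")
```

Running the sketch with different seeds shows that the group sizes vary from run to run and are rarely exactly 150 each.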

Examining the effectiveness of the randomization function

Let's now examine the distributions of each of the potential lurking variables (gender, age, and computer) for each of the three treatment groups to see whether the randomization was effective in balancing the variables across groups.

To examine the gender variable, create a two-way table of counts and cell percents:

  • Use the Crosstabs analysis (under Descriptive Statistics) to create the two-way table of gender by treatment group. In the Cells dialog, check the Total box under Percentages. (Example instructions)
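To make clear what the two-way table contains, here is a minimal Python sketch of counts and total percents. The data are randomly generated stand-ins, not the values in computers-srs.sav, and `crosstab` is a hypothetical helper written for this example.

```python
import random
from collections import Counter

rng = random.Random(1)  # arbitrary seed, fixed only so the run is repeatable

# Hypothetical stand-in data: 450 (gender, treatment) pairs, NOT the real dataset.
rows = [(rng.choice(["female", "male"]), rng.randint(1, 3)) for _ in range(450)]

def crosstab(pairs):
    """Return {(gender, group): (count, percent of grand total)}."""
    counts = Counter(pairs)
    n = len(pairs)
    return {cell: (count, 100.0 * count / n) for cell, count in counts.items()}

table = crosstab(rows)

# Print one line per cell of the gender-by-treatment table.
for gender in ("female", "male"):
    for group in (1, 2, 3):
        count, pct = table.get((gender, group), (0, 0.0))
        print(f"{gender:6s} treatment {group}: {count:3d} ({pct:4.1f}% of total)")
```

If randomization balanced gender well, the female and male percents should be similar across the three treatment groups.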

To examine the age variable, create summary statistics for age by treatment group and create side-by-side boxplots:

  • Use the Explore analysis (under Descriptive Statistics) to compute n, the mean, the standard deviation, and the five-number summary (minimum, first quartile, median, third quartile, maximum) for each treatment group. (Example instructions)
  • Create side-by-side boxplots for each of the treatment groups. (Example instructions)

To examine the computer variable, create summary statistics for computer by treatment group and create side-by-side boxplots. Follow the instructions for age above.
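The per-group summaries requested above can be sketched in Python as follows. This is an illustration only: the ages are randomly generated stand-ins rather than the real data, `summarize` is a hypothetical helper, and the quartiles come from Python's `statistics.quantiles`, whose default method may give slightly different values from those SPSS reports.

```python
import random
import statistics

def summarize(values):
    """Return n, mean, sd, and the five-number summary for one group."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }

rng = random.Random(7)  # arbitrary seed, fixed only so the run is repeatable

# Hypothetical stand-in data: an age and a treatment group for 450 individuals.
groups = {1: [], 2: [], 3: []}
for _ in range(450):
    groups[rng.randint(1, 3)].append(rng.randint(18, 80))

# One summary line per treatment group, for side-by-side comparison.
for group, ages in groups.items():
    s = summarize(ages)
    print(f"Treatment {group}: n={s['n']}, mean={s['mean']:.1f}, sd={s['sd']:.1f}, "
          f"five-number summary=({s['min']}, {s['q1']}, {s['median']}, {s['q3']}, {s['max']})")
```

The same function applies unchanged to the computer variable; only the input values differ.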

How do the groups compare?
  • Describe how the three groups compare with respect to gender, age, and computer use.
  • Does the randomization process allow us to compare the three different software products, without concern for the effects of the three potential lurking variables gender, age, and hours per week of computer use?


  1. Adapted from Open Learning Initiative, Probability and Statistics: Causation and Experiments, to provide instructions for doing the analyses using SPSS.