A. Use dataset 1.1 (a set of 20 scores on a test of prejudice, measured on a scale of 1-100)

1. Open dataset 1.1

2. In the command box type:

desc

The “desc” command (short for “describe”) tells us about our data, including the number of observations and variables.

3. In the command box type:

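sum

The “sum” command (short for “summarize”) calculates and displays some summary statistics, including the number of observations, the mean, the standard deviation, and the minimum and maximum (from which we can work out the range). Note that Stata commands are case-sensitive, so type “sum” in lowercase.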

 

4. In the command box type:

sum prejudice, detail

The “sum [variable name], detail” command provides even more information, including percentiles, which we can use to identify the median (the 50th percentile). We can also see the skewness of our observations. A normal distribution is symmetric, so its skewness is zero.
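If you would rather not read the median and skewness off the output table, the following is a minimal sketch of how to display them directly. It assumes the variable is named prejudice, as in the commands above, and relies on the results that “sum, detail” stores after it runs.

```stata
* Run the detailed summary without printing the full table
quietly summarize prejudice, detail

* Display selected stored results
display "Mean: " r(mean)
display "Median (50th percentile): " r(p50)
display "Skewness: " r(skewness)
```

The numbers shown should match the corresponding entries in the full “sum prejudice, detail” output.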

5. In the command box type:

tab prejudice

The “tab [variable name]” command (short for “tabulate”) will produce a frequency table. This table shows how often each score occurs as a count, a percentage, and a cumulative percentage.

6. In the command box type:

egen mode = mode(prejudice)

tab mode

The “egen [name of new variable] = mode([variable we want the mode of])” command will create a new variable, here called mode, that holds the mode of whichever variable we give it. In the variable list on the right side of the screen you will see the new variable mode listed under prejudice. The “egen” command is extremely versatile and we will be using it a lot.
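One caveat: if several scores tie for the highest frequency, “egen ... = mode()” leaves the new variable missing and prints a warning. A possible way to handle this, sketched below, is to ask for the smallest or largest of the tied modes. The variable names mode_min and mode_max are made up for this illustration.

```stata
* mode_min and mode_max are hypothetical names chosen for this example
egen mode_min = mode(prejudice), minmode
egen mode_max = mode(prejudice), maxmode

* If the two tables show different values, the data have more than one mode
tab mode_min
tab mode_max
```

If both variables hold the same value, the distribution has a single mode and the plain “egen mode = mode(prejudice)” command gives the same answer.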

 

Copy and paste the outputs into the handout and answer the following questions:

1. How many observations and variables are included in this small dataset?

2. What are the mean, median, and mode? Hint: the mean is given to you directly, but for the median you will need to refer to the percentiles, and for the mode you can refer to the frequency table or the new mode variable we created.

3. Compare the mean, median, and mode. What do their differences tell us about this dataset?

4. Is the distribution normal?

 

B. Use dataset 1.2 (a set of scores for 10 cases on 3 different tests)

1. Open dataset 1.2

2. In the command box type:

desc

3. In the command box type:

sum

If you only want the summary statistics for one of the variables, specify it after the command: “sum [variable name]”.

4. In the command box type:

sum, detail

If you only want the detailed summary statistics for a single variable, use the “sum [variable name], detail” command from part A.

5. In the command box type:

tab score1

Do this for each variable to show the frequency of each score.
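Instead of typing “tab” once per variable, you could loop over the variables. The sketch below assumes every variable in the dataset is a test score; if the file also contains something like a case ID, that variable will be tabulated too.

```stata
* Tabulate every variable in the dataset, one frequency table each
foreach v of varlist _all {
    tab `v'
}
```

The command “tab1 _all” is a shorter way to get the same set of one-way tables.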

 

Copy and paste the outputs into the handout and answer the following questions:

1. How many observations and variables are included in this dataset?

2. What are the mean, median, and mode of each test? How do they compare?

 

C. Use dataset 1.3 (a test of reaction time for a sample of 30)

Use the commands you have practiced above to calculate the standard deviation and variance.

What is the standard deviation? Interpret it: on average, how far do individual reaction times fall from the mean?
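As a reminder of where these numbers come from, here is a rough sketch. The variable name rt is a placeholder, not the actual name in dataset 1.3; run “desc” first and substitute the real name.

```stata
* "rt" is a hypothetical variable name; replace it with the one shown by desc
quietly summarize rt, detail

display "Standard deviation: " r(sd)
display "Variance: " r(Var)

* The variance is the standard deviation squared
display "Check (sd squared): " r(sd)^2
```

The last two lines should show the same value, apart from rounding in the display.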

 

D. Use dataset 1.4 (a set of scores for 10 cases on 3 different tests)

Use the commands you have practiced above to calculate the variability of each test. Which test had the smallest amount of variability?
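If you want the measures of variability for all three tests side by side, one option is the “tabstat” command. The sketch below assumes the tests are stored as score1, score2, and score3; only score1 appears earlier in this handout (for dataset 1.2), so check the actual names with “desc” before running it.

```stata
* score1, score2, score3 are assumed names; confirm them with desc
tabstat score1 score2 score3, statistics(n mean sd variance range) columns(statistics)
```

Each row of the resulting table is one test, which makes it easier to compare the standard deviation, variance, and range across tests.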

 

E. Short answer questions (you don’t need Stata for these and you don’t have to calculate anything!)

1. Why would you expect more variability on a measure of personality in college freshmen than you would on a measure of age?

2. Why does the standard deviation get smaller as individuals in a group score more similarly on a test? Why would you expect the amount of variability on a measure to be relatively less with a larger number of observations than with a smaller one?

3. What are the characteristics of the normal curve? What human trait, behavior, or characteristic can you think of that is distributed normally? Why do you think it is?

4. What three bits of information do you need to compute a z-score?

5. Standard scores, such as z-scores, allow us to make comparisons across different samples. Why?