
Statistics I

Project 1

Due: June 12, 2023

Understanding the Standard Deviation Formula:

Why Divide by n – 1?

This activity will give you an opportunity to better understand the formula for the sample standard deviation and why it involves division by n – 1 instead of n.  You will also get some practice finding sample and population standard deviations and variances, whether by hand or using technology.

Consider a family of four people, aged 8, 10, 33, and 37, as a population.

1. Calculate the population variance and standard deviation for the ages.

Variance: ________     Standard deviation: ________
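If you are using technology rather than working by hand, the population formulas can be checked with a short script. The following is a minimal Python sketch, not part of the original activity. It assumes the standard-library statistics module, whose pvariance and pstdev functions divide by N (the population size) rather than N - 1.

```python
import statistics

ages = [8, 10, 33, 37]  # the whole family is treated as the population

# Population formulas divide by N (here N = 4), not N - 1.
pop_variance = statistics.pvariance(ages)
pop_sd = statistics.pstdev(ages)

print(f"Population variance: {pop_variance}")
print(f"Population standard deviation: {pop_sd:.3f}")
```

Compare the printed values with your hand calculation before moving on.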

2. Assume that random samples of size two are drawn with replacement from this population and their ages are recorded. Record each of the 16 possible samples of size two in the second column of the table. Instructions for completing the rest of the table follow the table.

Sample | Sample Values | SD (treated as a sample) | VAR (treated as a sample) | SD (treated as a population) | VAR (treated as a population)
1      |               |                          |                           |                              |
2      |               |                          |                           |                              |
3      |               |                          |                           |                              |
4      |               |                          |                           |                              |
5      |               |                          |                           |                              |
       |               | Mean =                   | Mean =                    | Mean =                       | Mean =
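One way to make sure no sample is missed is to list the samples systematically. The sketch below is an illustration only and assumes Python's itertools.product; because sampling is with replacement and order is recorded, each of the 4 ages can be paired with each of the 4 ages.

```python
from itertools import product

ages = [8, 10, 33, 37]

# With replacement, a value may repeat and (8, 10) differs from (10, 8),
# so there are 4 * 4 = 16 ordered samples of size two.
samples = list(product(ages, repeat=2))

for number, sample in enumerate(samples, start=1):
    print(number, sample)
```

The numbering produced here is one convenient ordering; any order is acceptable as long as all 16 samples appear.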


3. Find the variance and standard deviation for each sample (using the formula that involves division by n – 1). If you are using technology for the computation, the standard deviation will be labeled s, not sigma.

List the values in the 3rd and 4th columns of the table.

4. Find the variance and standard deviation for each sample by treating the sample as a population (using the formula that involves division by n). If you are using technology for the computation, the standard deviation will be labeled sigma, not s.

List the values in the 5th and 6th columns of the table.

5. Find the mean of each of the 3rd, 4th, 5th, and 6th columns. List the means in the last row of the table.
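Steps 3 through 5 can be checked with technology. The following Python sketch is one possible way to do so, assuming the standard-library statistics module: stdev and variance divide by n - 1, while pstdev and pvariance divide by n. The column names are labels chosen for this illustration.

```python
from itertools import product
import statistics

ages = [8, 10, 33, 37]
samples = list(product(ages, repeat=2))  # 16 ordered samples, with replacement

columns = {
    "SD (sample)": statistics.stdev,           # divides by n - 1
    "VAR (sample)": statistics.variance,       # divides by n - 1
    "SD (population)": statistics.pstdev,      # divides by n
    "VAR (population)": statistics.pvariance,  # divides by n
}

# One table row per sample: the four statistics computed from that sample.
table = [
    {name: func(sample) for name, func in columns.items()}
    for sample in samples
]

# Last row of the table: the mean of each column over all 16 samples.
column_means = {
    name: statistics.fmean([row[name] for row in table])
    for name in columns
}

for sample, row in zip(samples, table):
    print(sample, {name: round(value, 3) for name, value in row.items()})
print("Means:", {name: round(value, 3) for name, value in column_means.items()})
```

Compare each column mean with the population variance and standard deviation from step 1.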

6. If the mean of a statistic (over all possible samples of the same size) is equal to the population parameter, the statistic is said to be an unbiased estimator of that parameter.

Which method (dividing by n – 1, dividing by n, or neither) resulted in an unbiased estimate of the population variance?

Which method (dividing by n – 1, dividing by n, or neither) resulted in an unbiased estimate of the population standard deviation?

Scaling and Shifting Data

This activity will help you consider how transforming data affects the mean, standard deviation, and shape of a distribution.

1. a. Use StatCrunch to generate 40 exam scores from a normal distribution with a mean of 75 and a standard deviation of 10. Name the column “Exam Scores.”

b. Calculate the mean and standard deviation for your exam scores.

Mean = ________     Standard deviation = ________

c. Choose one of the exam scores and calculate its z-score.

d. Create a histogram for the exam scores and sketch it here.
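The activity uses StatCrunch, but the same steps can be sketched in Python for comparison. This is an illustration under stated assumptions: the seed value 2023 is arbitrary and only makes the run repeatable, the first score is the one chosen for the z-score, and the histogram uses classes of width 5. Your StatCrunch data will differ because it is randomly generated.

```python
import random
import statistics
from collections import Counter

rng = random.Random(2023)  # arbitrary seed, only so the sketch is repeatable

# 40 simulated exam scores from a normal distribution (mean 75, SD 10).
exam_scores = [rng.gauss(75, 10) for _ in range(40)]

mean = statistics.mean(exam_scores)
sd = statistics.stdev(exam_scores)  # sample standard deviation (divides by n - 1)

chosen_score = exam_scores[0]  # any one score will do
z_score = (chosen_score - mean) / sd

print(f"Mean = {mean:.2f}, standard deviation = {sd:.2f}")
print(f"Score {chosen_score:.1f} has z-score {z_score:.2f}")

# Rough text histogram with classes of width 5.
classes = Counter(int(score // 5) * 5 for score in exam_scores)
for lower in sorted(classes):
    print(f"{lower:>3} to {lower + 5:<3} {'*' * classes[lower]}")
```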

2. Suppose the teacher plans to adjust the scores by adding ten points to each student’s score.

a. Transform the exam score data by adding ten points to each score. Create a new column and name it “Exam Scores + 10.”

b. Calculate the mean and standard deviation for the adjusted exam scores.

Mean = ________     Standard deviation = ________

c. Calculate the z-score for the same student whose score you used in question 1.

d. Create a histogram for the adjusted exam scores and sketch it here.


e. Describe how adding ten points to the exam scores affected the mean, standard deviation, and shape of the distribution. How did the adjustment affect the student’s z-score?
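After you have recorded your own observations, the algebra behind a shift can be written out as a check. In the sketch below, the symbols are assumptions introduced for illustration: x_i are the original scores, c = 10 is the amount added, and y_i are the adjusted scores.

```latex
\begin{align*}
y_i &= x_i + c, \qquad c = 10 \\
\bar{y} &= \frac{1}{n}\sum_{i=1}^{n}(x_i + c) = \bar{x} + c \\
s_y &= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2}
     = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} = s_x \\
z_{y_i} &= \frac{y_i - \bar{y}}{s_y}
        = \frac{(x_i + c) - (\bar{x} + c)}{s_x} = z_{x_i}
\end{align*}
```

The constant c cancels in every deviation from the mean, which is why only the location of the distribution moves.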

3. Suppose the teacher plans to adjust the scores by giving each student a score 20% higher than his or her original score.

a. Transform the exam score data by multiplying each score by 1.2. Create a new column and name it “Exam Scores x 1.20.”

b. Calculate the mean and standard deviation for the adjusted exam scores.

Mean = ________     Standard deviation = ________

c. Calculate the z-score for the same student whose score you used in question 1.

d. Create a histogram for the adjusted exam scores and sketch it here.

e. Describe how adjusting the scores by multiplying by 1.20 affected the mean, standard deviation, and the shape of the distribution. How did the adjustment affect the student’s z-score?
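The effect of multiplying by 1.2 can also be checked numerically. The Python sketch below is an illustration, not the StatCrunch procedure: the seed, the helper name summarize, and the choice of the first score for the z-score are all assumptions made for this example.

```python
import random
import statistics

rng = random.Random(2023)  # arbitrary seed, only so the sketch is repeatable
exam_scores = [rng.gauss(75, 10) for _ in range(40)]
scaled_scores = [score * 1.2 for score in exam_scores]


def summarize(scores, index=0):
    """Return (mean, sample SD, z-score of the score at position index)."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return mean, sd, (scores[index] - mean) / sd


mean_x, sd_x, z_x = summarize(exam_scores)
mean_y, sd_y, z_y = summarize(scaled_scores)

print(f"Mean: {mean_x:.2f} -> {mean_y:.2f} (ratio {mean_y / mean_x:.3f})")
print(f"SD:   {sd_x:.2f} -> {sd_y:.2f} (ratio {sd_y / sd_x:.3f})")
print(f"z-score of the same student: {z_x:.3f} -> {z_y:.3f}")
```

Changing 1.2 to another positive multiplier shows whether the pattern depends on the particular value used.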

Exploring Properties of the Linear Correlation Coefficient

In this activity you will use an applet to create scatter diagrams and calculate the linear correlation coefficient. You will have the opportunity to observe important properties and limitations of the linear correlation coefficient.

Load the Correlation by Eye Applet that is located at www.pearsonhighered.com/sullivanstats. Or, from StatCrunch, select Applets > Correlation by eye. Select the “Randomly generated” radio button and click Compute!.

1. Click “Reset” at the top of the applet to clear the data from the scatter diagram.

a. Create a scatter diagram of 12 to 15 observations with positive association. Click “Show” at the bottom of the applet to show the correlation coefficient of the data in the scatter diagram. Draw the scatter diagram (or copy it from the applet) and record the correlation coefficient below.

b. Move some of the observations in the scatter diagram and note how the correlation coefficient changes as the positive association strengthens and weakens.

c. Align the points in the scatter diagram in a straight line with positive slope. What is the value of the linear correlation coefficient?
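The applet computes the coefficient for you, but it can help to see the calculation itself. The following Python sketch implements the usual formula for the linear correlation coefficient and applies it to two made-up data sets: 12 points exactly on a line with positive slope, and the same points with small bumps added. The function name pearson_r and all data values are assumptions for illustration.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = list(range(1, 13))  # 12 observations
ys_line = [2 * x + 1 for x in xs]  # exactly on a line with positive slope

# The same points nudged off the line by small, made-up amounts.
bumps = [3, -2, 4, -3, 1, -4, 2, -1, 3, -2, 4, -3]
ys_scattered = [y + bump for y, bump in zip(ys_line, bumps)]

r_line = pearson_r(xs, ys_line)
r_scattered = pearson_r(xs, ys_scattered)

print(f"Points on a line:   r = {r_line:.4f}")
print(f"Points off the line: r = {r_scattered:.4f}")
```

Making the bumps larger or smaller mimics dragging points away from or toward the line in the applet.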

2. Click “Reset” at the top of the applet to clear the data from the scatter diagram.

a. Create a scatter diagram of 12 to 15 observations with negative association. Click “Show” at the bottom of the applet to show the correlation coefficient of the data in the scatter diagram. Draw the scatter diagram and record the correlation coefficient below.


b. Move some of the observations in the scatter diagram and note how the correlation coefficient changes as the negative association strengthens and weakens.

c. Align the points in the scatter diagram in a straight line with negative slope. What is the value of the linear correlation coefficient?

3. a. Click “Reset” at the top of the applet to clear the data from the scatter diagram. Create a scatter diagram with no association. What is the value of the correlation coefficient?

b. Click “Reset” at the top of the applet to clear the data from the scatter diagram. Create a scatter diagram in an upside-down U-shaped pattern. Draw the scatter diagram (or copy it from the applet) and record the correlation coefficient below.

c. What does a correlation coefficient of 0 suggest?

4. a. Click “Reset” at the top of the applet to clear the data from the scatter diagram. In the lower-left corner of the applet, draw a scatter diagram of 8 to 10 observations with a correlation coefficient around 0.8.

b. Add another point in the upper-right corner of the applet that roughly lines up with the other points in the scatter diagram. What is the new value of the correlation coefficient?

c. Move the additional point around the scatter diagram and note how the correlation coefficient changes. Is the correlation coefficient a resistant measure? Why or why not?

5. a. Click “Reset” at the top of the applet to clear the data from the scatter diagram. Draw a scatter diagram with six points arranged vertically in a straight line. What is the value of the correlation coefficient? Why?

b. Add a seventh point to the right side of the scatter diagram and move the point around until the correlation coefficient is approximately 0.75.

c. Click “Reset” at the top of the applet to clear the data from the scatter diagram. Draw a scatter diagram with approximately seven points in a U-shaped pattern near the lower-left corner of the applet. Add an eighth point to the scatter diagram and move it around until the correlation coefficient is approximately 0.75. Draw the scatter diagram (or copy it from the applet).

d. Explain why the correlation coefficient should not be used exclusively to judge linear association without also using a scatter diagram.

Using Binomial Probabilities in Baseball

This activity involves finding the probability of breaking a homerun record using simulations and the binomial probability formula. Just how unusual was it when Mark McGwire broke the homerun record in 1998?

In 1998 the baseball world was enthralled by the epic chase of Mark McGwire and Sammy Sosa to surpass the single-season homerun record of 61 set by Roger Maris in 1961.

1. a. Prior to his prolific 1998 season in which he shattered Roger Maris’ single-season homerun record of 61 by hitting 70 “round-trippers,” Mark McGwire averaged 1 homerun every 11.9 at-bats. Assuming this rate of homerun hitting applied to the 1998 season, determine the probability McGwire hits a homerun during a randomly selected at-bat in 1998.

b. Open the Baseball Applet at www.pearsonhighered.com/sullivanstats or, from StatCrunch, open the Coin Flipping Applet. Enter the probability determined in part (a) in the “Probability of heads” cell. Under the number of coins, enter 600 to represent the typical number of at-bats during the season for a starting player.

c. Run a total of 20 repetitions by clicking “5 runs” four times. What does each of these 20 repetitions represent?

d. Based on the graph, how many of the repetitions result in 62 or more homeruns (indicating Maris’ record is broken)?

e. There are a number of players who have averaged 1 homerun every 11.9 at-bats since Maris set his record. Increase the number of repetitions to 1,000 with the number of tosses at 600. What does each of these 1,000 repetitions represent?

f. In the cell “As extreme as,” enter ≥ 62 and select “Count.” This will allow us to determine the likelihood of a player hitting 62 or more homeruns in a season to break Maris’ record.
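The applet's simulation can be imitated in a few lines of code. The Python sketch below is an illustration, not the applet's own method: the seed is arbitrary, the function name simulate_season is made up for this example, and each at-bat is treated as an independent trial with probability 1/11.9 of a homerun.

```python
import random

P_HOMERUN = 1 / 11.9   # chance of a homerun in one at-bat
AT_BATS = 600          # at-bats in one season
REPETITIONS = 1000     # simulated seasons
RECORD_BREAKER = 62    # homeruns needed to break Maris' record of 61

rng = random.Random(1998)  # arbitrary seed, only so the sketch is repeatable


def simulate_season(p, at_bats, rng):
    """Count homeruns in one season of independent at-bats."""
    return sum(rng.random() < p for _ in range(at_bats))


season_totals = [
    simulate_season(P_HOMERUN, AT_BATS, rng) for _ in range(REPETITIONS)
]

count_breaking = sum(total >= RECORD_BREAKER for total in season_totals)
proportion_breaking = count_breaking / REPETITIONS

print(f"Seasons with {RECORD_BREAKER} or more homeruns: {count_breaking}")
print(f"Estimated probability: {proportion_breaking:.3f}")
```

Because the result is a simulation, a different seed gives a slightly different estimate, just as repeated runs of the applet do.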


2. a. Use the binomial probability formula to compute the exact probability of Mark McGwire breaking Maris’ record over the course of a 20-season career, assuming he averages 1 homerun every 11.9 at-bats.

b. What does this probability change to if McGwire is able to increase his homerun rate to 1 homerun every 10.8 at-bats? This was McGwire’s homerun rate in 1998.

c. Use the binomial probability formula to compute the exact probability of any particular player (who averages 1 homerun every 11.9 at-bats) with 600 at-bats breaking Maris’ record.

d. Over the past few years, the prolific homerun hitters have been averaging 1 homerun every 13 at-bats. Assuming the league has 10 prolific homerun hitters in any given season, what is the likelihood of McGwire’s record being broken in the next 20 years? Assume 600 at-bats per year.

Answer using a simulation:

Answer using the binomial probability formula:

e. What homerun rate would be required among the top 10 homerun hitters in order for there to be at least a 5% chance of breaking McGwire’s record within the next 20 years? Assume 600 at-bats per year.
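The binomial calculations in this question can be organized as in the Python sketch below. It is a sketch under stated assumptions rather than a definitive solution: every season is taken to have 600 at-bats, seasons and players are treated as independent, "over a 20-season career" is read as "in at least one of 20 seasons," breaking Maris' record means 62 or more homeruns, and breaking McGwire's record of 70 means 71 or more. The function names and the bisection search for the required rate are choices made for this illustration.

```python
from math import comb

AT_BATS = 600


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), as 1 minus the lower tail."""
    lower_tail = sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k))
    return max(0.0, 1.0 - lower_tail)


def chance_within(trials, per_trial):
    """P(at least one success in `trials` independent tries)."""
    return 1.0 - (1.0 - per_trial) ** trials


# One player, one season, 62 or more homeruns.
p_season_119 = prob_at_least(62, AT_BATS, 1 / 11.9)
p_season_108 = prob_at_least(62, AT_BATS, 1 / 10.8)

# At least one such season in a 20-season career.
p_career_119 = chance_within(20, p_season_119)
p_career_108 = chance_within(20, p_season_108)


def league_chance(p):
    """Chance that any of 10 hitters hits 71+ in any of 20 years."""
    return chance_within(10 * 20, prob_at_least(71, AT_BATS, p))


p_league = league_chance(1 / 13)

# Bisection: smallest per-at-bat rate giving at least a 5% league chance.
low, high = 0.01, 0.5
for _ in range(60):
    middle = (low + high) / 2
    if league_chance(middle) >= 0.05:
        high = middle
    else:
        low = middle
at_bats_per_homerun = 1 / high

print(f"Single season, 1 per 11.9: {p_season_119:.4f}")
print(f"Single season, 1 per 10.8: {p_season_108:.4f}")
print(f"20-season career, 1 per 11.9: {p_career_119:.4f}")
print(f"20-season career, 1 per 10.8: {p_career_108:.4f}")
print(f"League over 20 years, 1 per 13: {p_league:.4f}")
print(f"Rate needed for a 5% chance: 1 homerun per {at_bats_per_homerun:.2f} at-bats")
```

If your instructor intends a different reading of any part (for example, a different number of at-bats), change the corresponding constant and rerun the sketch.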