STAT4051

Fall 2022

Midterm I

Problem I. Short Answer (20 points total)

Show all work for full credit unless noted otherwise.

Short Answers.

1.  (5 points) What is the concern with a per comparison test for a treatment with six levels?

2.  (5 points) What is data snooping?

3.  (5 points) Why are Tukey p-values always larger than linear contrast p-values?

4.  (5 points) What statistical distribution is Tukey’s test based on?

Problem II. Short Answer (15 points total)

1.  (5 points) My husband and I hike together, and we both track the distance we walk with our iPhones.

i. What statistical test would you use to test whether there is a statistically significant difference between our iPhones in the distances recorded for our hikes?

ii. State H0 and Ha.

2.  (10 points) Do buses run slower in lower income neighborhoods compared to affluent neighborhoods? Arrival times for 25 buses in each of the two neighborhoods for a typical morning commute were obtained. You compared the arrival of each bus to the scheduled arrival and took the difference, i.e., scheduled arrival - observed arrival, for the 50 buses. Buses that arrive ahead of schedule are positive differences (recorded in minutes) whereas buses that arrive behind schedule are negative differences. You summarized the following data:

 

               Low Income Neighborhood   Affluent Neighborhood   Difference
Mean (min.)              -4                       -1                 -3
s                         2                        2                  1
n                        25                       25                 25

Construct a 95% confidence interval to answer the question that opens this problem.

Note: A correct formula, a correctly specified quantile, and correct plug-ins of observed data will get full credit. Do not perform calculations.
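The sign convention in the bus question (scheduled arrival minus observed arrival) can be illustrated with a minimal R sketch. The three arrival times below are hypothetical values chosen for illustration and are not the exam's data; the names `scheduled`, `observed`, `difference`, and `status` are likewise assumptions.

```r
# Hypothetical arrival times, in minutes past the hour (not the exam's data).
scheduled <- c(10, 25, 40)
observed  <- c(12, 24, 40)

# Same sign convention as the problem: scheduled arrival - observed arrival.
difference <- scheduled - observed

# Negative = behind schedule, positive = ahead of schedule, zero = on time.
status <- ifelse(difference < 0, "behind schedule",
                 ifelse(difference > 0, "ahead of schedule", "on time"))

print(data.frame(scheduled, observed, difference, status))
```

Under this convention, a bus that arrives later than scheduled contributes a negative value to the neighborhood mean.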

Problem III. Short Answer (20 points total)

Show all work for full credit unless noted otherwise.

Sixty-one percent (61%) of the US population favors legal abortion. A survey was conducted on campus to determine the proportion of students who approve of keeping abortion legal. Eighty-two (82) out of 100 UofM students think it should remain legal in the US.
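For reference, the survey summary above can be written as a data vector in R. This is only a sketch of the data layout: coding each response as 1 (favors) or 0 (does not favor) is an assumption made here, since the exam reports only the counts, and the names `responses`, `p_hat`, and `p_us` are hypothetical.

```r
# Survey outcome from the problem: 82 of 100 UofM students favor legal abortion.
# The 1/0 coding is an illustrative assumption; only the counts are given.
responses <- c(rep(1, 82), rep(0, 18))

n     <- length(responses)  # sample size
p_hat <- mean(responses)    # observed sample proportion
p_us  <- 0.61               # stated US population proportion

print(c(n = n, p_hat = p_hat, p_us = p_us))
```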

1.  (5 points) Assume you use the UofM data to compute a 95% bootstrap percentile confidence interval for the population proportion. How would you generate the bootstrap distribution for the confidence interval? Describe your approach to creating the distribution in words. Be specific. You do not need to give R code.

2.  (5 points) Once you generated the bootstrap distribution, how would you determine the actual confidence interval?

3.  (5 points) Did you generate a nonparametric or parametric bootstrap confidence interval? Explain.

4.  (5 points) How would you use the confidence interval to determine if the UofM estimate is statistically different from the overall US population proportion? Explain.

Problem IV. Short Answer (45 points total)

Consider an experiment taste-testing six types of chocolate chip cookies:

1 (Brand A, chewy, expensive),

2 (Brand A, crispy, expensive),

3 (Brand B, chewy, inexpensive),

4 (Brand B, crispy, inexpensive),

5 (Brand C, chewy, expensive), and

6 (Brand D, crispy, inexpensive).

We use twenty different raters randomly assigned to each type (120 total raters). Cookies were rated on a 0 - 100 scale with 100 being a perfect score.
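The design described above (one factor with six levels, twenty raters per level) could be laid out in R roughly as follows. This is a sketch under stated assumptions: the exam does not supply the raw ratings, so the `rating` values are simulated placeholders, and `cookie_levels`, `group_means`, and `fit.lm` are hypothetical names. Only `rating`, `cc.type`, and `model.1` appear in the exam's output.

```r
# Hypothetical stand-in data: the values below are simulated purely to show
# the data layout and will NOT reproduce the numbers printed in the exam.
set.seed(1)

cookie_levels <- c(
  "Brand A,chewy,expensive",
  "Brand A,crispy,expensive",
  "Brand B,chewy,inexpensive",
  "Brand B,crispy,inexpensive",
  "Brand C,chewy,expensive",
  "Brand D,crispy,inexpensive"
)

# One factor with six levels, twenty raters per level (120 rows in total).
cc.type <- factor(rep(cookie_levels, each = 20), levels = cookie_levels)

# Simulated ratings, clipped to the 0-100 scale (arbitrary mean and spread).
rating <- pmin(pmax(rnorm(length(cc.type), mean = 55, sd = 2), 0), 100)

# The same kinds of calls that appear in the exam output.
group_means <- tapply(rating, cc.type, mean)
model.1     <- aov(rating ~ cc.type)
fit.lm      <- lm(rating ~ cc.type)

print(table(cc.type))
print(group_means)
```

Fixing the order of `levels` explicitly makes the first cookie type the reference level for `lm()`, which matches the row order shown in the exam's coefficient table.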

The following information is provided.

> tapply(rating,cc.type,mean)

   Brand A,chewy,expensive     Brand A,crispy,expensive    Brand B,chewy,inexpensive
                   55.2940                      55.5826                      47.6240
Brand B,crispy,inexpensive      Brand C,chewy,expensive   Brand D,crispy,inexpensive
                   70.0496                      65.0387                      44.4398


1.  (7 points) Complete the ANOVA table:

> model.1<-aov(rating~cc.type)

> summary(model.1)

             Df        Sum Sq     Mean Sq    F value    Pr(>F)
cc.type      1.____    3.____     5.____     7.____     <2e-16 ***
Residuals    2.____    4.____     6.____
_________________________________________________________________
Total                  10030

Residual standard error: 1.816 on XX degrees of freedom

2.  (12 points) Based on the lm() results presented below:

i. Find the standard error of the (Intercept). Show work.

ii. Find the standard error of the cc.typeBrand A,crispy,expensive row. Show work.

iii. What is the cc.typeBrand A,crispy,expensive row estimating?

lm(formula = rating ~ cc.type)

Coefficients:
                                    Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                          55.2940      0.4047  136.629    <2e-16 ***
cc.typeBrand A,crispy,expensive       0.2886      0.5723    0.504     0.615
cc.typeBrand B,chewy,inexpensive     -7.6700      0.5723  -13.401    <2e-16 ***
cc.typeBrand B,crispy,inexpensive    14.7556      0.5723   25.781    <2e-16 ***
cc.typeBrand C,chewy,expensive        9.7447      0.5723   17.026    <2e-16 ***
cc.typeBrand D,crispy,inexpensive   -10.8542      0.5723  -18.965    <2e-16 ***
---
Residual standard error: 1.816 on XX degrees of freedom
Multiple R-squared: 0.9628, Adjusted R-squared: 0.9611
F-statistic: XXX on XXX and XXX DF, p-value: < 2.2e-16

3.  (8 points) Supply coefficients for the following two contrasts:

1.  compare chewy vs. crispy

2.  compare Brand A vs. Brand B.

4.  (8 points) Are your contrasts in question 3 orthogonal? Why or why not? Show work.

5.  (10 points) Construct a 95% confidence interval for one of your contrasts in question 3.  Include formula and plug-ins for full credit. Do not perform calculations.