STAT4051 Fall 2022 Midterm I
Problem I. Short Answer (20 points total)
Show all work for full credit unless noted otherwise.
Short Answers.
1. (5 points) What is the concern with a per comparison test for a treatment with six levels?
2. (5 points) What is data snooping?
3. (5 points) Why are Tukey p-values always larger than linear contrast p-values?
4. (5 points) What statistical distribution is Tukey’s test based on?
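As a rough illustration of the scale involved in question 1, the R sketch below counts the pairwise comparisons among six levels and approximates the chance of at least one false rejection. The significance level of 0.05 and the independence of the tests are assumptions made here for simplicity; neither is stated in the exam, and pairwise comparisons that share group means are not truly independent.

```r
# Number of pairwise comparisons among k treatment levels.
k <- 6
m <- choose(k, 2)

# Assumed per-comparison significance level (not stated in the exam).
alpha <- 0.05

# Approximate chance of at least one false rejection across m tests,
# under the simplifying assumption that the tests are independent.
fwer <- 1 - (1 - alpha)^m

print(m)
print(round(fwer, 3))
```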
Problem II. Short Answer (15 points total)
1. (5 points) My husband and I hike together, and we both track the distance we walk with our iPhones.
i. What statistical test would you use to test whether there is a statistically significant difference between our iPhones in the distances recorded for our hikes?
ii. State H0 and Ha.
2. (10 points) Do buses run slower in lower income neighborhoods compared to affluent neighborhoods? Arrival times for 25 buses in each of the two neighborhoods for a typical morning commute were obtained. You compared the arrival of each bus to the scheduled arrival and took the difference, i.e., scheduled arrival - observed arrival, for the 50 buses. Buses that arrive ahead of schedule are positive differences (recorded in minutes) whereas buses that arrive behind schedule are negative differences. You summarized the following data:

                           Mean (min.)    s     n
Low Income Neighborhood        -4         2    25
Affluent Neighborhood          -1         2    25
Difference                     -3         1    25

Construct a 95% confidence interval to answer the question posed at the start of this problem.
Note: a correct formula, a correctly specified quantile and correct plug-ins of observed data will get full credit. Do not perform calculations.
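One way the requested formula and plug-ins might be laid out is sketched below. It assumes the two neighborhoods are treated as independent samples with a common variance (the pooled two-sample t interval); the subscripts L and A for the low income and affluent neighborhoods are notation introduced here, not taken from the exam. If the buses were instead matched in pairs, the Difference row and a one-sample interval would be the relevant quantities.

```latex
\[
(\bar{y}_{L} - \bar{y}_{A}) \;\pm\; t_{0.975,\; n_L + n_A - 2}\; s_p \sqrt{\frac{1}{n_L} + \frac{1}{n_A}},
\qquad
s_p^2 = \frac{(n_L - 1)\,s_L^2 + (n_A - 1)\,s_A^2}{n_L + n_A - 2}
\]
\[
\bigl(-4 - (-1)\bigr) \;\pm\; t_{0.975,\; 48}\,(2)\sqrt{\frac{1}{25} + \frac{1}{25}}
\]
```

Here the pooled standard deviation equals 2 because both sample standard deviations are 2 and the sample sizes are equal.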
Problem III. Short Answer (20 points total)
Show all work for full credit unless noted otherwise.
Sixty-one percent (61%) of the US population favors legal abortion. A survey was conducted on campus to determine the proportion of students who approve of keeping abortion legal. Eighty-two (82) out of 100 UofM students think it should remain legal in the US.
1. (5 points) Assume you use the UofM data to compute a 95% bootstrap percentile confidence interval for the population proportion. How would you generate the bootstrap distribution for the confidence interval? Describe your approach to creating the distribution in words. Be specific. You do not need to give R code.
2. (5 points) Once you have generated the bootstrap distribution, how would you determine the actual confidence interval?
3. (5 points) Did you generate a nonparametric or parametric bootstrap confidence interval? Explain.
4. (5 points) How would you use the confidence interval to determine if the UofM estimate is statistically different from the overall US population proportion? Explain.
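The exam asks for a description in words only, but the mechanics of one possible approach can be sketched in R. The block below resamples the 100 observed responses with replacement, which is one common way to build such a distribution; the seed, the number of resamples B, and all variable names are choices made here for illustration and are not part of the exam.

```r
# Bootstrap percentile interval for a proportion, by resampling the observed data.
# Data from the problem: 82 of 100 sampled students approve.
set.seed(4051)                      # arbitrary seed, for reproducibility only
x <- c(rep(1, 82), rep(0, 18))      # 1 = approves, 0 = does not
B <- 10000                          # number of resamples (an assumed choice)

# Resample the 100 responses with replacement and record each resample's proportion.
boot_props <- replicate(B, mean(sample(x, size = length(x), replace = TRUE)))

# Percentile interval: the 2.5th and 97.5th percentiles of the bootstrap distribution.
ci <- quantile(boot_props, probs = c(0.025, 0.975))
print(ci)

# Check whether the national figure quoted in the problem falls inside the interval.
p_us <- 0.61
inside <- (p_us >= ci[1]) && (p_us <= ci[2])
print(inside)
```

Drawing the resamples from a fitted binomial model instead, for example with `rbinom(B, 100, 0.82) / 100`, would be the model-based alternative to resampling the data directly.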
Problem IV. Short Answer (45 points total)
Consider an experiment taste-testing six types of chocolate chip cookies:
1 (Brand A, chewy, expensive),
2 (Brand A, crispy, expensive),
3 (Brand B, chewy, inexpensive),
4 (Brand B, crispy, inexpensive),
5 (Brand C, chewy, expensive), and
6 (Brand D, crispy, inexpensive).
We use twenty different raters randomly assigned to each type (120 total raters). Cookies were rated on a 0 - 100 scale with 100 being a perfect score.
The following information is provided.
> tapply(rating, cc.type, mean)
   Brand A,chewy,expensive   Brand A,crispy,expensive  Brand B,chewy,inexpensive
                   55.2940                    55.5826                    47.6240
Brand B,crispy,inexpensive    Brand C,chewy,expensive Brand D,crispy,inexpensive
                   70.0496                    65.0387                    44.4398
1. (7 points) Complete the ANOVA table:
> model.1 <- aov(rating ~ cc.type)
> summary(model.1)
            Df      Sum Sq   Mean Sq  F value  Pr(>F)
cc.type     1.____  3.____   5.____   7.____   <2e-16 ***
Residuals   2.____  4.____   6.____
_________________________________________________________________
Total               10030
Residual standard error: 1.816 on XX degrees of freedom
2. (12 points) Based on the lm() results presented below:
i. Find the standard error of the (Intercept). Show work.
ii. Find the standard error of the cc.typeBrand A,crispy,expensive row. Show work.
iii. What is the cc.typeBrand A,crispy,expensive row estimating?
lm(formula = rating ~ cc.type)
Coefficients:
                                   Estimate Std. Error t value Pr(>|t|)
(Intercept)                         55.2940     0.4047 136.629   <2e-16 ***
cc.typeBrand A,crispy,expensive      0.2886     0.5723   0.504    0.615
cc.typeBrand B,chewy,inexpensive    -7.6700     0.5723 -13.401   <2e-16 ***
cc.typeBrand B,crispy,inexpensive   14.7556     0.5723  25.781   <2e-16 ***
cc.typeBrand C,chewy,expensive       9.7447     0.5723  17.026   <2e-16 ***
cc.typeBrand D,crispy,inexpensive  -10.8542     0.5723 -18.965   <2e-16 ***
---
Residual standard error: 1.816 on XX degrees of freedom
Multiple R-squared: 0.9628, Adjusted R-squared: 0.9611
F-statistic: XXX on XXX and XXX DF, p-value: < 2.2e-16
3. (8 points) Supply coefficients for the following two contrasts:
i. Compare chewy vs. crispy.
ii. Compare Brand A vs. Brand B.
4. (8 points) Are your contrasts in question 3 orthogonal? Why or why not? Show work.
5. (10 points) Construct a 95% confidence interval for one of your contrasts in question 3. Include formula and plug-ins for full credit. Do not perform calculations.
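The checks behind questions 3 to 5 can be sketched in R. The coefficient vectors below are one possible coding, and any nonzero rescaling would serve equally well; the names `chewy_vs_crispy`, `brandA_vs_brandB`, and `contrast_ci` are hypothetical helpers introduced here. The sketch relies on the equal group sizes (20 raters per type) and takes the group means and residual standard error from the output shown earlier in this problem.

```r
# Group means in the order of cookie types 1-6, copied from the tapply() output.
means <- c(55.2940, 55.5826, 47.6240, 70.0496, 65.0387, 44.4398)
n  <- 20          # raters per cookie type
s  <- 1.816       # residual standard error from the model output
df <- 6 * n - 6   # residual degrees of freedom: total raters minus number of groups

# One possible coding (an assumption, not the only valid choice).
chewy_vs_crispy  <- c(1, -1, 1, -1, 1, -1)   # types 1, 3, 5 chewy; 2, 4, 6 crispy
brandA_vs_brandB <- c(1, 1, -1, -1, 0, 0)    # types 1, 2 Brand A; 3, 4 Brand B

# Each set of coefficients must sum to zero to be a contrast.
print(c(sum(chewy_vs_crispy), sum(brandA_vs_brandB)))

# With equal group sizes, two contrasts are orthogonal when their dot product is zero.
print(sum(chewy_vs_crispy * brandA_vs_brandB))

# Confidence interval for a contrast: estimate +/- t quantile * standard error.
contrast_ci <- function(w, means, s, n, df, level = 0.95) {
  est <- sum(w * means)                 # estimated contrast
  se  <- s * sqrt(sum(w^2) / n)         # standard error with equal group sizes
  est + c(-1, 1) * qt(1 - (1 - level) / 2, df) * se
}
ci_ab <- contrast_ci(brandA_vs_brandB, means, s, n, df)
print(ci_ab)
```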
2023-03-03