
STAT 4051

Fall 2022

Midterm II

Problem I. Short Answer (28 points total)

Show all work for full credit unless noted otherwise.

A homework assignment in STAT 4051 had the following problem:

“A study was conducted in guinea pigs to investigate the effect of dose of vitamin C and delivery method on the length of odontoblasts (cells responsible for tooth growth). Data were recorded for 51 guinea pigs. The variables are:

• dose of vitamin C (0.5, 1, or 2 mg/day)

• delivery method (orange juice or ascorbic acid)

The researcher was interested in these particular levels. Analyze the ToothGrowth2 dataset to determine which factors, dose and/or method, affect odontoblast length at α = 0.05. If a factor is statistically significant, then there is interest in determining which levels are statistically different.”

Here is a snapshot of the dataset:

> head(ToothGrowth2,2)
  dose length method
1  0.5    4.2     VC
2  0.5   11.5     VC

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Two students, Mickey and Minnie, were doing their homework together. They both fit a two-way ANOVA model to the above data, but got different results! See below.

Mickey’s Results:

> model.Mickey<-aov(length~method*dose,data=ToothGrowth2)
> summary(model.Mickey)
            Df Sum Sq Mean Sq F value   Pr(>F)
method       1   22.9    22.9   1.586    0.214
dose         2 2118.5  1059.2  73.483 6.68e-15 ***
method:dose  2   33.5    16.8   1.162    0.322
Residuals   45  648.7    14.4

Minnie’s Results:

> model.Minnie<-aov(length~dose*method,data=ToothGrowth2)
> summary(model.Minnie)
            Df Sum Sq Mean Sq F value   Pr(>F)
dose         2 1860.8   930.4  64.546 6.02e-14 ***
method       1  280.5   280.5  19.460 6.34e-05 ***
dose:method  2   33.5    16.8   1.162    0.322
Residuals   45  648.7    14.4

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Since Mickey and Minnie’s results did not agree, someone is wrong, or both are wrong.
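One thing worth keeping in mind when comparing the two outputs is that R's aov() reports sequential (Type I) sums of squares, so with unequal cell counts the main-effect rows can change with the order of the terms. The sketch below illustrates this on simulated data only. The data frame `toy`, its cell counts, and its effect sizes are hypothetical and are not the ToothGrowth2 data.

```r
# Hypothetical unbalanced two-factor data; NOT the ToothGrowth2 data.
set.seed(4051)
toy <- data.frame(
  method = factor(rep(c("OJ", "VC"), times = c(14, 10))),
  dose   = factor(c(rep(c(0.5, 1, 2), times = c(7, 4, 3)),
                    rep(c(0.5, 1, 2), times = c(2, 3, 5))))
)
toy$length <- 10 + 5 * as.integer(toy$dose) + 2 * (toy$method == "OJ") +
  rnorm(nrow(toy), sd = 2)

# aov() adjusts each term only for the terms listed before it,
# so swapping the order of the factors changes the main-effect rows.
ss_md <- summary(aov(length ~ method * dose, data = toy))[[1]]
ss_dm <- summary(aov(length ~ dose * method, data = toy))[[1]]

print(ss_md)
print(ss_dm)
```

In both fits the interaction and residual rows agree, and the two main-effect sums of squares add up to the same total; only the split between the main effects depends on the order.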

1.  (10 points) Complete the following ANOVA table with the correct results. Hint: You do not need to compute anything.

               Df     Sum Sq    Mean Sq    F value    Pr(>F)
dose           ___    ______    ______     _______    _______
method         ___    ______    ______     _______    _______
dose:method    2      33.5      16.8       1.162      0.322
Residuals      45     648.7     14.4

2.  (4 points) Can the method:dose (or dose:method) interaction be dropped from the model? Explain.

3.  (6 points) Using SS notation, show why Mickey’s method sum of squares does not equal Minnie’s method sum of squares.

4.  (4 points) What type of model did the students fit?  Fixed effects, random effects, or mixed model? Explain.

5.  (4 points) What observations, if any, are correlated?

Problem II. Short Answer (36 points total)

Grocery stores are always interested in how well the products on the store shelves sell. An experiment was designed to test whether the amount of discount given on products affected the amount of sales of those products. There were three levels of discount (5%, 10%, and 15%), and each sale was held for a week. The total number of products sold during the week of the sale was recorded. The researchers also recorded the wholesale price of the items put on sale.

The variables that were collected in the experiment are:

• Discount: 5%, 10%, or 15%

• Price: wholesale price (in dollars)

• Sales: number sold during one week

1.  (18 points) I fit several models to the data. Use the information in the Grocery Handout to determine your final model. Examine ALL models, model.0 - model.6, and select the model that best fits the data. Use α = 0.05. Discuss ALL models to justify your selection of your final model.

Your response should be a logical flow of the steps you take to arrive at your final model and what decision you made at each step based on the output. Simply writing down the final model will gain few points if any. Remember your goal is to address the question of interest which is whether the amount of discount given on the products affects the amount of sales of that product.

Summary Information:

> mean(Price)
[1] 8.524444

> tapply(Grocery$Price,Grocery$Discount,mean)
   5.00%   10.00%   15.00%
8.499167 8.592500 8.481667

> tapply(Grocery$Sales,Grocery$Discount,mean)
   5.00%   10.00%   15.00%
203.5000 217.7500 213.5833

2.  (4 points) Is the ANOVA model a better model than your final ANCOVA model from question 1? Explain.

3.  (4 points) For your final model, is the rate of change in Sales with respect to Price the same for all levels of Discount? Explain.

4.  (10 points) Based on model.3, estimate the covariate-adjusted means for two of the three Discounts. Please note, model.3 may or may not be the best final model.

Note: A correct formula and correct plug-ins of observed data will get full credit. No need to perform calculations.
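As a rough illustration of the plug-in step, one common way to write a covariate-adjusted mean under a common-slope model is the raw group mean of Sales minus the slope times the gap between the group's mean Price and the overall mean Price. The sketch below applies that formula to the summary values and the Price coefficient printed for model.3. The variable names are illustrative, and the formula is a standard textbook form rather than necessarily the one expected in this course.

```r
# Values copied from the exam's summary output and the model.3 coefficient table.
slope       <- 79.591     # Price coefficient in model.3
grand_price <- 8.524444   # mean(Price)
price_mean  <- c("5%" = 8.499167, "10%" = 8.592500, "15%" = 8.481667)
sales_mean  <- c("5%" = 203.5000, "10%" = 217.7500, "15%" = 213.5833)

# Adjusted mean: raw group mean of Sales, moved to the overall mean Price.
adjusted_mean <- sales_mean - slope * (price_mean - grand_price)
print(round(adjusted_mean, 2))

# Differences from the 5% group; these should match the Discount
# coefficients of model.3 up to rounding.
print(round(adjusted_mean[c("10%", "15%")] - adjusted_mean["5%"], 3))
```

The differences between adjusted means line up with the Discount10.00% and Discount15.00% coefficients of model.3, which is a quick consistency check on the plug-ins.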

Write on the next page =>

Problem III. Short Answer (34 points total)

The following data come from an experiment testing whether paper brightness depends on the shift operator (Sheldon, 1960). Interest is not in any particular operator.

Sheldon, F. (1960). “Statistical techniques applied to production situations.” Industrial and Engineering Chemistry, 52, 507-509.

> model.1<-aov(bright~operator,data=pulp)
> summary(model.1)
            Df Sum Sq Mean Sq F value Pr(>F)
operator     3   1.34  0.4467   4.204 0.0226 *
Residuals   16   1.70  0.1062

1.  (6 points) What statistical assumptions do I need to assess for this model? Be specific.

2.  (4 points) The operator effect is statistically significant. How do I interpret this effect?

3.  (8 points) Estimate the variance of operator.

4.  (4 points) Estimate the variance of brightness.

5.  (4 points) Estimate the intraclass correlation.

6.  (4 points) Interpret the intraclass correlation computed above.

7.  (4 points) What type of model was fit to the data? Fixed effects, random effects, or mixed model? Explain.
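If operator is treated as a random effect, the usual method-of-moments estimates can be read off the mean squares in the ANOVA table. The sketch below shows the arithmetic. It assumes a balanced design with 5 observations per operator, which is consistent with the degrees of freedom (3 and 16) but is not stated in the output; the variable names are illustrative.

```r
# Mean squares copied from summary(model.1).
ms_operator <- 0.4467
ms_error    <- 0.1062
n_per_group <- 5   # ASSUMPTION: 20 observations / 4 operators, balanced

# Method-of-moments estimates for a one-way random effects model.
var_operator <- (ms_operator - ms_error) / n_per_group  # between-operator variance
var_error    <- ms_error                                # within-operator variance
var_total    <- var_operator + var_error                # variance of one brightness value
icc          <- var_operator / var_total                # intraclass correlation

print(round(c(var_operator = var_operator, var_total = var_total, icc = icc), 4))
```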

Problem IV. Short Answer (6 points total)

A randomized complete block experiment was conducted to investigate a drug added to the feed of chicks to promote growth. There were three levels of drug:

• standard feed (control)

• standard feed plus low dose of drug

• standard feed plus high dose of drug

The following table reports the weight (in pounds) for each chick after 6 weeks. There are 15 chicks in the study.

                    Drug Dose
Block    Control   Low Dose   High Dose
1        3.93      3.99       3.96
2        3.78      3.96       3.94
3        3.88      3.96       4.02
4        3.93      4.03       4.06
5        3.84      4.10       3.94

1.  (6 points) Complete the Source and df columns of the ANOVA table for this dataset:

Source         df
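For a randomized complete block design with b blocks, t treatments, and one observation per cell, the usual partition is blocks (b - 1), treatments (t - 1), error (b - 1)(t - 1), and total bt - 1. One way to check a hand-filled table is to enter the weights above and let aov() report the degrees of freedom. The data frame name `chicks` and its column names are assumptions made for this sketch.

```r
# Weights entered block by block from the table (Control, Low, High).
chicks <- data.frame(
  block  = factor(rep(1:5, each = 3)),
  dose   = factor(rep(c("Control", "Low", "High"), times = 5),
                  levels = c("Control", "Low", "High")),
  weight = c(3.93, 3.99, 3.96,
             3.78, 3.96, 3.94,
             3.88, 3.96, 4.02,
             3.93, 4.03, 4.06,
             3.84, 4.10, 3.94)
)

# Additive block + treatment model; the residual line is the error term.
fit <- aov(weight ~ block + dose, data = chicks)
tab <- summary(fit)[[1]]
print(tab["Df"])
```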

Grocery Handout

> model.0<-lm(Sales~Price)
> summary(model.0)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -466.384     24.631  -18.93   <2e-16 ***
Price          79.535      2.886   27.56   <2e-16 ***
---
Residual standard error: 6.954 on 34 degrees of freedom
Multiple R-squared:  0.9571, Adjusted R-squared:  0.9559
F-statistic: 759.3 on 1 and 34 DF,  p-value: < 2.2e-16

> model.1<-aov(Price~Discount,data=Grocery)
> summary(model.1)
            Df Sum Sq Mean Sq F value Pr(>F)
Discount     2  0.085  0.0426   0.246  0.783
Residuals   33  5.719  0.1733

> adjusted.covariate<-resid(model.1)
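The residuals from a one-way fit of the covariate on the grouping factor are the covariate values with each group's own mean subtracted. The sketch below checks this on made-up numbers; the vectors `g` and `x` are hypothetical and are not the Grocery data.

```r
# Hypothetical prices in three discount groups; NOT the Grocery data.
set.seed(1)
g <- factor(rep(c("5%", "10%", "15%"), each = 4))
x <- round(runif(12, min = 8, max = 9), 2)

# Residuals from the one-way fit of x on g ...
centered_by_resid <- unname(resid(aov(x ~ g)))
# ... equal x minus its own group mean.
centered_by_hand <- x - ave(x, g)

print(all.equal(centered_by_resid, centered_by_hand))
print(round(tapply(centered_by_resid, g, mean), 10))  # each group mean is 0
```

Because the centered covariate averages to zero within every group, a model that uses it alongside the grouping factor has an intercept and group coefficients that reproduce the raw group means of the response.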

> model.2<-aov(Sales~Discount,data=Grocery)
> summary(model.2)
            Df Sum Sq Mean Sq F value Pr(>F)
Discount     2   1288   644.2   0.573  0.569
Residuals   33  37074  1123.5

> model.3<-lm(Sales~Price + Discount,data=Grocery)
> summary(model.3)

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)     -472.953     18.317 -25.820  < 2e-16 ***
Price             79.591      2.148  37.052  < 2e-16 ***
Discount10.00%     6.822      2.107   3.238   0.0028 **
Discount15.00%    11.476      2.098   5.471 5.04e-06 ***
---
Residual standard error: 5.137 on 32 degrees of freedom
Multiple R-squared:  0.978, Adjusted R-squared:  0.9759
F-statistic: 473.9 on 3 and 32 DF,  p-value: < 2.2e-16

> anova(model.3)
Analysis of Variance Table

Response: Sales
          Df Sum Sq Mean Sq  F value    Pr(>F)
Price      1  36718   36718 1391.366 < 2.2e-16 ***
Discount   2    800     400   15.149 2.348e-05 ***
Residuals 32    844      26
---

> model.4<-lm(Sales~Price * Discount,data=Grocery)
> summary(model.4)

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)           -452.038     27.668 -16.338   <2e-16 ***
Price                   77.130      3.251  23.728   <2e-16 ***
Discount10.00%         -24.161     41.771  -0.578    0.567
Discount15.00%         -39.308     50.225  -0.783    0.440
Price:Discount10.00%     3.632      4.879   0.745    0.462
Price:Discount15.00%     5.982      5.913   1.012    0.320
---
Residual standard error: 5.204 on 30 degrees of freedom
Multiple R-squared:  0.9788, Adjusted R-squared:  0.9753
F-statistic: 277.3 on 5 and 30 DF,  p-value: < 2.2e-16

> anova(model.4)
Analysis of Variance Table

Response: Sales
               Df Sum Sq Mean Sq   F value    Pr(>F)
Price           1  36718   36718 1355.9419 < 2.2e-16 ***
Discount        2    800     400   14.7636 3.436e-05 ***
Price:Discount  2     32      16    0.5926    0.5592
Residuals      30    812      27
---

> model.5<-lm(Sales~adjusted.covariate + Discount,data=Grocery)
> summary(model.5)

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)         203.500      1.483 137.225  < 2e-16 ***
adjusted.covariate   79.591      2.148  37.052  < 2e-16 ***
Discount10.00%       14.250      2.097   6.795 1.11e-07 ***
Discount15.00%       10.083      2.097   4.808 3.47e-05 ***
---
Residual standard error: 5.137 on 32 degrees of freedom
Multiple R-squared:  0.978, Adjusted R-squared:  0.9759
F-statistic: 473.9 on 3 and 32 DF,  p-value: < 2.2e-16

> anova(model.5)
Analysis of Variance Table

Response: Sales
                   Df Sum Sq Mean Sq F value    Pr(>F)
adjusted.covariate  1  36230   36230 1372.84 < 2.2e-16 ***
Discount            2   1288     644   24.41 3.648e-07 ***
Residuals          32    844      26
---

> model.6<-lm(Sales~adjusted.covariate * Discount,data=Grocery)
> summary(model.6)

Coefficients:
                                   Estimate Std. Error t value Pr(>|t|)
(Intercept)                         203.500      1.502 135.467  < 2e-16 ***
adjusted.covariate                   77.130      3.251  23.728  < 2e-16 ***
Discount10.00%                       14.250      2.124   6.708 1.97e-07 ***
Discount15.00%                       10.083      2.124   4.746 4.76e-05 ***
adjusted.covariate:Discount10.00%     3.632      4.879   0.745    0.462
adjusted.covariate:Discount15.00%     5.982      5.913   1.012    0.320
---
Residual standard error: 5.204 on 30 degrees of freedom
Multiple R-squared:  0.9788, Adjusted R-squared:  0.9753
F-statistic: 277.3 on 5 and 30 DF,  p-value: < 2.2e-16

> anova(model.6)
Analysis of Variance Table

Response: Sales
                            Df Sum Sq Mean Sq   F value    Pr(>F)
adjusted.covariate           1  36230   36230 1337.8914 < 2.2e-16 ***
Discount                     2   1288     644   23.7888 6.468e-07 ***
adjusted.covariate:Discount  2     32      16    0.5926    0.5592
Residuals                   30    812      27
---