Stats 101A Practice Exam 2
2023-03-10
Disclaimer
The following exam is not affiliated with Professor Cha, nor is it a guarantee of what content will show up on the midterm. All of these ideas were created by yours truly, Joshua Lim, with inspiration from the material covered in Stats 101A and the generation 8 Pokemon dataset.
Section A: Multiple Choice
Questions 1 and 2 refer to the following scenario:
Suppose we have a population model
Y = β0 + β1 X1 + . . . + βp Xp
and a reduced model
Y = β0 + β1 X1 + . . . + βk Xk
where k < p.
1. Which of the following best describes the appropriate null and alternative hypotheses for the overall F-test about the population model?
a. H0: Y = β0 + β1 X1 + . . . + βp-1 Xp-1, HA: Y = β0 + β1 X1 + . . . + βp Xp
b. H0: β0 = . . . = βp = 0, HA: βi ≠ 0 for all i = 0, . . . , p
c. H0: β0 = . . . = βp = 0, HA: βi ≠ 0 for at least one i = 0, . . . , p
d. H0: β0 = . . . = βp = 0, HA: βi ≠ βj, i ≠ j
2. Which of the following best describes the appropriate null and alternative hypotheses for the partial F-test comparing the two models?
a. H0: β0 = . . . = βk = 0, HA: not all of those are 0.
b. H0: βk+1 = . . . = βp = 0, HA: not all of those are 0.
c. H0: Y = β0 + β1 X1 + . . . + βk Xk, HA: Y = β0 + β1 X1 + . . . + βp Xp
d. B and C
3. Which of the following describes the difference between R² and adjusted R² (R²adj)?
a. R² improves with each new, unique predictor added, while R²adj imposes a complexity penalty to prevent overfitting.
b. R² is better used for simple linear regression while R²adj is better used for multiple linear regression.
c. R² is better used to compare models with different numbers of predictors while R²adj is better used to compare models with the same number of predictors.
d. A and B
Questions 4 - 6 refer to the following scenario:
Suppose Professor Juniper is trying to see the effects of a Pokemon's speed and whether the Pokemon is legendary or not on base stat total. Let Y represent a Pokemon's base stat total, x represent a Pokemon's speed, and L the dummy variable for whether a Pokemon is legendary. She wants to fit the following ANCOVA models:
Scenario 1: Y = β0 + β1 x + e
Scenario 2: Y = β0 + β1 x + β2 L + e
Scenario 3: Y = β0 + β1 x + β3 (x * L) + e
Scenario 4: Y = β0 + β1 x + β2 L + β3 (x * L) + e
[Figure: plot with speed on the horizontal axis and points grouped as Legendary and Not Legendary.]
4. She would like to conduct several partial F-tests to identify the best model. Based on the graph above, which of the following comparisons should she not bother making?
a. Scenario 1 vs Scenario 3
b. Scenario 1 vs Scenario 4
c. Scenario 2 vs Scenario 4
d. Scenario 3 vs Scenario 4
She generates the following output:
##
## Call:
## lm(formula = total_points ~ speed * is_legendary, data = pokemon)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -334.77  -64.88   -5.78   67.40  461.62
##
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)
## (Intercept)             297.1230     7.6022  39.084  < 2e-16 ***
## speed                     1.8185     0.1075  16.917  < 2e-16 ***
## is_legendaryTRUE        186.4791    32.4079   5.754 1.15e-08 ***
## speed:is_legendaryTRUE   -0.4356     0.3285  -1.326    0.185
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 91.1 on 1024 degrees of freedom
## Multiple R-squared:  0.4409, Adjusted R-squared:  0.4393
## F-statistic: 269.2 on 3 and 1024 DF,  p-value: < 2.2e-16
5. Based on our model, for every 1-point increase in speed, a non-legendary Pokemon should see what increase in base stat total on average?
a. -186
b. 186
c. 1.82
d. 1.3829
6. Based on our model, a legendary Pokemon with a speed stat of 100 is predicted to have what base stat total?
a. 435
b. 478
c. 621
d. 666
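As a way to check work on questions 5 and 6, the fitted interaction model can be evaluated directly. The sketch below is in Python rather than R and only does the arithmetic. The coefficients are copied from the printed output, while the function and constant names (`predict_total`, `speed_slope`, and so on) are hypothetical and not part of the exam.

```python
# Coefficients copied from the printed lm() output above.
INTERCEPT = 297.1230
B_SPEED = 1.8185
B_LEGENDARY = 186.4791
B_INTERACTION = -0.4356


def predict_total(speed, legendary):
    """Fitted base stat total for a given speed and legendary status."""
    flag = 1 if legendary else 0  # dummy variable L
    return (INTERCEPT + B_SPEED * speed
            + B_LEGENDARY * flag + B_INTERACTION * speed * flag)


def speed_slope(legendary):
    """Change in fitted base stat total per one-unit increase in speed."""
    return B_SPEED + (B_INTERACTION if legendary else 0.0)


if __name__ == "__main__":
    print(round(predict_total(100, legendary=True), 2))
    print(round(speed_slope(legendary=False), 4))
    print(round(speed_slope(legendary=True), 4))
```

The interaction term changes the speed slope only when L = 1, which is why the two groups have different slopes as well as different intercepts.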
Consider the following output:
## bcPower Transformations to Multinormality
##                 Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
## base_experience    0.3431        0.33       0.2645       0.4218
## catch_rate         0.4046        0.40       0.3610       0.4482
## defense            0.2850        0.33       0.1993       0.3706
## total_points       0.6236        0.62       0.5123       0.7349
##
## Likelihood ratio test that transformation parameters are equal to 0
##  (all log transformations)
##                                  LRT df       pval
## LR test, lambda = (0 0 0 0) 601.1704  4 < 2.22e-16
##
## Likelihood ratio test that no transformations are needed
##                                 LRT df       pval
## LR test, lambda = (1 1 1 1) 1047.89  4 < 2.22e-16
7. Which statement best describes the output above?
a. We could transform all of the variables to either the estimated or rounded power.
b. Because both p-values are small, we should log all of the variables or keep all of the variables untransformed.
c. Because both p-values are small, we should neither log all of the variables nor keep all of the variables untransformed.
d. A and C
8. Consequences of overfitting include all of the following except:
a. A meaninglessly high R² value.
b. More outliers.
c. Poor predictive ability.
d. A greater chance for multicollinearity.
9. Which of the following best describes the AIC and BIC?
a. The AIC and BIC are both measures of goodness of fit, but the BIC generally has a greater penalty for predictors.
b. The AIC and BIC are both measures of goodness of fit, but the AIC generally has a greater penalty for predictors.
c. Like the R² value, the better fit a model is, the higher the AIC and BIC.
d. The higher the variance inflation factors are, the smaller the AIC and BIC.
10. Which of the following is still useful for binary logistic regression?
a. G²
b. Standardized Pearson residuals
c. Standardized deviance residuals
d. G²(HA) - G²(H0)
Section B: Free Response
Question 1: Professor Elm's Experience Points
Experience points, or EXP for short, capture how far a Pokemon has progressed in its battling journey. Defeating other Pokemon grants experience points. Professor Elm of the Johto region is investigating what factors contribute to the number of experience points a Pokemon grants upon being defeated. He first fits a multiple linear regression model with base experience as the response variable and catch rate, defense, base stat total, and weight as predictors.
##
## Call:
## lm(formula = base_experience ~ catch_rate + defense + total_points +
##     weight_kg, data = gen_12)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -92.78 -18.36  -1.23  17.60 362.98
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -92.91176   15.19103  -6.116 3.13e-09 ***
## catch_rate    -0.04213    0.04172  -1.010    0.313
## defense       -0.37670    0.08405  -4.482 1.07e-05 ***
## total_points   0.63396    0.03099  20.458  < 2e-16 ***
## weight_kg      0.01335    0.03320   0.402    0.688
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 35.58 on 287 degrees of freedom
##   (7 observations deleted due to missingness)
## Multiple R-squared:  0.7962, Adjusted R-squared:  0.7933
## F-statistic: 280.2 on 4 and 287 DF,  p-value: < 2.2e-16
a. Interpret the coefficient for defense and construct a 95% confidence interval for its slope. Does your interval agree with the p-value?
b. Perform an overall F-test. What does it say about the fit of the model?
c. Professor Elm truly believes that the slope for defense should be positive. Briefly describe why some of the slopes in his regression model may have flipped sign.
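For checking the arithmetic in parts a and b, here is a rough sketch in Python (not R) that uses only the numbers printed in the output. Two assumptions are made: the interval uses a normal critical value in place of the t critical value on 287 degrees of freedom, which is close at this sample size, and the F statistic is rebuilt from the rounded R², so it will differ slightly from the printed 280.2. The function names are hypothetical.

```python
from statistics import NormalDist

# Values copied from the printed lm() output above.
ESTIMATE = -0.37670   # slope for defense
STD_ERROR = 0.08405
R_SQUARED = 0.7962
N_PREDICTORS = 4
RESID_DF = 287


def approx_ci(estimate, std_error, level=0.95):
    """Normal-approximation interval: estimate +/- z * SE.

    Assumption: with 287 residual degrees of freedom the t critical
    value is close to the normal one, so no t table is needed here.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


def overall_f(r_squared, n_predictors, resid_df):
    """Overall F statistic written in terms of R-squared."""
    return (r_squared / n_predictors) / ((1 - r_squared) / resid_df)


if __name__ == "__main__":
    lower, upper = approx_ci(ESTIMATE, STD_ERROR)
    print(round(lower, 3), round(upper, 3))
    print(round(overall_f(R_SQUARED, N_PREDICTORS, RESID_DF), 1))
```

An interval that excludes 0 agrees with a two-sided p-value below 0.05 for the same slope, which is the comparison part a asks for.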
Professor Elm produces the following diagnostics:
[Figure: diagnostic plots. Four panels with horizontal axes Catch Rate, Defense, Base Stat Total, and Weight, followed by Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance). Observations 289 and 146 are labeled in the Normal Q-Q, Scale-Location, and Residuals vs Leverage panels; observation 158 is also labeled in Scale-Location and observation 257 in Residuals vs Leverage.]
d. Comment on any weaknesses in the model.
e. Professor Elm decides he wants to transform the variables using the Box-Cox method and generates the following output:
## bcPower Transformations to Multinormality
##                 Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
## base_experience    0.1258        0.00      -0.0194       0.2709
## catch_rate         0.3620        0.33       0.2760       0.4479
## defense            0.3751        0.50       0.2370       0.5133
## total_points       0.4804        0.50       0.2833       0.6774
## weight_kg          0.1604        0.16       0.1042       0.2166
##
## Likelihood ratio test that transformation parameters are equal to 0
##  (all log transformations)
##                                    LRT df       pval
## LR test, lambda = (0 0 0 0 0) 180.5927  5 < 2.22e-16
##
## Likelihood ratio test that no transformations are needed
##                                    LRT df       pval
## LR test, lambda = (1 1 1 1 1) 967.3825  5 < 2.22e-16
i. Comment on why Professor Elm may opt to use log transformations for some of the variables instead of the rounded powers.
ii. Write down a new fitted model based on the transformation.
He transforms several of the variables and generates a new model and diagnostic plots:
##
## Call:
## lm(formula = tbe ~ tcr + tdef + ttp + twk, data = gen_12)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.74243 -0.13473 -0.01231  0.15290  0.86350
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.417108   0.205845   6.884 3.66e-11 ***
## tcr         -0.039600   0.017509  -2.262  0.02446 *
## tdef        -0.029225   0.008859  -3.299  0.00109 **
## ttp          0.183025   0.008448  21.665  < 2e-16 ***
## twk          0.048928   0.049792   0.983  0.32661
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2154 on 287 degrees of freedom
##   (7 observations deleted due to missingness)
## Multiple R-squared:  0.8646, Adjusted R-squared:  0.8627
## F-statistic: 458.1 on 4 and 287 DF,  p-value: < 2.2e-16
[Figure: diagnostic plots for the transformed model. Four panels with horizontal axes Catch Rate, Defense, Base Stat Total, and Weight, followed by Residuals vs Fitted and Normal Q-Q. Observations 146, 289, and 171 are labeled in the Normal Q-Q panel.]
2023-08-05