MED5366 Stats 2 – Exam Answers 2020-2021
1)
a) Why is it better to use adjusted R² rather than R² for model selection in multiple regression?
[2 marks ]
R-sq increases (or at best stays the same) as more independent variables are added to a model, regardless of how useful or useless these variables are at explaining the variability in the response [1]. Adj R-sq helps to overcome this issue by reducing R-sq relative to the number of independent variables in the model [1].
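As an illustration only (not part of the marking scheme), the adjustment can be sketched with the standard formula Adj R-sq = 1 - (1 - R-sq)(n - 1)/(n - p - 1), where n is the number of observations and p the number of independent variables. The function name and the example figures below are made up for this sketch.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-sq for a model with an intercept and n_predictors variables."""
    residual_df = n_obs - n_predictors - 1
    if residual_df <= 0:
        raise ValueError("need more observations than model parameters")
    # Penalise R-sq by the ratio of total to residual degrees of freedom.
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / residual_df


# Hypothetical figures: adding a fourth variable nudges R-sq up slightly,
# but the adjusted value falls, suggesting the extra variable is not useful.
three_vars = adjusted_r_squared(0.800, n_obs=20, n_predictors=3)
four_vars = adjusted_r_squared(0.805, n_obs=20, n_predictors=4)
print(round(three_vars, 4), round(four_vars, 4))
```

In this made-up case R-sq rises from 0.800 to 0.805 while Adj R-sq drops, which is the behaviour the answer above describes.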
b) Why is the predicted R² (“R-sq(pred)” in Minitab) also important when assessing multiple regression models?
[2 marks ]
When the sample size is small relative to the complexity of the model, the model can overfit the data [1/2]. The model can be over-optimistic and may give poor predictions for new data [1/2]. Minitab provides “R-sq(pred)” that calculates the predicted R-squared using leave-one-out cross-validation [1/2]; if this is much lower than adjusted R-sq then the model is likely to be overfitted [1/2].
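A minimal sketch of the leave-one-out idea follows. To stay dependency-free it is simplified to a single predictor, which is an assumption of this example rather than of the question. Each observation is left out in turn, the line is refitted to the rest, and the squared prediction errors are summed (often called PRESS); predicted R-sq is then taken as 1 - PRESS/SST. The function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(xs, ys):
    """Ordinary R-sq: 1 - SSE/SST using the fit to all observations."""
    intercept, slope = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - sse / sst


def predicted_r_squared(xs, ys):
    """Leave-one-out predicted R-sq: 1 - PRESS/SST."""
    xs, ys = list(xs), list(ys)
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit without observation i, then predict the held-out point.
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (intercept + slope * xs[i])) ** 2
    return 1.0 - press / sst


x = [1, 2, 3, 4, 5, 6]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
print(r_squared(x, y), predicted_r_squared(x, y))
```

Predicted R-sq can never exceed ordinary R-sq, because each point is predicted by a model that has not seen it; a large gap is the warning sign of overfitting.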
2)
a) Define the term “numbers needed to treat” (NNT).
[1 mark ]
The number of individuals we need to treat with the new treatment in order to achieve one more positive result than if we had treated them with the old treatment [1]. [0.5 just for formula]
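For illustration, the formula referred to above is the reciprocal of the difference in the proportions with a positive result, NNT = 1/(p_new - p_old). The sketch below uses a hypothetical function name and made-up counts.

```python
def numbers_needed_to_treat(success_new, n_new, success_old, n_old):
    """NNT as the reciprocal of the difference in success proportions."""
    p_new = success_new / n_new
    p_old = success_old / n_old
    difference = p_new - p_old
    if difference == 0:
        raise ValueError("no difference between treatments: NNT is undefined")
    return 1.0 / difference


# Hypothetical trial: 40/100 positive results on the new treatment,
# 30/100 on the old one, so about 10 patients per extra positive result.
print(round(numbers_needed_to_treat(40, 100, 30, 100), 2))
```

Swapping the two groups in this example gives a value of about -10, that is, a negative NNT.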
b) What are numbers needed to treat called when they take a negative value?
[1 mark ]
The number needed to harm (NNH) [1].
c) Why does care need to be taken when interpreting the confidence interval for the numbers needed to treat, when there is no statistically significant difference between the new and existing treatments?
[3 marks ]
The confidence interval will have limits above and below zero [1]. However, because the NNT is the reciprocal of a difference in proportions [1/2], NNT and NNH values nearer zero indicate a greater difference between the groups [1/2]. The confidence interval therefore does not join its two limits via zero, but goes via + and – infinity [1]. This makes it hard to interpret [1/2]. Note also that values between -1 and 1 are not possible [1/2]. [Max 3 available]
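The point can be seen numerically with a rough sketch. It assumes a simple Wald (normal approximation) 95% interval for the difference in proportions, which is a choice made for this example and not necessarily the method used in the course; the limits are then inverted to give the NNT limits. The function name and counts are hypothetical.

```python
import math


def nnt_interval(success_new, n_new, success_old, n_old, z=1.96):
    """Wald CI for the difference in proportions and its reciprocal limits.

    Returns (diff_lower, diff_upper, nnt_from_lower, nnt_from_upper).
    """
    p_new = success_new / n_new
    p_old = success_old / n_old
    difference = p_new - p_old
    std_error = math.sqrt(p_new * (1 - p_new) / n_new + p_old * (1 - p_old) / n_old)
    diff_lower = difference - z * std_error
    diff_upper = difference + z * std_error

    def invert(limit):
        # A difference of exactly zero corresponds to an infinite NNT.
        return math.inf if limit == 0 else 1.0 / limit

    return diff_lower, diff_upper, invert(diff_lower), invert(diff_upper)


# Hypothetical non-significant result: 40/100 versus 30/100.
lo, hi, nnt_lo, nnt_hi = nnt_interval(40, 100, 30, 100)
print(round(lo, 3), round(hi, 3), round(nnt_lo, 1), round(nnt_hi, 1))
```

Here the interval for the difference runs from roughly -0.03 to 0.23, so the inverted limits are about -32 (an NNH of 32) and about 4.3. The plausible values are not those between -32 and 4.3; they run from NNH 32 out to infinity and back in to NNT 4.3.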
3)
a) What is the main assumption of the Wilcoxon Signed Ranks Test (other than that the data are an independent random sample and can be uniquely ranked)?
[1 mark ]
The test assumes observations come from a population with a symmetric distribution [1].
b) Why does this assumption limit the utility of the Wilcoxon Signed Ranks Test?
[3 marks ]
The Wilcoxon Signed Ranks Test is a non-parametric test [0.5]; as a non-parametric test it is preferable to a t-test when the data are not normally distributed [0.5]. However, symmetry is a main feature of the normal distribution [0.5], so if the data are not normally distributed, they may well not be symmetrical [0.5] and so also not suitable for analysis with the Wilcoxon test [1].
c) If the data are not suitable for analysis with either a one-sample t-test or Wilcoxon Signed Ranks Test, what alternative test can be used? What is the limitation of this alternative test?
[2 marks ]
Alternative test: the sign test [1]; it has much lower power than the Wilcoxon test [1].
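As a sketch of why the sign test has lower power: it uses only whether each observation lies above or below the hypothesised median and discards the size of the differences. The exact two-sided version below compares the count of positive signs with a Binomial(n, 0.5) distribution; the function name and data are invented for the example, and observations equal to the hypothesised median are dropped, which is a common convention assumed here.

```python
from math import comb


def sign_test_p_value(values, hypothesised_median):
    """Exact two-sided sign test p-value against a hypothesised median."""
    # Keep only the non-zero differences; ties carry no sign information.
    diffs = [v - hypothesised_median for v in values if v != hypothesised_median]
    n = len(diffs)
    n_positive = sum(1 for d in diffs if d > 0)
    k = min(n_positive, n - n_positive)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical data: 8 of 10 observations lie above a hypothesised median of 0.
sample = [1, 2, 3, 4, 5, 6, 7, 8, -1, -2]
print(sign_test_p_value(sample, 0))
```

Only the signs enter the calculation, so the result would be the same however far above zero the eight positive values were.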
2022-05-09