STAT5002: Introduction to Statistics - Semester 1, 2022
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
STAT5002: Introduction to Statistics - Semester 1, 2022
1. Assume that the marks in the following subjects are normally distributed:
Subject |
Mean (µ) |
Standard deviation (σ) |
Statistics |
50 |
12 |
Economics |
65 |
10 |
Mathematics |
76 |
8 |
(a) Douglas obtained a final mark of 68 in Statistics, 73 in Economics, and 71 in Mathematics. In which subject did he perform best compared to the rest of the class?
(b) Maria’s z scores were 1.2 in Statistics, -0.5 in Economics, and -1.5 in Mathematics. Calculate Maria’s mark in the subject where she performed worst compared to the rest.
(c) Examiners often use z scores to scale marks via a new mean and a new standard deviation. The new marks are then directly comparable. Calculate the scaled marks with a new mean of
100 and a new standard deviation of 20 in each subject for Douglas and Maria.
(d) Refer to part (c). Who had the best overall performance?
2. It has long been known that brain weight scales with body weight across large groups of animals. The data were collected on n = 24 mammals and is found in the Q2 sheet in the Excel spreadsheet. Let X be the body weights (kg) and Y be the brain weights (g).
(a) Produce a scatter plot of ”Brain weights (g)”versus ”Body weights (kg)”. Make sure you label your axes properly and that your graph has an appropriate title. Briefly describe the nature of the relationship between these two variables. Are there any outliers? If yes, can we remove them? Why or why not?
(b) You would like to build a linear regression model to predict brain weights (g) using body weight (kg). Which model: linear-linear, log-linear, linear-log, or log-log fits the data better? Provide visual evidence to support your argument. Write down the model of your choice.
3. The dataset Q3 contains the following information on a sample of n = 36 severely depressed indi- viduals.
Variable |
Description |
Eff |
Measure of the effectiveness of the treatment |
Age |
Age (years) |
Tmt |
Treatment received (A, B or C) |
(a) Produce a scatter plot of Eff versus Age. What does it show?
(b) Run a regression of Eff on Age. Write down the fitted regression equation.
(c) Produce another scatter plot of Eff versus Age but this time with colour coding and different regression lines for each of the three treatments. Does the treatment appear to interact with age in explaining the response? Explain why or why not.
(d) Code up dummy variables for treatments A and B as well as an interaction between Age and each of treatments A and B. Attach the R code to show how you create the dummies and interaction terms. Why don’t we need a dummy variable for treatment C?
(e) Using Age, the dummies, and the interactions as predictors, perform the backward elimination to obtain the best model by means of AIC criterion. Write down the final estimated regression equation. What percentage of the total variation in Eff is explained by the model?
(f) Use the partial F test to determine which model [the one in part (b) or the one in part (e)] fits the data better. Include mention of H0 and H1 , the observed value of the test statistic, the p-value, the decision, and conclusion.
(g) Predict the effectiveness of treatments for the following people:
Patient |
Age |
Treatment |
Peter |
20 |
A |
Anna |
56 |
B |
Louis |
69 |
C |
4. As part of the 2020 College Alcohol Study, students who drank alcohol in their senior year were asked if drinking ever resulted in missing a class. The data are given in the following table:
|
Drinking Status |
|
||
Missed a class |
Nonbinger |
Occasional binger |
Frequent binger |
Total |
No |
41 |
18 |
11 |
70 |
Yes |
4 |
8 |
18 |
30 |
Total |
45 |
26 |
29 |
100 |
(a) At the 0.05 level of significance, is there evidence of a significant association between missing a class and drinking status? Include mention of H0 and H1 , the observed value of the test
statistic, the p-value, the decision, and conclusion.
(b) What is the conditional distribution of drinking status?
(c) What are the odds of a nonbinger who missed a class?
(d) What are the odds of a frequent binger who missed a class?
(e) What is the odds ratio for nonbingers versus frequent bingers who missed a class?
(f) Fit a logistic regression of a senior student who never missed a class on drinking status. Treat frequent binger as a base group. Write down the fitted regression equation.
(g) Refer to part (f). What is the odds ratio for nonbingers versus frequent bingers who missed a class? Is it the same as your calculation in part (e)? Explain why or why not.
2022-05-24