STAT5002:  Introduction to Statistics - Semester 1, 2022

1. Assume that the marks in the following subjects are normally distributed:

Subject        Mean (µ)    Standard deviation (σ)
Statistics     50          12
Economics      65          10
Mathematics    76          8

(a) Douglas obtained a final mark of 68 in Statistics, 73 in Economics, and 71 in Mathematics. In which subject did he perform best compared to the rest of the class?

(b) Maria’s z scores were 1.2 in Statistics, -0.5 in Economics, and -1.5 in Mathematics. Calculate Maria’s mark in the subject where she performed worst compared to the rest.

(c) Examiners often use z scores to scale marks via a new mean and a new standard deviation. The new marks are then directly comparable. Calculate the scaled marks with a new mean of 100 and a new standard deviation of 20 in each subject for Douglas and Maria.

(d) Refer to part (c). Who had the best overall performance?
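The standardisation used throughout question 1 can be sketched in a few lines of Python. This is a minimal illustration rather than a model answer: the function names (`z_score`, `mark_from_z`, `scaled_mark`) are made up for this sketch, and the only inputs are the means, standard deviations, marks and z scores stated in the question.

```python
# Means and standard deviations copied from the table in question 1.
SUBJECTS = {
    "Statistics": (50, 12),
    "Economics": (65, 10),
    "Mathematics": (76, 8),
}


def z_score(mark, mean, sd):
    """Standardise a raw mark: z = (x - mu) / sigma."""
    return (mark - mean) / sd


def mark_from_z(z, mean, sd):
    """Invert the standardisation: x = mu + z * sigma."""
    return mean + z * sd


def scaled_mark(z, new_mean=100, new_sd=20):
    """Rescale a z score to a common mean and standard deviation."""
    return new_mean + z * new_sd


# Douglas's raw marks and Maria's z scores, as given in parts (a) and (b).
douglas_marks = {"Statistics": 68, "Economics": 73, "Mathematics": 71}
maria_z = {"Statistics": 1.2, "Economics": -0.5, "Mathematics": -1.5}

douglas_z = {s: z_score(m, *SUBJECTS[s]) for s, m in douglas_marks.items()}
douglas_scaled = {s: scaled_mark(z) for s, z in douglas_z.items()}
maria_scaled = {s: scaled_mark(z) for s, z in maria_z.items()}

for subject in SUBJECTS:
    print(subject, douglas_z[subject], douglas_scaled[subject], maria_scaled[subject])
```

Because every scaled mark is a linear function of its z score, ranking students by scaled mark within a subject is the same as ranking them by z score. How the scaled marks are combined into an "overall" performance in part (d) (for example a sum or an average) is left to the reader.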

2. It has long been known that brain weight scales with body weight across large groups of animals. Data were collected on n = 24 mammals and are found in the Q2 sheet of the Excel spreadsheet. Let X be the body weight (kg) and Y be the brain weight (g).

(a) Produce a scatter plot of “Brain weights (g)” versus “Body weights (kg)”. Make sure you label your axes properly and that your graph has an appropriate title. Briefly describe the nature of the relationship between these two variables. Are there any outliers? If yes, can we remove them? Why or why not?

(b) You would like to build a linear regression model to predict brain weight (g) using body weight (kg). Which model (linear-linear, log-linear, linear-log, or log-log) fits the data best? Provide visual evidence to support your argument. Write down the model of your choice.
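To make the four candidate models in part (b) concrete, here is a rough Python sketch that fits each one by ordinary least squares. Several things here are assumptions: the data are SYNTHETIC (an exact power law invented for illustration, not the Q2 sheet), the helper name `ols_fit` is hypothetical, and "log-linear" is taken to mean log(Y) regressed on X while "linear-log" means Y regressed on log(X); check the convention used in your course notes.

```python
import math


def ols_fit(xs, ys):
    """Fit y = a + b * x by least squares and return (a, b, R^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, sxy ** 2 / (sxx * syy)


# SYNTHETIC values for illustration only (NOT the Q2 data): Y = 10 * X^0.75.
body_kg = [0.1, 1.0, 10.0, 100.0, 1000.0]
brain_g = [10.0 * x ** 0.75 for x in body_kg]

log_x = [math.log(x) for x in body_kg]
log_y = [math.log(y) for y in brain_g]

fits = {
    "linear-linear": ols_fit(body_kg, brain_g),  # Y on X
    "log-linear": ols_fit(body_kg, log_y),       # log(Y) on X (assumed naming)
    "linear-log": ols_fit(log_x, brain_g),       # Y on log(X) (assumed naming)
    "log-log": ols_fit(log_x, log_y),            # log(Y) on log(X)
}

for name, (a, b, r2) in fits.items():
    print(f"{name}: intercept={a:.3f}, slope={b:.3f}, R^2={r2:.3f}")
```

A power law becomes a straight line on the log-log scale, which is why that fit is exact for the synthetic values. R² values are not strictly comparable between models with different response scales (Y versus log Y), so the scatter plots of each transformed pair remain the main evidence the question asks for.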

3. The dataset Q3 contains the following information on a sample of n = 36 severely depressed individuals.

Variable    Description
Eff         Measure of the effectiveness of the treatment
Age         Age (years)
Tmt         Treatment received (A, B or C)

(a) Produce a scatter plot of Eff versus Age. What does it show?

(b) Run a regression of Eff on Age. Write down the fitted regression equation.

(c) Produce another scatter plot of Eff versus Age but this time with colour coding and different regression lines for each of the three treatments.  Does the treatment appear to interact with age in explaining the response? Explain why or why not.

(d) Code up dummy variables for treatments A and B as well as an interaction between Age and each of treatments A and B. Attach the R code to show how you create the dummies and interaction terms. Why don’t we need a dummy variable for treatment C?

(e) Using Age, the dummies, and the interactions as predictors, perform backward elimination based on the AIC to obtain the best model. Write down the final estimated regression equation. What percentage of the total variation in Eff is explained by the model?

(f) Use the partial F test to determine which model [the one in part (b) or the one in part (e)] fits the data better. Include mention of H0 and H1, the observed value of the test statistic, the p-value, the decision, and conclusion.

(g) Predict the effectiveness of treatments for the following people:

Patient    Age    Treatment
Peter      20     A
Anna       56     B
Louis      69     C
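The dummy coding in part (d) and the test statistic in part (f) can be illustrated with a short Python sketch. The assignment itself asks for R code, so treat this only as a language-neutral picture of the same idea. The function names (`design_row`, `predict`, `partial_f`) are hypothetical, no fitted coefficients are assumed, and the only data used are the three patients listed in part (g).

```python
def design_row(age, tmt):
    """Return [1, Age, A, B, Age*A, Age*B] with treatment C as the baseline."""
    d_a = 1 if tmt == "A" else 0
    d_b = 1 if tmt == "B" else 0
    return [1, age, d_a, d_b, age * d_a, age * d_b]


def predict(coefs, age, tmt):
    """Linear predictor: dot product of the coefficients and the design row."""
    return sum(c * x for c, x in zip(coefs, design_row(age, tmt)))


def partial_f(sse_reduced, sse_full, df_reduced, df_full):
    """Partial F statistic for nested models.

    F = ((SSE_R - SSE_F) / (df_R - df_F)) / (SSE_F / df_F),
    where df_R and df_F are the residual degrees of freedom.
    """
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator


# Patients copied from the table in part (g).
patients = {"Peter": (20, "A"), "Anna": (56, "B"), "Louis": (69, "C")}
rows = {name: design_row(age, tmt) for name, (age, tmt) in patients.items()}

for name, row in rows.items():
    print(name, row)
```

A patient on treatment C has both dummies equal to 0, so C is absorbed into the intercept and the Age slope. Adding a third dummy would make the three dummies sum to the intercept column, which is why it is left out. Once a model has been fitted, its coefficients (in the same column order, with 0 for any term dropped by backward elimination) could be passed to `predict`.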

4. As part of the 2020 College Alcohol Study, students who drank alcohol in their senior year were asked if drinking ever resulted in missing a class. The data are given in the following table:

 

                  Drinking Status
Missed a class    Nonbinger    Occasional binger    Frequent binger    Total
No                41           18                   11                 70
Yes               4            8                    18                 30
Total             45           26                   29                 100

(a) At the 0.05 level of significance, is there evidence of a significant association between missing a class and drinking status? Include mention of H0 and H1, the observed value of the test statistic, the p-value, the decision, and conclusion.

(b) What is the conditional distribution of drinking status?

(c) What are the odds that a nonbinger missed a class?

(d) What are the odds that a frequent binger missed a class?

(e) What is the odds ratio of missing a class for nonbingers versus frequent bingers?

(f) Fit a logistic regression of whether a senior student never missed a class on drinking status. Treat frequent binger as the base group. Write down the fitted regression equation.

(g) Refer to part (f). What is the odds ratio for nonbingers versus frequent bingers who missed a class? Is it the same as your calculation in part (e)? Explain why or why not.
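The quantities asked for in question 4 can all be computed directly from the contingency table. The Python sketch below is one hedged way to do so using only the standard library; the variable names are made up for illustration. It relies on two facts: for 2 degrees of freedom the chi-square upper-tail probability is exp(-x/2), and a logistic regression with a single categorical predictor is saturated, so its maximum likelihood coefficients equal the log odds computed from the table.

```python
import math

# Observed counts copied from the table in question 4.
GROUPS = ["Nonbinger", "Occasional binger", "Frequent binger"]
observed = {
    "No": [41, 18, 11],
    "Yes": [4, 8, 18],
}

n = sum(sum(row) for row in observed.values())
col_totals = [sum(row[j] for row in observed.values()) for j in range(len(GROUPS))]

# Pearson chi-square: sum of (O - E)^2 / E with E = row total * column total / n.
chi_sq = 0.0
for row in observed.values():
    row_total = sum(row)
    for j, count in enumerate(row):
        expected = row_total * col_totals[j] / n
        chi_sq += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(GROUPS) - 1)
# Closed form for the upper tail, valid only when df = 2.
p_value = math.exp(-chi_sq / 2) if df == 2 else None

# Odds of missing a class within each drinking group: Yes count / No count.
odds_missed = {g: observed["Yes"][j] / observed["No"][j] for j, g in enumerate(GROUPS)}
or_missed = odds_missed["Nonbinger"] / odds_missed["Frequent binger"]

# Saturated logistic model for "never missed", frequent binger as baseline.
odds_never = {g: 1 / odds_missed[g] for g in GROUPS}
intercept = math.log(odds_never["Frequent binger"])
coef = {g: math.log(odds_never[g] / odds_never["Frequent binger"]) for g in GROUPS[:2]}

print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.2g}")
print(f"odds ratio (missed, nonbinger vs frequent) = {or_missed:.4f}")
print(f"intercept = {intercept:.4f}, coefficients = {coef}")
```

Because the model in part (f) describes never missing a class, exponentiating its nonbinger coefficient gives the odds ratio for not missing, which is the reciprocal of the odds ratio for missing computed from the table. In practice the fit would be obtained from statistical software such as R's `glm`; this sketch only shows where the numbers come from.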