MTH6134 / MTH6134P: Statistical Modelling II Main Examination period 2021

Published: 2023-12-29

Main Examination period 2021 – January – Semester A

MTH6134/MTH6134P: Statistical Modelling II

Question 1  [23 marks].     Suppose that $Y_i \sim N(\mu_i, \sigma^2)$ for $i = 1, 2, \ldots, n$, all independent, where $\mu_i = \beta_1 x_i + \beta_2 z_i$, $x_i$ and $z_i$ are known covariates, and $\sigma$ is known.

(a) Write down the likelihood for the data $y_1, \ldots, y_n$. [6]

(b) Find the maximum likelihood estimators $\hat\beta_1$ and $\hat\beta_2$ of $\beta_1$ and $\beta_2$. [12]

(c) Prove that $\hat\beta_1$ is an unbiased estimator of $\beta_1$. [4]

(d) Explain why $\hat\beta_1$ has a normal distribution. [1]
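Because $\sigma$ is known, maximising the likelihood in (b) amounts to minimising $\sum_i (y_i - \beta_1 x_i - \beta_2 z_i)^2$, so $\hat\beta_1$ and $\hat\beta_2$ solve a 2×2 system of normal equations. The Python sketch below solves that system by Cramer's rule. The function name `mle_two_covariates` and the data are hypothetical, chosen noise-free so that the generating values ($\beta_1 = 2$, $\beta_2 = -1$) should be recovered exactly.

```python
def mle_two_covariates(y, x, z):
    """Solve the normal equations for mu_i = b1*x_i + b2*z_i (no intercept).

    With sigma known, maximising the normal likelihood is the same as
    minimising the residual sum of squares, so the MLEs satisfy
        b1*Sxx + b2*Sxz = Sxy
        b1*Sxz + b2*Szz = Szy
    """
    sxx = sum(xi * xi for xi in x)
    szz = sum(zi * zi for zi in z)
    sxz = sum(xi * zi for xi, zi in zip(x, z))
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    szy = sum(zi * yi for zi, yi in zip(z, y))
    det = sxx * szz - sxz ** 2  # zero only if x and z are collinear
    if det == 0:
        raise ValueError("x and z are collinear; the MLEs are not unique")
    b1 = (szz * sxy - sxz * szy) / det
    b2 = (sxx * szy - sxz * sxy) / det
    return b1, b2


# Hypothetical noise-free data generated with beta1 = 2, beta2 = -1.
x = [1.0, 2.0, 3.0, 4.0]
z = [0.5, 1.0, 0.0, 2.0]
y = [2 * xi - zi for xi, zi in zip(x, z)]
print(mle_two_covariates(y, x, z))
```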

Question 2  [20 marks].     The numbers of babies surviving to discharge from a hospital (y) out of the number admitted to neonatal intensive care (r) for two epochs (w) and three gestational ages (x), in weeks, were recorded. Below are the data.

x   23   23   24   24   25   25
w    1    2    1    2    1    2
r   81   65  165  198  229  225
y   15   12   40   82  119  142
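For part (a), the observed proportions $y/r$ are the quantities to plot. A minimal Python sketch that computes them from the table, grouped by epoch (the plotting step itself is left to whatever tool is at hand):

```python
# Data from the table above: gestational age x (weeks), epoch w,
# number admitted r and number surviving to discharge y.
x = [23, 23, 24, 24, 25, 25]
w = [1, 2, 1, 2, 1, 2]
r = [81, 65, 165, 198, 229, 225]
y = [15, 12, 40, 82, 119, 142]

# Observed proportion surviving in each cell, grouped by epoch so the
# two series can be plotted against x.
props = {1: [], 2: []}
for xi, wi, ri, yi in zip(x, w, r, y):
    props[wi].append((xi, yi / ri))

for epoch, points in props.items():
    print("epoch", epoch, [(age, round(p, 3)) for age, p in points])
```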

Let $Y_{jk}$ denote the number of babies surviving to discharge out of the $r_{jk}$ of gestational age $x_k$ admitted to neonatal intensive care in epoch $j$. It is assumed that $Y_{jk} \sim \mathrm{Bin}(r_{jk}, \pi_{jk})$ for $j = 1, 2$ and $k = 1, 2, 3$, all independent, where $\log\{\pi_{jk}/(1 - \pi_{jk})\} = \alpha_j + \beta_j x_k$. This model was fitted to the data using R, and the following output was obtained:

Call:
glm(formula = p ~ w + w:x, family = binomial, weights = r)

Deviance Residuals:
      1        2        3        4        5        6
 1.1557  -0.3945  -1.3118   0.3665   0.4957  -0.1753

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -22.9574     3.6704  -6.255 3.98e-10 ***
w2           -0.5081     5.1116  -0.099    0.921
w1:x          0.9188     0.1499   6.128 8.88e-10 ***
w2:x          0.9611     0.1459   6.587 4.47e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 109.1191  on 5  degrees of freedom
Residual deviance:   3.6228  on 2  degrees of freedom
AIC: 42.76

Number of Fisher Scoring iterations: 4


(a) Plot the proportions of babies surviving to discharge against gestational age by epoch. What are your conclusions? [6]

(b) Write down the fitted logistic regression model for each epoch. [6]

(c) Use the above output to assess the goodness of fit of the model. [4]

(d) Give an approximate 95% confidence interval for $\beta_1 - \beta_2$. [4]
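The numbers asked for in parts (b) to (d) can be checked with a short Python sketch that reads the estimates straight from the R output. Writing the linear predictor for epoch $j$ as $\alpha_j + \beta_j x$, the epoch 2 intercept is the `(Intercept)` estimate plus the `w2` estimate. Two assumptions are flagged in the comments: the residual deviance is referred to $\chi^2_2$, whose upper tail is exactly $e^{-d/2}$, and $\hat\beta_1$ and $\hat\beta_2$ are treated as uncorrelated, on the reasoning that the model `w + w:x` lets each epoch's intercept and slope be estimated from that epoch's data alone.

```python
import math

# Estimates and standard errors copied from the R output above.
intercept, w2 = -22.9574, -0.5081
b1, se1 = 0.9188, 0.1499  # w1:x
b2, se2 = 0.9611, 0.1459  # w2:x

# Epoch 1 uses the intercept alone; epoch 2 adds the w2 contrast to it.
alpha1 = intercept
alpha2 = intercept + w2


def fitted_prob(alpha, beta, x):
    """Inverse logit of alpha + beta * x."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


for age in (23, 24, 25):
    print(age, round(fitted_prob(alpha1, b1, age), 3),
          round(fitted_prob(alpha2, b2, age), 3))

# Goodness of fit: residual deviance on 2 df. For 2 df the chi-squared
# upper tail probability is exactly exp(-d / 2), so no stats library is needed.
resid_dev = 3.6228
p_value = math.exp(-resid_dev / 2)
print("deviance p-value:", round(p_value, 4))

# Approximate 95% CI for beta1 - beta2, assuming zero covariance between
# the two slope estimates (each is estimated from its own epoch's data).
diff = b1 - b2
se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
ci = (diff - 1.96 * se_diff, diff + 1.96 * se_diff)
print("95% CI:", tuple(round(c, 4) for c in ci))
```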


Question 3  [22 marks].     Suppose that $Y_i \sim \mathrm{Bin}(r_i, \pi_i)$ for $i = 1, 2, \ldots, n$, all independent, where the $r_i$ are known, $\log\{-\log(1 - \pi_i)\} = \beta_0 + \beta_1 x_i$ and $x_i$ is a known covariate.

(a) Explain why this is a generalised linear model.         [4]


(b) Find the Fisher information matrix. [8]

(c) Obtain the asymptotic distribution of the maximum likelihood estimator $\hat\beta_1$ of $\beta_1$. [8]

(d) State the form of an approximate test for testing $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$. [2]
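For the complementary log-log link, $\pi_i = 1 - \exp(-e^{\eta_i})$ with $\eta_i = \beta_0 + \beta_1 x_i$, so $d\pi_i/d\eta_i = e^{\eta_i}(1 - \pi_i)$ and the Fisher information takes the usual GLM form $X^\top W X$ with $w_i = r_i e^{2\eta_i}(1 - \pi_i)/\pi_i$. The Python sketch below evaluates this matrix, takes the $(2,2)$ element of its inverse as the asymptotic variance of $\hat\beta_1$, and forms the Wald statistic $\hat\beta_1/\mathrm{se}(\hat\beta_1)$. The data and the "estimates" are hypothetical values for illustration only, not from the paper.

```python
import math


def cloglog_fisher_info(beta0, beta1, x, r):
    """Fisher information for Y_i ~ Bin(r_i, pi_i), log(-log(1 - pi_i)) = beta0 + beta1*x_i.

    Uses I = X^T W X with w_i = r_i * exp(2*eta_i) * (1 - pi_i) / pi_i,
    which follows from d pi / d eta = exp(eta) * (1 - pi).
    Returns the 2x2 matrix as nested lists.
    """
    i00 = i01 = i11 = 0.0
    for xi, ri in zip(x, r):
        eta = beta0 + beta1 * xi
        pi = 1.0 - math.exp(-math.exp(eta))
        wi = ri * math.exp(2 * eta) * (1.0 - pi) / pi
        i00 += wi
        i01 += wi * xi
        i11 += wi * xi * xi
    return [[i00, i01], [i01, i11]]


def var_beta1(info):
    """(2,2) element of the inverse of a 2x2 information matrix."""
    det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
    return info[0][0] / det


# Hypothetical data and parameter values, for illustration only.
x = [0.0, 1.0, 2.0, 3.0]
r = [20, 20, 20, 20]
beta0_hat, beta1_hat = -1.5, 0.6

info = cloglog_fisher_info(beta0_hat, beta1_hat, x, r)
se1 = math.sqrt(var_beta1(info))
z = beta1_hat / se1  # Wald statistic for H0: beta1 = 0, compared with N(0, 1)
print(info, se1, z)
```

As a design note, the weights are written through $d\pi/d\eta$ rather than by differentiating the log-likelihood twice, which keeps the sketch valid for any binomial link once that derivative is swapped.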

Question 4  [23 marks].     A study of 49 attending physicians and 71 surgical residents in training at a university hospital was carried out to investigate whether the two groups of surgeons were applying unnecessary blood transfusions at different rates. For each surgeon, the number of blood transfusions  prescribed unnecessarily in one year was recorded. The contingency table below summarises the data.

            Unnecessary Blood Transfusion
Surgeon     Frequent  Occasionally  Rarely  Never  Total
Attending          2             3      31     13     49
Resident          15            28      23      5     71

Let $Y_{jk}$ denote the number of surgeons classified in row $j$ and column $k$. It is assumed that the $Y_{jk}$ for row $j$ have a multinomial distribution with parameters $y_{j\cdot}$ and $\theta_{jk}$ for $j = 1, 2$ and $k = 1, 2, 3, 4$, and that the rows are independent, where $y_{j\cdot} = \sum_{k=1}^{4} y_{jk}$ and $\theta_{jk}$ is the probability that a surgeon is classified in row $j$ and column $k$. The null hypothesis is that the distributions of unnecessary blood transfusions are the same for the two groups of surgeons.


(a) Briefly explain how you would enter these data into R. What commands would you use to fit a log-linear model to the data?        [4]

(b) Explain why, under the null hypothesis, the expected frequency for cell $(j, k)$ is $e_{jk} = y_{j\cdot} y_{\cdot k}/n$, where $n = 120$. [4]

(c)  Obtain the expected values under the null hypothesis. Compare these with the observed values.    [5]


(d)  Find the deviance and the value of Pearson’s goodness-of-fit test statistic. What is your conclusion about the numbers of unnecessary blood transfusions for the two groups of surgeons?  [10]
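Parts (c) and (d) are arithmetic on the table, which a short Python sketch can check. It computes $e_{jk} = y_{j\cdot} y_{\cdot k}/n$, the deviance $D = 2\sum o \log(o/e)$ and Pearson's $X^2 = \sum (o - e)^2/e$; both statistics are then compared with $\chi^2$ on $(2-1)(4-1) = 3$ degrees of freedom, whose upper 5% point is about 7.81.

```python
import math

# Observed counts: rows = (Attending, Resident);
# columns = (Frequent, Occasionally, Rarely, Never).
obs = [[2, 3, 31, 13],
       [15, 28, 23, 5]]

row_tot = [sum(row) for row in obs]
col_tot = [sum(col) for col in zip(*obs)]
n = sum(row_tot)

# Expected counts under H0: e_jk = y_j. * y_.k / n
expected = [[rt * ct / n for ct in col_tot] for rt in row_tot]

# Deviance D = 2 * sum o * log(o / e) and Pearson X^2 = sum (o - e)^2 / e.
# All observed counts here are positive, so no 0 * log(0) terms arise.
deviance = 2 * sum(o * math.log(o / e)
                   for orow, erow in zip(obs, expected)
                   for o, e in zip(orow, erow))
pearson = sum((o - e) ** 2 / e
              for orow, erow in zip(obs, expected)
              for o, e in zip(orow, erow))
df = (len(obs) - 1) * (len(obs[0]) - 1)

for erow in expected:
    print([round(e, 2) for e in erow])
print("deviance:", round(deviance, 2), "Pearson X^2:", round(pearson, 2), "df:", df)
```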

Question 5  [12 marks].     Suppose that $T_i \sim \mathrm{Exp}(\lambda_i)$ for $i = 1, 2, \ldots, n$, all independent, where $\lambda_i = \beta x_i$ and $x_i$ is a known covariate.

(a)  Explain what link is being used here.                        [1]

(b) Write down the likelihood for the data $(t_i, \delta_i)$ for $i = 1, 2, \ldots, n$, where $\delta_i$ is a censoring variable. [4]

(c) Show that the maximum likelihood estimator of $\beta$ is $\hat\beta = \sum_{i=1}^{n} \delta_i \big/ \sum_{i=1}^{n} x_i t_i$. [5]

(d) Now assume that there is no censoring. Given that the vectors t and x in R contain the times and the covariate values, what commands would you use to obtain the details of the fitted model? [2]
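With $\delta_i = 1$ for an observed failure and $\delta_i = 0$ for a censored time, the log-likelihood is $\ell(\beta) = \sum_i \delta_i \log(\beta x_i) - \beta \sum_i x_i t_i$, whose score $\sum_i \delta_i/\beta - \sum_i x_i t_i$ vanishes at $\hat\beta = \sum_i \delta_i / \sum_i x_i t_i$. A short Python sketch checks this numerically; the times, covariate values and censoring indicators are hypothetical.

```python
import math


def loglik(beta, t, x, delta):
    """Censored exponential log-likelihood with lambda_i = beta * x_i."""
    return sum(d * math.log(beta * xi) - beta * xi * ti
               for ti, xi, d in zip(t, x, delta))


def beta_hat(t, x, delta):
    """Closed-form MLE: number of observed failures / sum of x_i * t_i."""
    return sum(delta) / sum(xi * ti for ti, xi in zip(t, x))


# Hypothetical data: times, covariate values, censoring indicators (1 = failure observed).
t = [2.0, 5.0, 1.5, 4.0, 3.0]
x = [1.0, 0.5, 2.0, 1.0, 1.5]
delta = [1, 0, 1, 1, 0]

b = beta_hat(t, x, delta)
# The log-likelihood is concave in beta, so it should be lower a small step either side.
h = 1e-3
print(b, loglik(b, t, x, delta) > max(loglik(b - h, t, x, delta),
                                      loglik(b + h, t, x, delta)))
```

With no censoring every $\delta_i = 1$, and the same expression reduces to $n / \sum_i x_i t_i$.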