MAT00042M

MMath and MSc Examinations 2020/21

Advanced Regression Analysis

1 (of 3). An epidemic has occurred across two countries, country A and country B. In a study over the three years of the epidemic, each country recorded n = 36 independent monthly observations on the number of cases. These are denoted by $y_1, \ldots, y_n$ in country A, and by $y'_1, \ldots, y'_n$ in country B.

In country A, each outcome random variable $Y_i$, $i = 1, \ldots, n$, is assumed to follow a Poisson distribution $\mathrm{Poi}(\lambda_i)$, where $\lambda_1, \ldots, \lambda_n$ are unknown parameters. Similarly, in country B, each outcome random variable $Y'_i$, $i = 1, \ldots, n$, is assumed to follow a Poisson distribution $\mathrm{Poi}(\lambda'_i)$, where $\lambda'_1, \ldots, \lambda'_n$ are unknown parameters.

In each country, the study is investigating the association between the number of cases (defined above) and the number of months that passed since the outbreak, denoted by $x_i = i$, with $i = 1, 2, \ldots, 36$ (see Figure 1).

[Figure 1: two panels, titled Country A and Country B; horizontal axis: recorded month.]

Figure 1: Left: Observations for country A (dots) and fitted model modA (solid curve). Right: Observations for country B (dots) and fitted model modB.3 (solid curve). Both models will be introduced in Question 2.

(a) Recall that in general, the Poisson probability mass function associated to $Y \sim \mathrm{Poi}(\lambda)$ is given by $P(Y = y) = \lambda^{y} e^{-\lambda} / y!$, $y = 0, 1, 2, \ldots$. Show that the probability function above belongs to the exponential family of distributions. Using the formulae for the mean and variance of a random variable following the exponential family of distributions, find the mean and variance for the Poisson distribution. [10]
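One possible route for part (a) is sketched below. It assumes the exponential-family parametrisation $f(y; \theta, \phi) = \exp\{(y\theta - b(\theta))/a(\phi) + c(y, \phi)\}$; the notation used in the course may differ while being equivalent.

```latex
\begin{align*}
P(Y = y) &= \frac{\lambda^{y} e^{-\lambda}}{y!}
          = \exp\{\, y \log\lambda - \lambda - \log y! \,\}, \\
\theta &= \log\lambda, \quad b(\theta) = e^{\theta}, \quad
          a(\phi) = 1, \quad c(y, \phi) = -\log y!, \\
E(Y) &= b'(\theta) = e^{\theta} = \lambda, \qquad
\operatorname{Var}(Y) = b''(\theta)\, a(\phi) = e^{\theta} = \lambda .
\end{align*}
```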

(b) For country A, write down the generalised linear model (GLM) modA defined using a log-link function and a linear predictor $\eta_i = \beta_0 + \beta_1 i$, for each month $i$. Derive the log-likelihood associated to model modA and write down the equations that lead to obtaining the maximum likelihood estimators of the unknown parameters, $\hat{\boldsymbol\beta} = (\hat\beta_0, \hat\beta_1)^T$ (but do not attempt to solve these). How would you obtain the estimated expected number of cases for month $i$ in country A, once you have obtained $\hat{\boldsymbol\beta}$ under this model? [9]
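For a Poisson GLM with log link, the likelihood equations reduce to $\sum_i (y_i - \mu_i) = 0$ and $\sum_i i\,(y_i - \mu_i) = 0$ with $\mu_i = \exp(\beta_0 + \beta_1 i)$. The following R sketch checks this numerically. The exam data are not reproduced here, so `y_sim` and the coefficients used to simulate it are hypothetical stand-ins.

```r
# Simulated stand-in for the country A counts (hypothetical data, not the exam data)
set.seed(1)
i <- 1:36
y_sim <- rpois(length(i), lambda = exp(0.5 + 0.07 * i))

# Poisson GLM with log link and linear predictor beta0 + beta1 * i;
# a tight convergence tolerance makes the numerical check below cleaner
fit <- glm(y_sim ~ i, family = poisson(link = "log"),
           control = glm.control(epsilon = 1e-12, maxit = 50))

# Estimated expected counts: mu_hat_i = exp(beta0_hat + beta1_hat * i)
beta_hat <- coef(fit)
mu_hat <- exp(beta_hat[1] + beta_hat[2] * i)

# Likelihood (score) equations evaluated at the MLE; both should be close to zero
score <- c(sum(y_sim - mu_hat), sum(i * (y_sim - mu_hat)))
print(score)
```

The vector `mu_hat` agrees with `fitted(fit)`, which is how the estimated expected number of cases would usually be extracted in practice.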

(c) Show that the deviance for model modA in part (b) has the form $D_{\text{modA}} = 2 \sum_{i=1}^{n} y_i \log(y_i / \hat\mu_i)$, where $\mu_i = E(Y_i)$ and $\hat\mu_i$ denotes its estimator under model modA. [10]
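The simplification in part (c) relies on $\sum_i (y_i - \hat\mu_i) = 0$, which holds for a log-link Poisson model with an intercept. A minimal R sketch on simulated data (hypothetical `y_sim`, not the exam data) compares the general Poisson deviance, the shortened form, and the value reported by `deviance()`.

```r
# Simulated stand-in for the country A counts (hypothetical data, not the exam data)
set.seed(1)
i <- 1:36
y_sim <- rpois(length(i), lambda = exp(0.5 + 0.07 * i))

fit <- glm(y_sim ~ i, family = poisson(link = "log"),
           control = glm.control(epsilon = 1e-12, maxit = 50))
mu_hat <- fitted(fit)

# Terms y * log(y / mu_hat), using the convention 0 * log(0) = 0
ylogy <- ifelse(y_sim > 0, y_sim * log(y_sim / mu_hat), 0)

# General Poisson deviance and the shortened form from part (c)
dev_full <- 2 * sum(ylogy - (y_sim - mu_hat))
dev_short <- 2 * sum(ylogy)

print(c(glm = deviance(fit), full = dev_full, short = dev_short))
```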

2 (of 3). The study in Question 1 yielded observations that are plotted in Figure 1 for both countries A and B.

After almost two years of the epidemic, in month 23, the two countries agreed to relax the set of rules to control the disease, and this resulted in a marked increase in the number of cases, evident in both plots. In the dataset, this rule change is recorded by means of an indicator variable, with value 0 for months up to and including month 22, and value 1 for the months thereafter. In R, this factor covariate is denoted by step.

The following R modelling uses the notation i (a numerical continuous variable) for the covariate denoting the number of months that passed since the outbreak. Furthermore, a quadratic covariate $i^2$ is also computed and denoted by i2. The response is denoted by y for country A, and by yy for country B.

Trimmed R output from a few fitted generalised models appears in what follows, exploring for each country whether the linear predictor should involve the covariate i linearly or as a quadratic, thus also including i2, and whether the relaxation measure quantified through the covariate step did indeed bear a significant effect on the number of cases.

> summary(modA)

Call:
glm(formula = y ~ i, family = poisson(link = "log"))

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.528252   0.181166   2.916  0.00355 **
i           0.066275   0.006796   9.752  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 154.911  on 35  degrees of freedom
Residual deviance:  46.507  on 34  degrees of freedom
AIC: 177.12

> summary(modA.3)

Call:
glm(formula = y ~ i + step, family = poisson(link = "log"))

Coefficients:
            Estimate Std. Error
(Intercept)  0.81985    0.20257
i            0.03444    0.01297
step         0.73580    0.26123
---
(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 154.911  on 35  degrees of freedom
Residual deviance:  38.357  on 33  degrees of freedom
AIC: 170.97

> t(matrix(c(1,37,1),nrow=3))%*%vcov(modA.3)%*%matrix(c(1,37,1),nrow=3)
           [,1]
[1,] 0.01351348

> t(matrix(c(1,37,0),nrow=3))%*%vcov(modA.3)%*%matrix(c(1,37,0),nrow=3)
          [,1]
[1,] 0.1110813

> summary(modB.3)

Call:
glm(formula = yy ~ i + i2 + step, family = poisson(link = "log"))

Coefficients:
              Estimate Std. Error
(Intercept)  2.8886411  0.1040854
i            0.0632756  0.0105897
i2          -0.0006862  0.0002304
step         0.7326556  0.0810130
---
(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1549.11  on 35  degrees of freedom
Residual deviance:  374.54  on 32  degrees of freedom
AIC: 590.49

> 1-pchisq(8.15,df=1)
[1] 0.004306116

> 1-pchisq(374.54,df=32)
[1] 0

> 1-pchisq(38.36,df=33)
[1] 0.2394046

> 1-pchisq(46.51,df=34)
[1] 0.07469905

In your analysis below, you may take the significance level $\alpha = 0.05$ if needed, and you may use all reported R output.

In addition to the p-values reported in the R output, you are also given the quantiles $z(0.975) \approx 1.96$, $\chi^2_1(0.95) \approx 3.84$ and $\chi^2_4(0.95) \approx 9.49$.

(a) Define the AIC and BIC and explain their use. Derive a formula that would allow you to compute the BIC directly from the AIC. Hence, or otherwise, obtain the BIC for model modA. [6]
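With $\mathrm{AIC} = -2\ell + 2p$ and $\mathrm{BIC} = -2\ell + p \log n$, the two criteria are linked by $\mathrm{BIC} = \mathrm{AIC} + p(\log n - 2)$. A short R sketch applies this to the AIC reported for modA, assuming $p = 2$ estimated parameters (intercept and slope) and $n = 36$ observations.

```r
aic_modA <- 177.12   # AIC reported by summary(modA)
p <- 2               # estimated parameters: intercept and slope (assumed count)
n <- 36              # number of monthly observations

# BIC = AIC - 2p + p * log(n)
bic_modA <- aic_modA + p * (log(n) - 2)
print(round(bic_modA, 2))
```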

(b) Using the R output, assess whether the covariate step is needed in the model modB.3 for country B. Justify your answer. [5]
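One way to approach part (b) is a Wald test of $H_0: \beta_{\text{step}} = 0$ using the estimate and standard error reported for modB.3. The R sketch below is an illustration rather than a model answer. It takes the Poisson standard errors at face value, which the reported goodness-of-fit p-value of 0 for modB.3 may call into question.

```r
est_step <- 0.7326556   # estimate for step in summary(modB.3)
se_step  <- 0.0810130   # its reported standard error

# Wald statistic, approximately N(0, 1) under H0: beta_step = 0
z <- est_step / se_step
p_value <- 2 * pnorm(-abs(z))

# Two-sided test at the 5% level
reject <- abs(z) > qnorm(0.975)
print(c(z = z, p_value = p_value, reject = reject))
```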

(c) Construct an analysis of deviance to compare models modA and modA.3. Clearly justify your model choice through stating the tested hypotheses, the test statistic and its distribution. Does the factor variable step bear a significant effect on the number of cases in country A? [8]
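The comparison of the nested models modA and modA.3 can be sketched in R from the reported residual deviances. The difference in deviances is compared with a $\chi^2$ distribution whose degrees of freedom equal the difference in residual degrees of freedom. The variable names below are illustrative.

```r
dev_modA  <- 46.507; df_modA  <- 34   # residual deviance and df, summary(modA)
dev_modA3 <- 38.357; df_modA3 <- 33   # residual deviance and df, summary(modA.3)

# Deviance difference, approximately chi-squared under H0: beta_step = 0
lr_stat <- dev_modA - dev_modA3
lr_df   <- df_modA - df_modA3

p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
crit    <- qchisq(0.95, df = lr_df)
print(c(stat = lr_stat, df = lr_df, crit = crit, p_value = p_value))
```

The statistic matches the value 8.15 used in the reported call `1-pchisq(8.15,df=1)`.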

(d) Justify whether model modA.3 is a good fit for the observed data in country A. Using model modA.3, proceed to estimate the number of cases corresponding to month 37, and compute its associated 95% confidence interval. [8]
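A possible sketch of the month-37 calculation in R is given below. It assumes step = 1 for month 37 (the rule change applies from month 23 onwards), so the relevant variance is the one reported for the vector (1, 37, 1). The interval is built on the linear-predictor scale and then exponentiated, which is one common choice rather than the only one.

```r
beta <- c(0.81985, 0.03444, 0.73580)   # estimates from summary(modA.3)
x37  <- c(1, 37, 1)                    # intercept, month 37, step = 1 (assumed)

# Linear predictor and its variance x' V x as reported in the R output
eta_hat <- sum(x37 * beta)
var_eta <- 0.01351348
se_eta  <- sqrt(var_eta)

# 95% interval on the log scale using z(0.975) = 1.96, then back-transform
ci_eta <- eta_hat + c(-1, 1) * 1.96 * se_eta
mu_hat <- exp(eta_hat)
ci_mu  <- exp(ci_eta)

print(c(estimate = mu_hat, lower = ci_mu[1], upper = ci_mu[2]))
```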

3 (of 3). As the number of cases is high in country B, a researcher suggests that the Poisson distribution used for modelling the response may be replaced with a Gaussian distribution.

The researcher first models the independent monthly observations as $Y'_i \sim N(\mu'_i, (\sigma')^2)$, and assumes a common, unknown variance $(\sigma')^2$ across all $i = 1, \ldots, 36$ months. Following some residual checks, the researcher then decides to model $Y''_i = \log(Y'_i) \sim N(\mu''_i, (\sigma'')^2)$ across the recorded $i = 1, \ldots, 36$ months.

Trimmed R output along with some residual check plots (Figure 2) from the researcher's investigation are reported below.

> summary(gmodB.1)

Call:
glm(formula = yy ~ i, family = gaussian(link = "log"))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.932328   0.283432  10.346 4.85e-12 ***
i           0.062547   0.009531   6.562 1.62e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Null deviance: 119075  on 35  degrees of freedom
Residual deviance:  40638  on 34  degrees of freedom
AIC: 361.21

> yy2 <- log(yy)
> lgmodB.1 <- glm(yy2 ~ i, gaussian(link = 'identity'))
> summary(lgmodB.1)

Call:
glm(formula = yy2 ~ i, family = gaussian(link = "identity"))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.598131   0.166012  15.650  < 2e-16 ***
i           0.072199   0.007824   9.227 8.77e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.2378461)

    Null deviance: 28.3383  on 35  degrees of freedom
Residual deviance:  8.0868  on 34  degrees of freedom
AIC: 54.405

The researcher then went on to implement a local linear estimator to describe the association between the transformed monthly number of cases, $Y''_i$, and the covariate denoting the number of months that passed since the outbreak, $x_i = i$. The estimated function appears in Figure 3 (top), each curve corresponding to a different choice of bandwidth.

Before comparing the parametric and nonparametric estimates, the researcher also wants to check using nonparametric density estimation that the use of a Gaussian distribution in model lgmodB.1 is indeed justified. (S)he implements a kernel-based density estimator for the deviance residuals associated to model lgmodB.1, using the Epanechnikov kernel with optimal bandwidth. The histogram and resulting estimate appear in Figure 3 (bottom). You are also given that the variance of the deviance residuals is 0.23.

[Figure 2: residual diagnostic plots. Each row has a "Residuals vs Fitted" panel (horizontal axis: predicted values) and a "Normal Q-Q" panel (horizontal axis: theoretical quantiles; vertical axis: standardised deviance residuals).]

Figure 2: Top: Residual checks for model gmodB.1; Bottom: Residual checks for model lgmodB.1.

[Figure 3: top panel with horizontal axis "Recorded month"; bottom panel with horizontal axis "Deviance residuals".]

Figure 3: Top: Nonparametric regression estimates (red and blue curves) connecting the log of the number of cases and the number of months since the outbreak started, obtained using two different bandwidths; Bottom: Density estimate (red curve) for deviance residuals of model lgmodB.1, with superimposed theoretical normal density (blue curve).

(a) Write down the estimated models gmodB.1 and lgmodB.1, making sure you specify their respective response distribution and link function, while also stating the precise relationship between their respective dispersion parameter and variance. Formulate the deviance-based estimator of the dispersion parameter and obtain the estimated variance under model gmodB.1. [10]
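For a Gaussian GLM the dispersion parameter coincides with the variance, and the deviance-based estimator is the residual deviance divided by the residual degrees of freedom. A brief R sketch applies this to the reported output. The lgmodB.1 line serves only as a sanity check against the dispersion printed by R, and the variable names are illustrative.

```r
# Deviance-based dispersion estimate: residual deviance / residual df
phi_gmodB1  <- 40638 / 34    # gmodB.1: estimated variance of the counts
phi_lgmodB1 <- 8.0868 / 34   # lgmodB.1: compare with the reported 0.2378461

print(c(gmodB.1 = phi_gmodB1, lgmodB.1 = phi_lgmodB1))
```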