MTHM506/COMM511: Statistical Data Modelling

Published: 2023-02-11

Assessment - Individual Exercises

Marks achieved in this assignment will contribute towards 50% of the final module mark. You should attempt all questions on this sheet. Note that the questions are organised in the order we covered the topics, and not in order of difficulty. Therefore it is advised that you read through the questions first, and start working on those that you feel more comfortable with.

Deadline: Noon (12pm), on 3rd March 2023

You should submit one PDF via eBART containing your solutions. It should be written up using word processing software (e.g. LaTeX, R Markdown, or Word). Solutions are expected to be concise, well structured and well presented. Commented R code (e.g. ‘model <- glm(...)’) and the outcomes/plots should form part of your solutions. Do not display too much raw R output (e.g. don’t display the full output of ‘summary(model)’), but edit this down to the essentials. Be sure to include justification for each step of your analyses, providing comments alongside your R code to explain what you are doing, and add appropriate titles and labelled axes to your plots. Handwritten solutions will be accepted where mathematical descriptions are required, but a professional word-processed submission is preferred.

You are expected to work independently - strict disciplinary action will be taken for any plagiarism. Late submissions will also be penalised according to the University’s late submission policy.

The data required for this assignment (datasets_exercises.RData) can be downloaded from the ELE page and loaded into R using the load() function.
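As a minimal sketch of the loading step: load() restores the saved objects and invisibly returns their names, which gives a quick check of what the file contains. The helper name load_objects and the file location are assumptions made for illustration; only load() and the file name come from the sheet.

```r
# Hypothetical helper (not part of the assignment): load an .RData file into
# its own environment and return the restored objects as a named list.
load_objects <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)  # load() returns the object names
  mget(obj_names, envir = env)
}

# Assumed location: the file sits in the current working directory.
data_file <- "datasets_exercises.RData"
if (file.exists(data_file)) {
  objs <- load_objects(data_file)
  print(names(objs))  # which objects the file provides
}
```

Calling load(data_file) directly works just as well; the separate environment only avoids overwriting objects that already exist in the workspace.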

Question 1

The data frame nlmodel contains data on a response variable y and a single explanatory variable x. A scatter plot of y versus x suggests a strong non-linear relationship:

[Figure: scatter plot of y versus x. The x axis runs from 0.00 to 1.00; y-axis ticks at 100, 150 and 200.]

Suppose for these data we wish to consider the model

$Y_i \sim N([\ldots],\ \sigma^2)$

$i = 1, 2, \ldots, 100$, $Y_i$ independent

(a) [1 mark] Why can’t this model be fit using a linear (regression) model?

(b) [2 marks] Write down the likelihood $L(\theta_1, \theta_2, \sigma^2; \mathbf{y}, \mathbf{x})$ and the log-likelihood $\ell(\theta_1, \theta_2, \sigma^2; \mathbf{y}, \mathbf{x})$.

(c) [1 mark] Write an R function mylike() which evaluates the negative log-likelihood (i.e. $-\ell(\theta_1, \theta_2, \sigma^2; \mathbf{y}, \mathbf{x})$) for any values of the three parameters.

(d) [3 marks] Use the R function nlm() in association with your function mylike() to numerically minimise the negative log-likelihood and report the maximum likelihood estimates for the model parameters. Provide some evidence of how you chose sensible starting values.

(e) [2 marks] Estimate the standard errors and construct 99% confidence intervals for $\theta_1$ and $\theta_2$.

(f) [2 marks] Test the hypothesis that $\theta_2 = 0.08$ at the 10% significance level (not using the confidence interval).

(g) [4 marks] Produce a plot of the associated mean relationship and the associated 95% prediction intervals on a scatter plot of y versus x. Comment on the appropriateness of the model.
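The general workflow behind parts (c) to (g) can be sketched on simulated data as follows. This is an illustration under stated assumptions, not a model answer: the mean function mean_fun is a placeholder, the data are simulated rather than taken from nlmodel, the null value in the test is arbitrary, and the third parameter is taken to be sigma rather than sigma squared.

```r
set.seed(42)

# ASSUMPTION: placeholder mean function for illustration only. Replace it
# with the mean function stated in the question.
mean_fun <- function(theta, x) theta[1] * exp(theta[2] * x)

# Simulated stand-in for nlmodel (true values chosen arbitrarily).
n <- 100
x <- runif(n)
y <- rnorm(n, mean = mean_fun(c(50, 1.2), x), sd = 5)

# Negative log-likelihood; params = c(theta1, theta2, sigma).
mylike <- function(params, y, x) {
  sigma <- params[3]
  if (sigma <= 0) return(1e10)  # keep the optimiser away from invalid sigma
  mu <- mean_fun(params[1:2], x)
  -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}

# Starting values: for this placeholder mean, log(y) is roughly linear in x,
# so a linear fit on the log scale gives rough values for theta1 and theta2.
b <- unname(coef(lm(log(y) ~ x)))
theta_start <- c(exp(b[1]), b[2])
sigma_start <- sd(y - mean_fun(theta_start, x))

fit <- nlm(mylike, p = c(theta_start, sigma_start), y = y, x = x,
           hessian = TRUE)
est <- fit$estimate

# Standard errors from the inverse Hessian of the negative log-likelihood.
se <- sqrt(diag(solve(fit$hessian)))

# 99% Wald confidence intervals.
crit <- qnorm(0.995)
ci99 <- cbind(estimate = est, lower = est - crit * se, upper = est + crit * se)
rownames(ci99) <- c("theta1", "theta2", "sigma")
print(round(ci99, 3))

# Wald test of H0: theta2 = theta2_null (null value is illustrative).
theta2_null <- 1.2
z_stat <- (est[2] - theta2_null) / se[2]
p_value <- 2 * pnorm(-abs(z_stat))

# Fitted mean with a plug-in 95% prediction band (this simple band ignores
# the uncertainty in the parameter estimates).
x_grid <- seq(0, 1, length.out = 200)
mu_grid <- mean_fun(est[1:2], x_grid)
pred_lower <- mu_grid - qnorm(0.975) * est[3]
pred_upper <- mu_grid + qnorm(0.975) * est[3]

plot(x, y, main = "Simulated data with fitted mean", xlab = "x", ylab = "y")
lines(x_grid, mu_grid, lwd = 2)
lines(x_grid, pred_lower, lty = 2)
lines(x_grid, pred_upper, lty = 2)
```

A likelihood ratio test, comparing the minimised negative log-likelihood with and without the constraint on theta2, is an alternative to the Wald test shown here.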

Question 2

The data frame aids data relates to the number of quarterly AIDS cases in the UK, $y_i$, from January 1983 to March 1994. The variable cases is $y_i$ and date is time, symbolised here as $z_i$. A scatter plot of $y_i$ versus $z_i$ shows an increasing trend in cases:

[Figure: scatter plot of cases versus Date. The Date axis runs from 82.5 to 92.5; y-axis ticks from 100 to 500.]

In this question we consider two competing models to describe the trend in the number of cases. Model 1 is

$Y_i \sim \mathrm{Pois}(\lambda_i)$

$\log(\lambda_i) = \beta_0 + \beta_1 z_i$

and Model 2 is

$Y_i \sim N(\mu_i, \sigma^2)$

$\log(\mu_i) = \gamma_0 + \gamma_1 z_i$
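Both specifications correspond to glm() calls that differ only in the family argument. The sketch below uses simulated data as a stand-in for the AIDS data; the data frame sim, its columns and the simulation values are assumptions made for illustration.

```r
set.seed(1)

# Simulated stand-in for the quarterly data (values are arbitrary).
z <- seq(83, 94, by = 0.25)
y <- rpois(length(z), lambda = exp(-15 + 0.22 * z))
sim <- data.frame(z = z, y = y)

# Model 1: Poisson response with a log link.
m1 <- glm(y ~ z, family = poisson(link = "log"), data = sim)

# Model 2: Normal response with a log link on the mean.
m2 <- glm(y ~ z, family = gaussian(link = "log"), data = sim)

# Coefficients side by side (both are on the log scale).
print(round(rbind(poisson = coef(m1), gaussian = coef(m2)), 3))
```

The Gaussian family with a log link needs strictly positive responses for its default starting values, which holds for the simulated counts here.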

(a) [2 marks] Comment on whether the proposed models are sensible in terms of the distribution and the […]

(b), (c) […] ($\hat{\lambda}_i$ and $\hat{\mu}_i$) on top of the data with approximate 95% confidence intervals around the mean. Comment on the validity of each model (based on the plot).

(d) [2 marks] Produce the deviance residuals vs fitted values ($\hat{\lambda}_i$ and $\hat{\mu}_i$) plot for each model, comment appropriately and thus propose a way that the two models might be extended to improve the fit.

(e) [4 marks] Implement the proposed extensions to each model, to arrive at a final version for each of them (justified by appropriate hypothesis tests).

(f) [8 marks] On the basis of your answers (a)-(d), but also on arguments of model fit based on the deviance, comment on which (if any) of the two final models in (e) you would choose as the best. Mention at least one reason why either model is not ideal.

(g) [4 marks] Further extend your final Poisson model to a Negative Binomial model and comment on whether this model is preferable to the other two, on the basis of all the criteria used for comparison so far.
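The diagnostic and comparison steps named in these parts can be sketched as below, again on simulated data. Several choices are assumptions rather than the intended solution: the overdispersed simulation, the centred time variable zc, and the quadratic term as one possible extension. glm.nb() comes from the MASS package.

```r
library(MASS)  # glm.nb(); MASS is a recommended package bundled with R

set.seed(2)

# Simulated overdispersed counts (arbitrary values); zc is centred time.
z <- seq(83, 94, by = 0.25)
sim <- data.frame(z = z, zc = z - mean(z))
sim$y <- rnbinom(nrow(sim), mu = exp(4.5 + 0.22 * sim$zc), size = 10)

m_pois <- glm(y ~ zc, family = poisson, data = sim)

# Approximate 95% confidence interval for the mean: build it on the link
# (log) scale, then back-transform so the limits stay positive.
pr <- predict(m_pois, type = "link", se.fit = TRUE)
band <- data.frame(
  fit   = exp(pr$fit),
  lower = exp(pr$fit - qnorm(0.975) * pr$se.fit),
  upper = exp(pr$fit + qnorm(0.975) * pr$se.fit)
)

# Deviance residuals against fitted values.
plot(fitted(m_pois), residuals(m_pois, type = "deviance"),
     main = "Poisson model: deviance residuals vs fitted values",
     xlab = "Fitted values", ylab = "Deviance residuals")
abline(h = 0, lty = 2)

# One possible extension (an assumption): add a quadratic term and compare
# the nested models with an analysis-of-deviance (likelihood ratio) test.
m_quad <- update(m_pois, . ~ . + I(zc^2))
lrt <- anova(m_pois, m_quad, test = "Chisq")
print(lrt)

# Negative Binomial counterpart of the Poisson model.
m_nb <- glm.nb(y ~ zc, data = sim)

comparison <- data.frame(
  model    = c("Poisson", "Negative Binomial"),
  deviance = c(deviance(m_pois), deviance(m_nb)),
  df       = c(df.residual(m_pois), df.residual(m_nb)),
  AIC      = c(AIC(m_pois), AIC(m_nb))
)
print(comparison)
```

A residual deviance much larger than its degrees of freedom in the Poisson fit is the usual sign of overdispersion that motivates the Negative Binomial model.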