MATH5885 Longitudinal Data Analysis

Term 2 2022

Assignment 2

1. Refer to the VV study, which we considered in Lecture 2 of Week 4.  Provide a report in which you summarize your conclusions from the following analyses:

(a) Produce two "spaghetti plots" of all cases in the Smoker and Ex-smoker groups separately, with the mean response for the group overlaid. Plot these side by side, using a common vertical scale for both plots to allow visual comparison of responses from both groups, and make sure the observations are in the correct time order. Summarize what you observe about mean levels, variability and possible random effects or other serial dependence in these plots. Give sufficient detail in your answer.

(b) Review the analysis presented in lectures and run the code for the linear model with a general covariance structure in the GLS fit using REML. Comment on the fitted model, considering both the regression parameters and the estimated covariance matrix. Also, try centering the time variable at 12 years (i.e. use (time − 12) as a new time) and refit. Report on any differences between the fits, in particular in the intercept estimate, and explain them.

(c) Revert to the original time. Similarly to the quadratic trend model that we fit during the lectures, fit a model that also contains cubic terms in time.

i) Compare the cubic fit to the linear-in-time model, report the $G^2$ statistic (applying ML estimation) and report your decision. Explain the degrees of freedom for the $G^2$ statistic used.

ii) Also perform the (incorrect) anova comparison of the same linear and cubic models when using REML and explain any discrepancy in the outcome.
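For orientation, the $G^2$ statistic in (c)(i) has the usual likelihood ratio form. The symbols $\hat\ell_{\text{cubic}}$, $\hat\ell_{\text{linear}}$, $p_{\text{cubic}}$ and $p_{\text{linear}}$ below are notation introduced here for the maximized ML log-likelihoods and the numbers of estimated parameters of the two nested models; they are not taken from the lecture notes.

```latex
G^2 = 2\left\{ \hat\ell_{\text{cubic}} - \hat\ell_{\text{linear}} \right\},
\qquad
G^2 \;\overset{\text{approx.}}{\sim}\; \chi^2_{\,p_{\text{cubic}} - p_{\text{linear}}}
\quad \text{under the linear-in-time model.}
```

Both log-likelihoods must come from ML fits with the same covariance structure, so that the models differ only in their fixed effects.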

(d) Look at the estimate of the unstructured covariance matrix and explain why compound symmetry looks like a reasonable simplification. Using the linear-in-time model for the fixed effects, compare these two covariance models using the likelihood ratio test (after fitting the models using REML).

i) Report the p-value and your decision and explain how the 26 = 32 − 6 degrees of freedom are obtained.

ii) Explain why there is no need to perform a parametric bootstrap here, and why the results of the likelihood ratio test are good enough for a conclusion to be made.
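The parameter count behind the degrees of freedom in (d)(i) can be sketched as follows. The sketch assumes seven measurement occasions and four fixed-effect coefficients in the linear-in-time model; neither number is stated in this assignment, and they are used here only because they reproduce the totals 32 and 6.

```python
def n_params_unstructured(n_times):
    """Distinct variances and covariances in an n_times x n_times covariance matrix."""
    return n_times * (n_times + 1) // 2


def n_params_compound_symmetry():
    """Compound symmetry has one common variance and one common correlation."""
    return 2


n_times = 7   # assumption: number of measurement occasions
n_fixed = 4   # assumption: fixed-effect coefficients in the linear-in-time model

total_unstructured = n_fixed + n_params_unstructured(n_times)
total_compound_symmetry = n_fixed + n_params_compound_symmetry()
df = total_unstructured - total_compound_symmetry

print(total_unstructured, total_compound_symmetry, df)
```

Because both models share the same fixed effects, the fixed-effect count cancels in the difference, and only the covariance parameters contribute to the degrees of freedom.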

(e) Create a model with a change in trend at year = 6. This model should be specified so that the linear model of lectures is a special case (i.e., it is nested within this model). Fit this model and compare your results with the original time model in part (b). Specify the likelihood ratio test of the null hypothesis that the change in trend is not required. Also perform the corresponding Wald test and compare the results. Are the change-in-trend terms required? (Use the 5% level of significance.)
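One common way to obtain a change in trend that nests the linear model is to add a "broken-stick" term $(\text{time} - 6)_+$ to the design. The sketch below builds that column in Python. The function name `change_in_trend_term` and the list of measurement times are hypothetical, and the group terms and their interactions that a full model would need are omitted.

```python
def change_in_trend_term(time, knot=6.0):
    """Return (time - knot)_+ : zero up to the knot, linear afterwards."""
    return max(time - knot, 0.0)


# Assumption: illustrative measurement times in years, not taken from the data set.
times = [0, 3, 6, 9, 12, 15, 19]

# One design row per time: intercept, time, change-in-trend term.
rows = [(1.0, float(t), change_in_trend_term(t)) for t in times]

for row in rows:
    print(row)
```

Setting the coefficient of the third column to zero recovers the linear-in-time model, which is what makes the two models nested.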

(f) Explain in what ways (if any) the model in part (e) could be simplified further. Consider the fixed effects specification as well as the residual covariance structure. To make your choice, read about the available correlation structures in the nlme help (corAR1, corARMA, corCAR1, corCompSymm, corExp, corSymm, corGaus, etc.) and the weights options that go with them. Justify your model choice by including the results of significance tests or other appropriate comparisons such as AIC or BIC. (Note that there is no unique "right answer" to this part, and it is essential to justify your choice.)

(g) For the choice of covariance model you selected in part (f), can you write down a mixed effects model of the type considered in Week 5, Lecture 2 that would give this covariance structure? Explain why or why not.

2. Hint: Recall (and use where it suits below) that if $X \sim N_n(\mu, \Sigma)$, then the moment generating function of $X$ is $E(e^{u^T X}) = e^{u^T \mu + \frac{1}{2} u^T \Sigma u}$.
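As a sanity check on this formula, the univariate case $E(e^{uX}) = \exp(u\mu + u^2\sigma^2/2)$ can be verified numerically. The sketch below uses a plain trapezoid rule; the function names and the values of `u`, `mean` and `sd` are arbitrary choices made for illustration.

```python
import math


def normal_mgf_numeric(u, mean, sd, half_width=12.0, n_steps=20000):
    """Trapezoid-rule approximation of E[exp(u * X)] for X ~ N(mean, sd^2)."""
    lower = mean - half_width * sd
    step = 2.0 * half_width * sd / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        x = lower + i * step
        density = math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, n_steps) else 1.0  # end points get half weight
        total += weight * math.exp(u * x) * density
    return total * step


def normal_mgf_closed_form(u, mean, sd):
    """Univariate version of exp(u^T mu + u^T Sigma u / 2)."""
    return math.exp(u * mean + 0.5 * u * u * sd * sd)


u, mean, sd = 1.0, 0.3, 0.8  # arbitrary illustrative values
numeric = normal_mgf_numeric(u, mean, sd)
closed = normal_mgf_closed_form(u, mean, sd)
print(numeric, closed)
```

The two printed values should agree to many decimal places, which would not be the case if the factor of one half were dropped from the exponent.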

Consider a single subject $k$ ($k = 1, 2, \ldots, N$) from a balanced design in a longitudinal study whose observed responses are conditionally Poisson in the following sense. Let $\varepsilon_j$ for $j = 1, \ldots, n$ be random errors from a multivariate normal distribution with mean vector zero and covariance matrix $\Sigma$, where the $(i,j)$ element of $\Sigma$ is $\sigma_{ij}$. Conditional on $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^T$ (and for the fixed design vectors $x_1, \ldots, x_n$), assume that the count responses $Y_j$ at times $t_j$ are independent Poisson random variables with mean and variance equal to $\mu_j \alpha_j$, where $\log(\alpha_j) = \varepsilon_j$ and

$$\log(\mu_j) = \beta_0 + x_j^T \beta,$$

with $x_j$ denoting a vector of regression variables observed at the $j$th time, $\beta_0$ being an intercept term and $\beta$ being a vector of regression parameters.

In your answers below please use the following notation where appropriate: let $\mu_j = \exp(\beta_0 + x_j^T \beta)$ and $E(Y_j) = \mu_j^* = \mu_j \exp(\sigma_{jj}/2) = \exp(\beta_0 + x_j^T \beta + \sigma_{jj}/2) = \exp(\beta_{0j}^* + x_j^T \beta)$.
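The two-stage structure of the model (normal error first, then a Poisson count given that error) can be illustrated with a small simulation for a single time point. This is a sketch only: the values of `mu_j` and `sigma_jj`, the seed and the helper names are hypothetical, and the Poisson sampler is Knuth's simple multiplication method, which is adequate for small means.

```python
import math
import random


def draw_poisson(lam, rng):
    """Knuth's multiplication method for one Poisson(lam) draw (small lam only)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_marginal_mean(mu_j, sigma_jj, n_draws=20000, seed=1):
    """Average of Y_j over draws of eps_j ~ N(0, sigma_jj), Y_j | eps_j ~ Poisson(mu_j * exp(eps_j))."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_draws):
        eps = rng.gauss(0.0, math.sqrt(sigma_jj))
        total += draw_poisson(mu_j * math.exp(eps), rng)
    return total / n_draws


mu_j, sigma_jj = 2.0, 0.25  # hypothetical values for illustration
simulated = simulate_marginal_mean(mu_j, sigma_jj)
theoretical = mu_j * math.exp(sigma_jj / 2)  # the quantity denoted mu_j^* above
print(simulated, theoretical)
```

The simulated average should lie close to $\mu_j \exp(\sigma_{jj}/2)$ rather than to $\mu_j$ itself, reflecting the shift in the intercept between the conditional and marginal means.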

(a) Note that the distribution of $\alpha_j$ is called a log-normal distribution. Find $E(\alpha_j)$ and $\mathrm{var}(\alpha_j)$.

(b) Show that for the conditional distribution of $Y_j$ given $\varepsilon$ it holds that $E(\log E(Y_j \mid \varepsilon_j; x_j)) = \log(\mu_j)$.

(c) Find the conditional expected value $E(Y_j \mid \varepsilon_j)$ and the marginal $E(Y_j)$. For a unit change (i.e., increment by 1) in the $k$th regression variable at time $j$, find the proportional change in the conditional and marginal means:

$$\frac{E(Y_j \mid \varepsilon_j; x_j^\star)}{E(Y_j \mid \varepsilon_j; x_j)} \quad \text{and} \quad \frac{E(Y_j; x_j^\star)}{E(Y_j; x_j)},$$

where $x_j^\star$ is equal to $x_j$, except that $x_{jk}$ has been replaced by $x_{jk} + 1$. Use these to conclude that, apart from the intercept term $\beta_0$, the remaining regression parameters $\beta_k$ have the same marginal and conditional interpretations. What are these interpretations?

(d) Find the variances $\mathrm{var}(Y_j)$ and covariances $\mathrm{cov}(Y_j, Y_k)$ for all values of $j$ and $k$.

Hint: You may use the properties $\mathrm{var}(Y_j) = E(\mathrm{var}(Y_j \mid \varepsilon_j)) + \mathrm{var}(E(Y_j \mid \varepsilon_j))$ and $\mathrm{cov}(Y_j, Y_k) = E(Y_j Y_k) - E(Y_j) E(Y_k)$. Evaluate each of the expectations in two steps (conditional, then unconditional) using the moment generating function formula at the top of the page, noting that, for $j \neq k$, $E(Y_j Y_k) = E\{E(Y_j Y_k \mid \varepsilon)\} = \mu_j \mu_k E(\alpha_j \alpha_k) = \mu_j \mu_k E\{\exp(\varepsilon_j + \varepsilon_k)\}$.
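As an illustration of how the moment generating function is applied in this hint, the last expectation can be written as $E(e^{u^T \varepsilon})$ with $u = e_j + e_k$, where $e_j$ denotes the $j$th unit vector (notation introduced here, not in the question). A sketch of that single step, using the standard formula $E(e^{u^T X}) = \exp(u^T \mu + u^T \Sigma u / 2)$ with mean vector zero:

```latex
E\{\exp(\varepsilon_j + \varepsilon_k)\}
  = E\{\exp(u^T \varepsilon)\}
  = \exp\!\left(\tfrac{1}{2}\, u^T \Sigma u\right)
  = \exp\!\left(\tfrac{1}{2}\,(\sigma_{jj} + 2\sigma_{jk} + \sigma_{kk})\right),
\qquad u = e_j + e_k .
```

The remaining algebra needed for the variances and covariances is left to the reader.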

(e) Overdispersion refers to the situation in which the variance of an observed outcome is larger than that predicted by its distributional model.  Under a Poisson model, the variance and mean should be the same.

i. Show that both $E(Y_j)$ and $\mathrm{var}(Y_j)$ are not smaller than $\mu_j$, and that $\mathrm{var}(Y_j) \geq E(Y_j)$ holds. This implies that having a random component in the log mean can lead to overdispersion in $Y_j$ relative to a plain Poisson model (i.e., $\mathrm{var}(Y_j) \geq E(Y_j)$).

ii. Note the only condition(s) under which the variance is equal to the mean, and hence the only situation(s) in which $Y_j$ has a marginal Poisson distribution.

iii. Assume that $\Sigma$ (the covariance matrix of $\varepsilon$) has a compound symmetry structure. Show that this does not result in a compound symmetry structure for the $Y_j$ except when $x_j = x$ for all $j$, that is, when the regressor vector does not change with the time of observation, leading to a trivial model in which the $\beta$ components are not identifiable.