
MATH5916:  Survival Analysis

Term 1, 2022

Assignment 2

1. Consider the log-linear model with fixed covariates

$$\log T_i = \mu + \alpha_1 x_{1i} + \dots + \alpha_p x_{pi} + \sigma\epsilon_i = \mu + x_i^T\alpha + \sigma\epsilon_i$$

(a) Show that the survival function of $T_i$ is

$$S_i(t) = S_0\left(t e^{-x_i^T\alpha}\right)$$

where $S_0(t) = P(e^{\mu + \sigma\epsilon_i} > t)$ is the baseline survival function (the survival function for an individual with zero covariates).
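One possible line of argument for (a) is sketched below, using only the model definition. It assumes the errors $\epsilon_i$ are identically distributed, so that $S_0$ does not depend on $i$.

```latex
\begin{align*}
S_i(t) &= P(T_i > t)
        = P\left(e^{\mu + x_i^T\alpha + \sigma\epsilon_i} > t\right) \\
       &= P\left(e^{\mu + \sigma\epsilon_i} > t\, e^{-x_i^T\alpha}\right)
        = S_0\left(t\, e^{-x_i^T\alpha}\right).
\end{align*}
```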

(b) The log-linear model above is an accelerated failure time model, because the effect of the explanatory variables $x$ is to speed up or slow down the time scale for the failure process. The acceleration factor is $e^{-x_i^T\alpha}$.

Consider an accelerated failure time model with a single binary variable for treatment group: $x = 0$ for the standard treatment group and $x = 1$ for the new treatment group.

i. Describe the effect of treatment on survival if α is (1) positive or (2) negative.

ii. Use the definition of expectation to derive a relationship between the expected lifetimes for the two treatment groups.

(c) Show that the hazard function corresponding to the survival function in (a) is

$$h_i(t) = e^{-x_i^T\alpha}\, h_0\left(t e^{-x_i^T\alpha}\right)$$

where $h_0(t)$ is the baseline hazard function.

(d) Suppose that the survival time for an individual with zero covariates has a Weibull($\lambda$, $\gamma$) distribution. Show that the hazard function for individual $i$ with covariate vector $x_i$ is

$$h_i(t) = e^{-\gamma x_i^T\alpha}\, \lambda\gamma t^{\gamma - 1}.$$

Deduce that the survival time for individual i also has a Weibull distribution and state the parameter values.

(e) The Cox proportional hazards model $h_i(t) = e^{x_i^T\beta} h_0(t)$ leaves the baseline hazard function $h_0(t)$ non-parametric. The Weibull proportional hazards model assumes a Weibull distribution for the baseline hazard function, so that $h_i(t) = e^{x_i^T\beta} \lambda\gamma t^{\gamma - 1}$.

Comparing this with (d), show that the accelerated failure time model for the Weibull distribution also has a proportional hazards interpretation. (In fact, the Weibull is the only distribution with both the proportional hazards and accelerated failure time properties.)
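As a sketch of how (c), (d) and (e) connect, the display below substitutes a Weibull baseline into the accelerated failure time hazard. It assumes Weibull($\lambda$, $\gamma$) is parametrised so that the baseline hazard is $\lambda\gamma t^{\gamma-1}$, matching the form used in (e).

```latex
\begin{align*}
h_i(t) &= e^{-x_i^T\alpha}\, h_0\left(t e^{-x_i^T\alpha}\right)
        = e^{-x_i^T\alpha}\, \lambda\gamma \left(t e^{-x_i^T\alpha}\right)^{\gamma - 1} \\
       &= e^{-\gamma x_i^T\alpha}\, \lambda\gamma t^{\gamma - 1}
        = e^{x_i^T\beta}\, \lambda\gamma t^{\gamma - 1},
        \qquad \text{with } \beta = -\gamma\alpha .
\end{align*}
```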


2. This question uses the PBC dataset used in the lectures and tutorials.  This data is available on Moodle and also included as part of the survival package.

(a) Using this data, follow the model selection strategy proposed by Collett (discussed in the Lecture 7 notes). If included in the model, use log-transformations for the variables bili, albumin and protime, and be sure to include age in years. Note that the status variable takes values in {0, 1, 2}; choose status == 2, as it is the event of interest here. Some data are also missing, and na.action = "na.omit" can be used to address this issue. Finally, be judicious with your output: I need not see every bit of code and model output. Summary table(s) will suffice.
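A minimal sketch of a first screening step is given below: each candidate term is fitted on its own and compared with the null model by a likelihood ratio test. It assumes the copy of pbc shipped with the survival package (where age is documented as being in years; the Moodle file may differ), and the candidate list and the object names `dat`, `candidates` and `screen` are hypothetical choices for illustration, not the full strategy from the lecture notes.

```r
library(survival)  # provides coxph(), Surv() and the pbc data frame

# Assumption: packaged pbc data; restrict to complete cases so that
# every candidate model is fitted to the same rows.
vars <- c("time", "status", "age", "edema", "bili", "albumin", "protime")
dat <- na.omit(pbc[, vars])

# Hypothetical candidate terms, for illustration only.
candidates <- c("age", "edema", "log(bili)", "log(albumin)", "log(protime)")

screen <- do.call(rbind, lapply(candidates, function(term) {
  f <- as.formula(paste("Surv(time, status == 2) ~", term))
  fit <- coxph(f, data = dat)
  # fit$loglik[1] is the null model, fit$loglik[2] the fitted model.
  lr <- 2 * (fit$loglik[2] - fit$loglik[1])
  df <- length(coef(fit))
  data.frame(term = term,
             minus2logL = -2 * fit$loglik[2],
             LR = lr,
             df = df,
             p = pchisq(lr, df, lower.tail = FALSE))
}))

# One compact summary table instead of five separate model printouts.
print(screen, digits = 4, row.names = FALSE)
```

Collecting the fits into a single data frame keeps the reported output to one table, in line with the request to be judicious with output.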

(b) Fit all possible main effects models to the PBC data and compute AIC and BIC for each model. Create an index plot similar to the one in Lecture 7.
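The enumeration in (b) could be sketched as follows. The five-term candidate list is a hypothetical reduced set to keep the example short, and the BIC penalty uses the number of events rather than the number of rows, which is one common convention for Cox models and an assumption here; check it against the definition used in the lectures.

```r
library(survival)

# Assumption: packaged pbc data, complete cases on the variables used.
vars <- c("time", "status", "age", "edema", "bili", "albumin", "protime")
dat <- na.omit(pbc[, vars])
n_events <- sum(dat$status == 2)

# Hypothetical reduced candidate set; the full set gives many more models.
covars <- c("age", "edema", "log(bili)", "log(albumin)", "log(protime)")

# Every non-empty subset of the candidate terms (2^5 - 1 = 31 models here).
subsets <- unlist(
  lapply(seq_along(covars), function(k) combn(covars, k, simplify = FALSE)),
  recursive = FALSE
)

results <- do.call(rbind, lapply(subsets, function(s) {
  rhs <- paste(s, collapse = " + ")
  fit <- coxph(as.formula(paste("Surv(time, status == 2) ~", rhs)), data = dat)
  k <- length(coef(fit))
  m2ll <- -2 * fit$loglik[2]
  data.frame(model = rhs, k = k,
             AIC = m2ll + 2 * k,
             BIC = m2ll + log(n_events) * k)  # assumption: events-based penalty
}))

# Index plot: criterion value against model index.
idx <- seq_len(nrow(results))
plot(idx, results$AIC, pch = 1,
     ylim = range(results$AIC, results$BIC),
     xlab = "Model index", ylab = "Criterion value")
points(idx, results$BIC, pch = 2)
legend("topright", legend = c("AIC", "BIC"), pch = c(1, 2))

results[which.min(results$AIC), ]
results[which.min(results$BIC), ]
```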

(c) Do the final models chosen by Collett’s strategy, AIC and BIC agree with each other?

(d) In consideration of your response to part (c), which covariates appear to be important in explaining survival for these patients?

3. A follow-up study on the 312 PBC trial participants was also undertaken (pbcseq.csv), and a brief description is contained in the tutorial notes.

For this study, multiple measurements over time were obtained for some of the prognostic variables, so this data can be analysed using Cox regression models with time-dependent covariates.

(a) Fit a Cox regression model including the variables you deemed important from question 2, treating the longitudinally measured variables as time-dependent covariates. Do any of the variables become non-significant in this model? Re-fit if required, excluding non-significant variables to arrive at a final model. Write down the fitted model for the hazard function.
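One way to set up the time-dependent fit is sketched below, following the tmerge() pattern from the survival package's time-dependent covariates vignette. It assumes the pbcseq data frame shipped with the package matches pbcseq.csv (columns id, futime, status, age, day, bili, albumin, protime); the covariates in the formula are placeholders, not the set you should arrive at from question 2.

```r
library(survival)

# Assumption: packaged pbcseq data. One baseline row per patient.
base <- pbcseq[!duplicated(pbcseq$id), c("id", "futime", "status", "age")]

# Set each patient's follow-up window and record the end-point status.
tdat <- tmerge(base, base, id = id, death = event(futime, status))

# Add the longitudinal measurements as time-dependent covariates:
# each value applies from its visit day until the next visit.
tdat <- tmerge(tdat, pbcseq, id = id,
               bili = tdc(day, bili),
               albumin = tdc(day, albumin),
               protime = tdc(day, protime))

# Counting-process form: one row per (tstart, tstop] interval.
# Placeholder covariates, for illustration only.
fit_td <- coxph(
  Surv(tstart, tstop, death == 2) ~ age + log(bili) + log(albumin) + log(protime),
  data = tdat
)

round(summary(fit_td)$coefficients, 3)
```

Here age is treated as a fixed baseline covariate, while bili, albumin and protime change at each visit.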

(b) Give an interpretation of the estimated regression coefficients for the model in (a). For each of the prognostic variables included in the model, indicate whether an increase in these variables has a beneficial or detrimental effect on survival.

(c) What was the maximum number of observations, m, taken on a patient? Identify the three patients with m measurements. Plot values of the longitudinally measured variables over time for these three patients. Based on these plots and the values of any fixed covariates, can you suggest why these patients survived a relatively long time?
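A short sketch of the counting and plotting steps in (c) follows. It again assumes the packaged pbcseq data frame, and shows only log(bili) as an example; the same loop could be repeated for the other longitudinal variables.

```r
library(survival)

# Number of visits recorded per patient, and the maximum, m.
visits <- table(pbcseq$id)
m <- max(visits)
ids_m <- as.integer(names(visits)[visits == m])

# Rows belonging to the patients with m measurements.
long <- pbcseq[pbcseq$id %in% ids_m, ]

# Empty frame first, then one trajectory per patient.
plot(long$day, log(long$bili), type = "n",
     xlab = "Days since enrolment", ylab = "log(bili)")
for (k in seq_along(ids_m)) {
  d <- long[long$id == ids_m[k], ]
  lines(d$day, log(d$bili), type = "b", lty = k, pch = k)
}
legend("topleft", legend = paste("id", ids_m),
       lty = seq_along(ids_m), pch = seq_along(ids_m))
```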