
EXAMINATION FOR THE DEGREES OF M.A. AND M.SCI. (SCIENCE)

2021

Biostatistics

1. In this question you will perform a meta-analysis of three studies (i = 1, 2, 3) examining the relationship between an unhealthy diet and developing coronary heart disease (CHD). You may view unhealthy diet as a risk factor and CHD as a health outcome.

Some results have already been computed for you and are shown in the table below. Note, some values are missing (denoted by ???), since you will be computing them as you work through the subparts of this question:

Study   OR      Var[log(OR)]   95% CI            W
1       1.211   0.0384         (0.825, 1.778)    23.700
2       ???     ???            (0.887, 2.323)    ???
3       1.276   0.0255         (0.933, 1.744)    33.193

i. Interpret the odds ratio (OR) and 95% CI for study 1.                    [2 MARKS]

ii. The following data are given for study 2:

 

                  CHD
                  Yes    No     Total
unhealthy diet                  243
healthy diet                    256
Total             80     419    499

Compute the OR and the variance of the log(OR) for study 2.

[2 MARKS]
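For reference, the odds ratio of a 2x2 table and the usual large-sample (Woolf) variance of its logarithm can be sketched in Python as follows. The cell counts are hypothetical placeholders, not the study 2 data, and the function name is illustrative only.

```python
import math


def odds_ratio_and_log_variance(a, b, c, d):
    """Odds ratio and Woolf variance of log(OR) for a 2x2 table.

    a: exposed with outcome,   b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    """
    odds_ratio = (a * d) / (b * c)
    var_log_or = 1 / a + 1 / b + 1 / c + 1 / d
    return odds_ratio, var_log_or


# Hypothetical counts, chosen only to illustrate the calculation.
or_hat, var_hat = odds_ratio_and_log_variance(a=30, b=70, c=20, d=80)

# A 95% CI is built on the log scale and then back-transformed.
se = math.sqrt(var_hat)
ci = (math.exp(math.log(or_hat) - 1.96 * se),
      math.exp(math.log(or_hat) + 1.96 * se))
print(round(or_hat, 3), round(var_hat, 4), tuple(round(x, 3) for x in ci))
```

The same back-transformation is what produces the intervals reported in the summary table.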

iii. The definition of the Mantel-Haenszel OR in the context of this example is:

$$OR_{MH} = \frac{\sum_{i=1}^{3} W_i \times OR_i}{\sum_{i=1}^{3} W_i},$$

where $OR_i$ is the odds ratio of the $i$th study.
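The pooled estimate is a weighted average of the study-level odds ratios. A minimal Python sketch of that weighted average, assuming the weights are already available, is given here. The function name and the input values are hypothetical and are not the values from the table.

```python
def pooled_odds_ratio(odds_ratios, weights):
    """Weighted average of study odds ratios: sum(W_i * OR_i) / sum(W_i)."""
    if len(odds_ratios) != len(weights):
        raise ValueError("need exactly one weight per study")
    numerator = sum(w * o for w, o in zip(weights, odds_ratios))
    return numerator / sum(weights)


# Hypothetical odds ratios and weights for three studies.
example = pooled_odds_ratio([1.2, 1.5, 0.9], [10.0, 20.0, 30.0])
print(round(example, 4))
```

Studies with larger weights pull the pooled estimate towards their own odds ratio.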

State the definition of the weight $W_i$, making sure to explain precisely the meaning of every symbol you use. [2 MARKS]

 

iv. Compute the weight $W_2$, that is, the weight assigned to study 2 in the computation of the Mantel-Haenszel odds ratio. [1 MARK]

v. Use all the information that was given to you and that you have computed in the above parts of this question to calculate the Mantel-Haenszel OR. [2 MARKS]

vi. Define the variance of $\log(OR_{MH})$ in the context of this example, making sure to explain precisely the meaning of every symbol you use. [3 MARKS]

vii. Use all the information that was given to you and that you have computed in the above parts of this question to calculate the 95% CI of the Mantel-Haenszel OR. [4 MARKS]

viii. Interpret the Mantel-Haenszel OR and its 95% CI. [2 MARKS]

ix.  Sketch a forest plot to visually summarise the meta-analysis. Your graph should be clearly labelled and include a vertical line at x = 1.                     [5 MARKS]

 

2.  (a) Data from the Stanford Heart Transplantation Program are available. The dataset consists of 5 observations and 5 variables:

• id - unique patient ID

• time - number of days from heart transplant until death or censoring occurs

• status - censoring indicator: 1 - event time; 0 - censored time

• age - age of patient (in years) at time of heart transplant

• t5 - mismatch score (continuous). This score measures the mismatch between the patient and donor heart. The higher the mismatch, the more likely the new heart is to be rejected by the recipient's body.

id    time   status   age   t5
139   86     1        12    1.26
159   10     1        13    1.49
181   60     0        13    NA
119   1116   0        14    0.54
74    2006   0        15    1.26

i.  Compute the Kaplan-Meier survival estimator for the data shown above, at each uncensored time point.  For both of these time points, clearly state the number of subjects at risk and the number of patients who experience the event of interest.                                                                           [3 MARKS]
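The product-limit calculation can be checked with a short Python sketch. It uses the time and status columns of the five-patient table from this question. The function name is illustrative, and the code is one way to organise the arithmetic rather than a prescribed method.

```python
def kaplan_meier(times, status):
    """Return (event_time, n_at_risk, n_events, survival) at each event time."""
    event_times = sorted({t for t, s in zip(times, status) if s == 1})
    survival = 1.0
    rows = []
    for t in event_times:
        # Subjects still under observation just before time t.
        n_at_risk = sum(1 for u in times if u >= t)
        # Subjects who experience the event exactly at time t.
        n_events = sum(1 for u, s in zip(times, status) if u == t and s == 1)
        survival *= 1 - n_events / n_at_risk
        rows.append((t, n_at_risk, n_events, survival))
    return rows


# time and status columns from the table in this question.
times = [86, 10, 60, 1116, 2006]
status = [1, 1, 0, 0, 0]
km_rows = kaplan_meier(times, status)
for row in km_rows:
    print(row)
```

Censored subjects contribute to the risk set up to their censoring time but never count as events.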

ii.  Sketch the Kaplan-Meier survival estimator you computed above.  Give the axes informative labels.                                                                [2 MARKS]

iii. Use your previous answers to estimate the probability of a patient surviving at least 30 days after having received a heart transplant.             [1 MARK]

iv. Use the relationship between the survival function and the cumulative hazard function to compute the Kaplan-Meier cumulative hazard function at each uncensored time point.                                                                 [2 MARKS]

v.  Compute the Nelson and Aalen cumulative hazard function at each uncensored time point. [2 MARKS]
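The two cumulative hazard estimates can be compared side by side with a small Python sketch. It again uses the time and status columns of the five-patient table. The function name is hypothetical, and the Kaplan-Meier based version relies on the relationship H(t) = -log S(t).

```python
import math


def cumulative_hazards(times, status):
    """Return (event_time, nelson_aalen, minus_log_km) at each event time."""
    event_times = sorted({t for t, s in zip(times, status) if s == 1})
    nelson_aalen = 0.0
    survival = 1.0
    rows = []
    for t in event_times:
        n_at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, s in zip(times, status) if u == t and s == 1)
        nelson_aalen += n_events / n_at_risk      # running sum of d_j / n_j
        survival *= 1 - n_events / n_at_risk      # Kaplan-Meier product
        km_hazard = -math.log(survival) if survival > 0 else math.inf
        rows.append((t, nelson_aalen, km_hazard))
    return rows


times = [86, 10, 60, 1116, 2006]
status = [1, 1, 0, 0, 0]
hazard_rows = cumulative_hazards(times, status)
for row in hazard_rows:
    print(row)
```

The two estimates are close when the number of events at each time is small relative to the risk set.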

vi. Briefly explain what is meant by parametric analysis in the context of survival analysis. Is the Nelson and Aalen cumulative hazard estimator a parametric or non-parametric estimator?                                                       [2 MARKS]

(b) Two datasets from the Stanford Heart Transplantation Program are shown below.

They show the survival times of patients aged 19 and 49 years, respectively.  We will call the 19 year old cohort Group 1 and the 49 year old cohort Group 2. The subparts of this question will guide you through a log-rank test to compare the survival distributions between the two age groups.

id    time   status   age   t5
131   834    0        49    NA
80    1866   0        49    0.51
20    1996   1        49    0.91
25    2878   1        49    0.75

i.  Clearly state the null and alternative hypotheses of the log-rank test in the context of this example. [2 MARKS]

ii.  Compute the test statistic of the log-rank test.                          [5 MARKS]
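A minimal Python sketch of a two-group log-rank statistic follows. It assumes the common form with the hypergeometric variance, which may differ from the simplified version used in some courses. The survival times are hypothetical and are not the Group 1 or Group 2 data.

```python
def logrank_statistic(times1, status1, times2, status2):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    all_times = list(times1) + list(times2)
    all_status = list(status1) + list(status2)
    event_times = sorted({t for t, s in zip(all_times, all_status) if s == 1})

    observed1 = expected1 = variance = 0.0
    for t in event_times:
        n1 = sum(1 for u in times1 if u >= t)   # at risk in group 1
        n2 = sum(1 for u in times2 if u >= t)   # at risk in group 2
        d1 = sum(1 for u, s in zip(times1, status1) if u == t and s == 1)
        d2 = sum(1 for u, s in zip(times2, status2) if u == t and s == 1)
        n, d = n1 + n2, d1 + d2
        observed1 += d1
        expected1 += d * n1 / n                 # expected events under H0
        if n > 1:                               # variance is zero when n == 1
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    return (observed1 - expected1) ** 2 / variance


# Hypothetical survival times (days) and status indicators (1 = event).
stat = logrank_statistic([5, 8, 12], [1, 1, 0], [10, 15, 20], [1, 0, 1])
print(round(stat, 4))
```

The statistic is compared with a chi-square quantile on 1 degree of freedom, since two groups are being compared.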

iii. Interpret your result in detail. You might find the following R output useful:

>  qchisq(0.95,1)

[1]  3.841459

>  qchisq(0.05,1)

[1]  0.00393214

>  qchisq(0.95,2)

[1]  5.991465

>  qchisq(0.05,2)

[1]  0.1025866

[2 MARKS]

(c) This question uses the full dataset from the Stanford Heart Transplantation Program, which consists of 184 observations. The following model was fit in R:

coxph(Surv(time, status==1) ~ age + t5, data=data)

i. The R output contains the following coefficients for the covariates:

• coefficient for age: 0.02961;

• coefficient for t5: 0.17041.

Compute the hazard ratio for the variable age and interpret it. How does the variable age affect survival? [3 MARKS]
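In a Cox model, exponentiating a coefficient gives the multiplicative change in the hazard per one-unit increase in that covariate, holding the other covariates fixed. A short Python sketch using the age coefficient quoted above is given here. The ten-year comparison is an added illustration and not part of the question.

```python
import math

coef_age = 0.02961  # coefficient for age from the R output quoted above

# Multiplicative change in hazard per additional year of age.
hr_one_year = math.exp(coef_age)

# Illustration only: the implied change for a ten-year age difference.
hr_ten_years = math.exp(10 * coef_age)

print(round(hr_one_year, 4), round(hr_ten_years, 4))
```

A value above 1 means the hazard increases with the covariate, so survival worsens as the covariate increases.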

ii. The definition of the proportional hazards model is:

$$h(t, z_i) = h_0(t) \exp(z_i^T \beta), \quad \text{for } i = 1, \ldots, n = 184.$$

Explain what $h_0(t)$, $z_i$ and $\beta$ mean in the context of the heart transplant data and state the values of $\beta$. Use the above definition as a starting point to show that the hazard ratio is time invariant. [3 MARKS]

 

3.  (a) Data on ethnicity and COVID-19 test results are available for 348,598 subjects aged 37-73. Among subjects with a negative COVID-19 test, 331,464 report being white and 16,685 report being of an ethnic minority. Among subjects with a positive COVID-19 test, 385 report being white and 64 report being of an ethnic minority.

i. Produce a 2x2 table summarising the data.  Rows should correspond to the risk factor and columns to the test result.                                   [2 MARKS]

ii. What is the attributable risk of the risk factor ethnic minority on testing positive for COVID-19 per 1000 people? Interpret this risk. [2 MARKS]

iii. What is the relative risk  (RR) of testing positive among ethnic minorities compared to white subjects?  Provide a 95% CI for your RR estimate and interpret your results.                                                                  [5 MARKS]

iv. What is the odds ratio (OR) of testing positive among ethnic minorities compared to white subjects per 1000 people? Interpret that risk measure.

[2 MARKS]
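The three risk measures can be computed directly from the counts stated in part (a). The Python sketch below does so, treating ethnic minority as the exposed group. The 95% CI for the RR uses the standard log-scale normal approximation, which is an assumption about the intended method, and the variable names are illustrative.

```python
import math

# Counts from part (a): positive and negative tests by ethnicity.
a, b = 64, 16685      # ethnic minority: positive, negative
c, d = 385, 331464    # white: positive, negative
assert a + b + c + d == 348598  # matches the stated number of subjects

risk_exposed = a / (a + b)
risk_unexposed = c / (c + d)

# Attributable risk (risk difference), expressed per 1000 people.
ar_per_1000 = 1000 * (risk_exposed - risk_unexposed)

# Relative risk with a 95% CI built on the log scale.
rr = risk_exposed / risk_unexposed
se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
rr_ci = (math.exp(math.log(rr) - 1.96 * se_log_rr),
         math.exp(math.log(rr) + 1.96 * se_log_rr))

# Odds ratio from the cross-product of the table.
odds_ratio = (a * d) / (b * c)

print(round(ar_per_1000, 2), round(rr, 2),
      tuple(round(x, 2) for x in rr_ci), round(odds_ratio, 2))
```

When the outcome is rare in both groups, the odds of the outcome are close to its risk, which is why the RR and OR tend to be close in such data.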

v. Are the RR and the OR similar? Explain why that is, or is not, the case.

[2 MARKS]

vi. Two different studies report a risk of 1.5, but do not clearly specify how this risk is defined. Assume study A fits a Poisson regression model and study B fits a logistic regression model. What risk measure do study A and study B likely refer to? Justify your reasoning. [2 MARKS]

(b)  Generally speaking, three main categories of tests are used for COVID-19: molecular, antigen and antibody tests. PCR tests are molecular tests and are considered the gold standard of testing: they have high sensitivity and high specificity. Lateral flow tests are antigen tests that are quicker than PCR tests; however, they are controversial as they suffer from low sensitivity. The following data are available for a lateral flow test produced by the company Innova.

 

                             COVID-19
                             present   absent   Total
Positive lateral flow test                      235
Negative lateral flow test                      5449
Total                        372       5312     5684

i. Explain why a screening test with low sensitivity is problematic.  [1 MARK]

ii.  Compute the sensitivity, specificity, false negative rate and false positive rate of the Innova lateral flow test. Interpret all those statistics.      [5 MARKS]
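The four rates follow directly from the cells of a 2x2 diagnostic table. The Python sketch below shows the definitions. The counts are hypothetical placeholders rather than the Innova data, and the function name is illustrative.

```python
def diagnostic_rates(tp, fn, fp, tn):
    """Sensitivity, specificity and their complements from 2x2 cell counts.

    tp: disease present, test positive   fn: disease present, test negative
    fp: disease absent,  test positive   tn: disease absent,  test negative
    """
    sensitivity = tp / (tp + fn)   # P(test positive | disease present)
    specificity = tn / (tn + fp)   # P(test negative | disease absent)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_negative_rate": 1 - sensitivity,
        "false_positive_rate": 1 - specificity,
    }


# Hypothetical counts, chosen only to illustrate the definitions.
rates = diagnostic_rates(tp=90, fn=10, fp=5, tn=95)
for name, value in rates.items():
    print(name, round(value, 3))
```

Sensitivity and the false negative rate are both conditional on the disease being present, so they sum to 1. The same holds for specificity and the false positive rate among those without the disease.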

iii.  Compute the sensitivity and specificity for a scenario where two lateral flow tests are carried out and

A. both tests have to be positive for us to believe the disease is present;

B. at least one of the tests has to be positive for us to believe the disease is present.

Assume that the second test will be carried out regardless of the result from the first&nbs