
Regression Modelling for Biostatistics 1 (RM1)

Assignment 2 (30%)

Due:  Monday 8 May 11:59pm (Sydney local time)

Submission Instructions

Before submitting your assignment, please make sure you have read the BCA Assessment Guide which includes links to Academic Honesty and Plagiarism policies at all BCA consortium universities:

• The University of Adelaide

• Monash University

• The University of Queensland

• The University of Sydney

• Macquarie University

• The University of Melbourne

By submitting this work, you acknowledge that you comply with the student plagiarism policy and procedures of the university of your enrolment, and that a breach of that policy may lead the university to commence student misconduct proceedings against you.

Do not leave it to the very last minute to submit your assignment. Any submission after the due date will be counted as late, even if only by a few minutes. If you are having trouble submitting your assignment, email me a copy of your submission before the due date and time as proof of completion. Issues with submitting online will not be grounds to waive late penalties.

Please keep a copy of your own assignment.

Late assignments will receive a penalty of 5% (0.5 marks) per late day or part thereof. Submissions more than 10 days late will not be accepted without prior approval and will receive a mark of 0.

Background: The dataset in the file bmd_drug.csv was collected at a radiology clinic and includes bone mineral density scans (densitometry) of 1077 patients. Densitometry is used as a diagnostic exam for several conditions, including osteoporosis. Bone density is measured in several parts of the body, but we will focus on the total hip bone mineral density measurement (total hip BMD).

These variables are included in the dataset:

id - ID number (study); pat_inde - ID number (clinic); age - age of the patient (years); height - height in cm; sex - Male or Female; weight - weight in kg; bmdtot_hip - bone mineral density at the hip

There are other variables in the dataset, but we will not use them.

Import the data into R or Stata and answer the following questions:

Question 1

a)  Compare the hip BMD between males and females and summarise your findings.   [2 marks]

b)  Repeat the analysis comparing the hip BMD between males and females but now taking into account  (adjusting for) the age of the patients.   Check the assumption(s) of the method used and address any potential violation of the assumption(s). Summarise your findings.  [3 marks]

c) We are interested in the change of hip BMD with age and how this may be different for men and women.  Use appropriate methodology to address this research question and present your conclusions. Make sure you check any assumptions of the method(s) used. [3 marks]

d) Summarise the results of c) in a "publication quality" plot (i.e., make sure the elements of the plot are properly labeled and well formatted, and that the plot includes a title). [2 marks]

Question 2

a) Compute the body mass index (BMI), given by $\text{BMI} = \text{weight (kg)} / \text{height (m)}^2$. Note that the variable height in the dataset is in cm, not in meters. [1 mark]
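The unit conversion is the step most easily missed here. A minimal Python sketch of the calculation follows, assuming weight is recorded in kg and height in cm as described in the variable list. The function name `bmi` and the example patient are hypothetical and are not taken from bmd_drug.csv.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2 from weight in kg and height in cm."""
    if height_cm <= 0:
        raise ValueError("height must be positive")
    height_m = height_cm / 100.0  # convert cm to meters before squaring
    return weight_kg / height_m ** 2


# Hypothetical patient (illustration only, not a row of the dataset)
example = bmi(weight_kg=70.0, height_cm=175.0)
print(round(example, 2))  # 22.86
```

The same conversion applies whether the calculation is done in R, Stata, or elsewhere: divide height by 100 before squaring.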

b) Construct the $X$ matrix (the design matrix) for the model below and compute "manually" the OLS estimates of the $\beta$ coefficients. Compare these estimates with the ones obtained by the linear regression commands of Stata or R. [3 marks]

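$\text{hipBMD} = \beta_0 + \beta_1 \text{Age} + \beta_2 \text{Sex} + \beta_3 \text{Age} \times \text{Sex} + \beta_4 \text{BMI} + \beta_5 \text{BMI}^2 + \varepsilon$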
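For reference, the standard matrix form of this computation can be sketched as follows. Here $\mathbf{x}_i^\top$ is the row of the design matrix for patient $i$, $\mathbf{y}$ is the vector of hip BMD values, and $\boldsymbol\beta$ is the coefficient vector. The coding of sex as a 0/1 indicator is an assumption made for illustration; the reference category is a choice left to the analyst.

```latex
\[
\mathbf{x}_i^\top =
\begin{pmatrix}
1 & \text{Age}_i & \text{Sex}_i & \text{Age}_i \times \text{Sex}_i & \text{BMI}_i & \text{BMI}_i^2
\end{pmatrix},
\qquad
\hat{\boldsymbol\beta} = (X^\top X)^{-1} X^\top \mathbf{y}
\]
```

The leading 1 in each row corresponds to the intercept, so $X$ has one row per patient and six columns.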

c) Using the model in b), what is the predicted hip BMD for a 55-year-old female with a BMI of 36 (add the 95% prediction interval)? [3 marks]
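As a reminder of what a prediction interval involves, the textbook formula for a new observation with covariate vector $\mathbf{x}_0$ is sketched below. The symbols are notation introduced here: $n$ is the number of observations used in the fit, $p$ is the number of estimated coefficients (6 in this model, including the intercept), and $s^2$ is the residual variance estimate.

```latex
\[
\hat{y}_0 = \mathbf{x}_0^\top \hat{\boldsymbol\beta},
\qquad
\hat{y}_0 \pm t_{n-p,\,0.975}\; s \sqrt{1 + \mathbf{x}_0^\top (X^\top X)^{-1} \mathbf{x}_0},
\qquad
s^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - p}
\]
```

The "1 +" under the square root is what distinguishes a prediction interval for an individual from a confidence interval for the mean response.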

d) Interpret the association between age and hip BMD obtained from the model in b). [3 marks]

Question 3

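a) Compute and interpret the (non-adjusted) $R^2$ for the model $\text{hipBMD} = \beta_0 + \beta_1 \text{Age} + \beta_2 \text{Sex} + \beta_3 \text{Age} \times \text{Sex} + \beta_4 \text{BMI} + \beta_5 \text{BMI}^2 + \varepsilon$ and identify one limitation of this $R^2$ as a goodness-of-fit statistic. [2 marks]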

b) Split the dataset into two datasets: fitting and testing. The fitting dataset will include 750 observations and the testing dataset the remaining ones (set the seed of your preference, but provide it in your code so I can replicate your results). [1 mark]
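A reproducible split of this kind can be sketched in Python as follows. The function name `split_indices` and the seed value 2023 are hypothetical choices for illustration, and the total of 1077 rows assumes one row per patient with none dropped for missing values.

```python
import random


def split_indices(n_total: int, n_fit: int, seed: int):
    """Return (fitting, testing) row indices for a seeded random split."""
    if not 0 <= n_fit <= n_total:
        raise ValueError("n_fit must lie between 0 and n_total")
    rng = random.Random(seed)   # private generator, so the split is reproducible
    shuffled = list(range(n_total))
    rng.shuffle(shuffled)
    fit = sorted(shuffled[:n_fit])    # first n_fit shuffled rows go to fitting
    test = sorted(shuffled[n_fit:])   # all remaining rows go to testing
    return fit, test


# Hypothetical seed; 1077 rows assumes no observations were excluded
fit_idx, test_idx = split_indices(n_total=1077, n_fit=750, seed=2023)
print(len(fit_idx), len(test_idx))  # 750 327
```

Reporting the seed alongside the code is what lets someone else regenerate exactly the same two datasets.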

c) Fit the model in a) to your fitting dataset. Apply the fitted model to the testing data and compute the corresponding $R^2$. How does it compare with the one in a)? If you were not able to compute it, explain how you would expect it to differ. Hint: $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$. [4 marks]
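An out-of-sample $R^2$ of the form 1 minus the ratio of the residual sum of squares to the total sum of squares can be sketched in Python as follows. This is a minimal illustration, assuming the total sum of squares is taken around the mean of the observed testing values, which is one common convention. The function name `r_squared` and the numbers are hypothetical, not results from bmd_drug.csv.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the mean of `observed`."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


# Made-up testing outcomes and predictions (illustration only)
y_test = [0.80, 0.95, 1.10, 0.70]
y_pred = [0.85, 0.90, 1.00, 0.75]
print(round(r_squared(y_test, y_pred), 3))
```

Unlike the in-sample $R^2$ of a least-squares fit with an intercept, this quantity is not bounded below by zero: it is negative whenever the predictions do worse than the mean of the testing outcomes.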

d) Fit a model similar to the one in a), but use a spline (with 4 knots at default locations) to model the non-linear effect of BMI instead of the quadratic term. Which model is preferable, the one with the quadratic term or the one with the spline? Justify your answer. [3 marks]