
MATH363: Group Project part 2

The group project contributes 60% of your mark for MATH363.

Part 1 = 15 marks; part 2 = 85 marks

You will work on all parts of this project as a group and submit your answers as a group. In addition to the project, you need to submit minutes from your meetings and provide a peer assessment of everyone’s contribution through Buddycheck. The final mark for each member of the group will be adjusted according to these peer scores.

The project needs to be submitted on Canvas and will be checked for potential plagiarism using Turnitin. A high similarity score might result in the project being investigated for suspected plagiarism; see Code of Practice on Assessment Appendix L.

DATA SETS FOR EACH GROUP ARE DIFFERENT and THEREFORE YOUR RESULTS AND ANSWERS SHOULD BE DIFFERENT

Please only discuss your project with your group members. Any similarity between different project submissions might be investigated for suspected plagiarism.

You must include your minutes [10 marks] and include all relevant code in the appendix [10 marks].

Part II

1. [25 marks] A six-minute walking test (6MWT) is used to assess functional exercise performance. In this test the subject is asked to walk for 6 minutes on a level course and the distance covered is recorded in meters (m). The pace is set by the subject and breaks are allowed if needed. The data were collected on healthy subjects and include information about

sex (Female = 1; Male = 0);

age (in years);

height (cm);

weight (kg);

BMI (= body mass index);

resting heart rate (beats per minute, bpm);

heart rate at the end of the 6 minutes;

current smoking status (1 if smoker);

if the person ever smoked;

usual activity level: 0 = sedentary (less than 30 min physical activity a day), 1 = moderately active (30-60 min physical activity a day), 2 = active (more than 60 min physical activity a day).

You are asked to find a model for the dependence of the distance travelled in the 6MWT on the explanatory variables provided. The model will be used to predict the average distance for a person and should only include variables available before the test is taken (that is, not their heart rate at the end of the test).

(a) Use appropriate plots to help suggest possible models. You should consider here which of the explanatory variables are covariates and which are factors.

(b) Fit different linear models as suggested by the plots in (a) and decide which model is most appropriate. Examples of things that you could consider here are models with quadratic or cubic terms and interactions; other transformations of variables are also possible.

(c) For the model chosen in (b), perform residual analysis and decide if the model fits well. If it does not, suggest changes that can be made to the model to address these issues.
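As an illustration only of the kind of model form mentioned in (b), a linear model with a quadratic term and an interaction between a covariate and a factor could be written as below. The choice of age and sex here is purely hypothetical and is not a recommended model; the symbols \beta_j and \varepsilon_i are notation assumed for this sketch.

```latex
\[
\text{distance}_i
  = \beta_0
  + \beta_1\,\text{age}_i
  + \beta_2\,\text{age}_i^{2}
  + \beta_3\,\text{sex}_i
  + \beta_4\,(\text{age}_i \times \text{sex}_i)
  + \varepsilon_i,
\qquad
\varepsilon_i \sim N(0, \sigma^2) \ \text{independently.}
\]
```

The model remains linear in the parameters \beta_0, ..., \beta_4, so it can still be fitted by ordinary least squares.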

2. [10 marks] Based on the analysis in Q1, recommend a model which should be used to predict distance travelled in 6MWT and interpret the parameters of your model. Explain any limitations of your model. Discuss what information about the patient is needed to use your model and provide examples of predicted values including confidence and prediction intervals. You can use some of the subjects included in your data set for these examples. Indicative word count: 500
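For reference, the standard forms of the two intervals for a normal linear model are sketched below. The notation is assumed for this sketch and does not come from the project brief: X is the n x p design matrix, x_0 the vector of explanatory values for the new subject, \hat\beta the least squares estimate, and s^2 the residual mean square.

```latex
\[
\hat{y}_0 = x_0^{\top}\hat{\beta},
\qquad
s^2 = \frac{\mathrm{RSS}}{n - p}
\]
% 100(1 - \alpha)% confidence interval for the mean response at x_0
\[
\hat{y}_0 \;\pm\; t_{n-p,\,1-\alpha/2}\; s\,
\sqrt{x_0^{\top}(X^{\top}X)^{-1}x_0}
\]
% 100(1 - \alpha)% prediction interval for a single new observation at x_0
\[
\hat{y}_0 \;\pm\; t_{n-p,\,1-\alpha/2}\; s\,
\sqrt{1 + x_0^{\top}(X^{\top}X)^{-1}x_0}
\]
```

The extra 1 under the square root accounts for the variability of an individual observation around its mean, so the prediction interval is always wider than the confidence interval at the same x_0.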

3. [15 marks] Choosing a model for large data sets can be a very complex task but there are a few different methods which help researchers to decide on the best set of variables to be included in a model in a systematic way. Three commonly used methods are forward selection, backward elimination and stepwise regression. Research one of these methods and write a short explanation in your own words of how this method works. Apply this method to your data on 6MWT. Discuss similarities and differences between the model obtained here and the model you chose in 1(b). You should use published resources in this part and cite any books or articles used. A good source here is Applied Regression Analysis by H. Smith and N.R. Draper (on reading lists). Indicative word count: 750
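To make the idea of a systematic search concrete, here is a minimal sketch of forward selection in Python. It makes several assumptions: AIC is used as the entry criterion (textbook treatments often use partial F-tests instead), the fit is plain least squares via the normal equations, and the variable names and toy data are hypothetical, not taken from any project data set.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def rss(columns, y):
    """Residual sum of squares for OLS of y on an intercept plus columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    beta = solve(XtX, Xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def aic(columns, y):
    """Gaussian AIC up to an additive constant: n*log(RSS/n) + 2k."""
    n = len(y)
    k = len(columns) + 1  # slopes plus intercept
    return n * math.log(rss(columns, y) / n) + 2 * k

def forward_selection(data, y):
    """Start from the intercept-only model; add the variable that lowers
    AIC the most, and stop when no addition lowers it."""
    selected, remaining = [], list(data)
    best = aic([], y)
    while remaining:
        scores = {name: aic([data[v] for v in selected + [name]], y)
                  for name in remaining}
        candidate = min(scores, key=scores.get)
        if scores[candidate] >= best:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = scores[candidate]
    return selected

# Hypothetical toy data: distance is roughly 3 * height plus small noise.
data = {
    "height": [150, 155, 160, 165, 170, 175, 180, 185],
    "age": [60, 25, 40, 55, 30, 45, 20, 50],
}
distance = [452, 464, 481, 493, 511, 524, 542, 553]

chosen = forward_selection(data, distance)
print(chosen)  # "height" enters first because it reduces AIC the most
```

Backward elimination runs the same loop in reverse, starting from the full model and dropping one variable at a time, while stepwise regression allows both additions and removals at each step.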

4. [15 marks] It is also of interest to model the probability that a subject reached the threshold heart rate (HR) for vigorous-intensity physical activity (VIPA), that is, a heart rate of at least 77% of maximum HR. Maximum HR is calculated as 220 minus age. For example, for a person who is 35, maximum HR is 185 bpm and VIPA is when their HR is at least 0.77 × (220 − 35) ≈ 143 bpm. The researchers want to use the data set used in Q1 to estimate the probability of VIPA during 6MWT. The distance travelled should be one of the explanatory variables in this case.
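The threshold arithmetic above can be sketched in a few lines of Python. The function names and the three example subjects are hypothetical. Note that 0.77 × 185 is exactly 142.45, which corresponds to 143 bpm once rounded up to a whole number of beats.

```python
def max_heart_rate(age_years):
    """Maximum HR in bpm, using the 220-minus-age rule from the text."""
    return 220 - age_years

def vipa_threshold(age_years, fraction=0.77):
    """HR in bpm that must be reached for VIPA (77% of maximum HR)."""
    return fraction * max_heart_rate(age_years)

def reached_vipa(age_years, end_hr_bpm):
    """Return 1 if the end-of-test HR is at least the threshold, else 0."""
    return 1 if end_hr_bpm >= vipa_threshold(age_years) else 0

# Hypothetical subjects as (age in years, HR at the end of the test in bpm).
subjects = [(35, 150), (35, 142), (60, 125)]
flags = [reached_vipa(age, hr) for age, hr in subjects]
print(flags)  # [1, 0, 1]
```

Comparing against the unrounded threshold, as here, gives the same classification as comparing against 143 bpm whenever heart rates are recorded as whole numbers.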

(a) Create a new variable which is 1 if a person’s HR reached VIPA, and 0 otherwise (you might want to first create a variable with maximum HR for each person).

(b) Propose a model for the probability of VIPA. You do not need to consider interactions or higher-order terms in this case, but you need to consider which link function is most appropriate.

(c) Interpret your model parameters, discuss its fit and any limitations.
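When comparing link functions for a binary response, it can help to see how each one maps a linear predictor to a probability. The sketch below evaluates the inverse logit, probit and complementary log-log links at a few values. The function names are hypothetical and the choice of these three links is an assumption about which candidates might be considered.

```python
import math

def inv_logit(eta):
    """Inverse logit link: p = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Inverse probit link: standard normal CDF evaluated at eta."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_cloglog(eta):
    """Inverse complementary log-log link: p = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))

# Tabulate the three probabilities over a small grid of linear predictors.
for eta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(eta,
          round(inv_logit(eta), 3),
          round(inv_probit(eta), 3),
          round(inv_cloglog(eta), 3))
```

The logit and probit links are symmetric about p = 0.5 at a linear predictor of zero, whereas the complementary log-log link is asymmetric and gives about 0.632 there.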