
Flexible Regression - Level M Assignment 2022

Introduction

This assessment is worth 15% of your total course mark. It should be submitted electronically (as a PDF document) using the file upload system on the Flexible Regression Moodle page by 12 noon on Monday 12th December 2022. The final mark for the assessment will be converted to an alpha-numeric grade and combined with the alpha-numeric grade from the degree exam to form your overall mark for Flexible Regression. Each section relies on the stated reference material and datasets, which are freely available in the ‘Masters Assessment’ section on the course Moodle page. Please contact Ruth O’Donnell immediately if you cannot access this material. You should include the R code you use for fitting the models in Questions 2 and 3 alongside your findings.

Question 1: Interactions

Reference: Semiparametric Regression, Ruppert, Wand and Carroll, Chapter 12 and Chapter 13.1-13.2

Read the above reference material and write short summary notes (~ 1/3 of a page each) in your own words on each of the following:

Binary-continuous interactions

Varying-coefficient models

Bivariate smoothing.

In your summary notes: describe each type of interaction, state an example model formula in each case (e.g. $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$) and give a real-life example of a situation where each type of interaction may plausibly arise. In addition, comment on the differences between the three types of interaction.

[8 Marks]

Question 2: Scottish Health Data

Reference: Extending the Linear Model with R, Faraway, Chapter 12.2

Dataset: health.dat

The dataset contains information on Body Mass Index (bmi), age, sex (coded as ‘Female’/‘Male’) and type of work (coded as ‘manual’/‘non-manual’) for a sample of the Scottish population. The data can be read into R through:

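health <- read.table("health.dat", header = TRUE, stringsAsFactors = TRUE)  # read sex and work as factors (R >= 4.0 reads strings as character by default)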

Read the reference material (Ch 12.2) above and use it (along with your lecture notes) to fit an appropriate (generalised) additive model using library(mgcv) for a response of bmi, with a linear term for sex and an interaction between a smooth function of age and the factor work. (Hint: terms such as s(x, by = z) can be used for nonparametric interactions between a continuous variable x and a factor z in such models.) Use the automatic method of Maximum Likelihood (ML) to select the smoothing parameters. Check the assumptions for your model.

Write a short summary to describe the model that has been fitted and how well it fits the data, as well as the results from the fitted model. Include appropriate output and plots in your summary.

If an alternative additive model had been proposed (e.g. with a different set of covariates), briefly discuss how you might compare the alternative model to the one you have just fitted.

[10 Marks]

Question 3: Lake Water Quality

One feature of interest for lakes is water quality, and hence the relationship between chlorophyll a (an indicator of water quality) and Soluble Reactive Phosphorus (SRP, a nutrient) is very important.

Reference: Extending the Linear Model with R, Faraway, Chapter 11.7

Dataset: lldata.dat

The data are the natural logarithms of the monthly means for chlorophyll a (lchla) and SRP (lsrp) from January 1990 to December 2007. Natural log transforms of the data are used to stabilize the variance, and there are some missing values. Columns of data for year and month are also provided. Data for each variable are recorded at the same time points, and hence either could act as the response here. The data can be read into R through:

lake <- read.table("lldata.dat", header = TRUE)

Read the above reference material (Ch 11.7) and use it to fit a bivariate smooth term, using kernel smoothing in two dimensions, for a response of SRP with a bivariate term in chlorophyll a and month. This will provide a model that allows the relationship between the response (SRP) and the explanatory variable (chlorophyll a) to change smoothly by month. Use a very high smoothing parameter for chlorophyll a to force the relationship between SRP and chlorophyll a to be linear. Select the smoothing parameter for month subjectively by looking at plots. This type of model is referred to as a varying-coefficient model.

Write a short summary to describe your initial impressions of the data, the model that has been fitted and the results. Include appropriate output and plots in your summary.

Although your smoothing parameter has been chosen subjectively, you should briefly discuss alternative approaches to selecting it, and their appropriateness, within your summary.

[11 Marks]