Flexible Regression - Level M Assignment 2022
Introduction
This assessment is worth 15% of your total course mark. It should be submitted electronically (as a PDF document) using the file upload system on the Flexible Regression Moodle page by 12 noon on Monday 12th December 2022. The final mark for the assessment will be converted to an alpha-numeric grade and combined with the alpha-numeric grade from the degree exam to form your overall mark for Flexible Regression. Each section relies on the stated reference material and datasets, which are freely available in the ‘Masters Assessment’ section on the course Moodle page. Please contact Ruth O’Donnell immediately if you cannot access this material. You should include the R code you use for fitting the models in Questions 2 and 3 inline with your findings.
Question 1: Interactions
Reference: Semiparametric Regression, Ruppert, Wand and Carroll, Chapter 12 and Chapter 13.1-13.2
Read the above reference material and write short summary notes (~ 1/3 of a page each) in your own words on each of the following:
■ Binary-continuous interactions
■ Varying-coefficient models
■ Bivariate smoothing.
In your summary notes: describe each type of interaction, state an example model formula in each case (e.g. $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$) and give a real-life example of a situation where each type of interaction may plausibly arise. In addition, comment on the differences between the types of interaction.
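As a guide to the level of notation expected, generic forms of the three model types could be written as below. This is only a sketch: the symbols $f$, $g$, $\alpha$, $\beta$, $z$ and $s$ are assumptions chosen here for illustration and need not match the notation of the reference, and your own notes should explain each formula in your own words.

```latex
\begin{align*}
\text{Binary-continuous:}   \quad & y_i = f(x_i) + z_i\, g(x_i) + \varepsilon_i, \qquad z_i \in \{0, 1\} \\
\text{Varying-coefficient:} \quad & y_i = \alpha(s_i) + \beta(s_i)\, x_i + \varepsilon_i \\
\text{Bivariate smoothing:} \quad & y_i = f(x_{1i}, x_{2i}) + \varepsilon_i
\end{align*}
```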
[8 Marks]
Question 2: Scottish Health Data
Reference: Extending the Linear Model with R, Faraway, Chapter 12.2
Dataset: health.dat
The dataset contains information on Body Mass Index (bmi), age, sex (coded as ‘Female’/‘Male’) and type of work (coded as ‘manual’/‘non-manual’) for a sample of the Scottish population. The data can be read into R through:
health <- read.table("health.dat", header = TRUE)
Read the reference material (Ch 12.2) above and use it (along with your lecture notes) to fit an appropriate (generalised) additive model using library(mgcv) for a response of bmi that has a linear term for sex and an interaction between age, as a smooth function, and the factor work. (Hint: terms such as s(x, by = z) can be used for nonparametric interactions between a continuous variable x and a factor z in such models.) Use the automatic method of Maximum Likelihood (ML) to select the smoothing parameters. Check the assumptions for your model.
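The hint about s(x, by = z) can be illustrated with a minimal sketch. It runs on simulated data that only mimics the column names of health.dat (the values are hypothetical, not the real dataset), and the decision to include a main effect for work alongside the factor-by smooth is an assumption rather than a required model specification.

```r
library(mgcv)

# Simulated stand-in for health.dat (hypothetical values, NOT the real data)
set.seed(2022)
n <- 400
health <- data.frame(
  age  = runif(n, 18, 80),
  sex  = sample(c("Female", "Male"), n, replace = TRUE),
  work = sample(c("manual", "non-manual"), n, replace = TRUE)
)
health$bmi <- 24 + 3 * sin(health$age / 20) +
  ifelse(health$work == "manual", 0.03 * health$age, 0) +
  ifelse(health$sex == "Male", 0.5, 0) + rnorm(n, sd = 2)

# read.table() returns character columns in R >= 4.0; `by =` needs a factor
health$sex  <- factor(health$sex)
health$work <- factor(health$work)

# Linear term for sex and a separate smooth of age for each level of work.
# The main effect of work is included because factor-by smooths are centred.
fit <- gam(bmi ~ sex + work + s(age, by = work),
           data = health, method = "ML")

summary(fit)          # parametric terms and effective degrees of freedom
par(mfrow = c(2, 2))
gam.check(fit)        # residual diagnostics and basis-dimension check
plot(fit, pages = 1)  # the two estimated smooths with confidence bands
```

With the real data, only the read.table() call and the factor conversion would replace the simulation step.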
Write a short summary to describe the model that has been fitted and how well it fits the data, as well as the results from the fitted model. Include appropriate output and plots in your summary.
If an alternative additive model had been proposed (e.g. with a different set of covariates), briefly discuss how you might compare the alternative model to the one you have just fitted.
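One possible route to such a comparison is an information criterion. The sketch below, again on simulated data with hypothetical values, fits two candidate models by ML and compares them with AIC(); the model names m_by and m_common are introduced here purely for illustration, and other approaches may be equally defensible.

```r
library(mgcv)

# Simulated data (hypothetical values, NOT the real health.dat)
set.seed(1)
n <- 300
d <- data.frame(
  age  = runif(n, 18, 80),
  sex  = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  work = factor(sample(c("manual", "non-manual"), n, replace = TRUE))
)
d$bmi <- 25 + 2 * sin(d$age / 15) + rnorm(n, sd = 2)

# Candidate 1: separate smooth of age for each type of work
m_by <- gam(bmi ~ sex + work + s(age, by = work), data = d, method = "ML")
# Candidate 2: one common smooth of age
m_common <- gam(bmi ~ sex + work + s(age), data = d, method = "ML")

# Both models use ML, so their likelihoods are comparable even though
# the model terms differ; the lower AIC is preferred.
cmp <- AIC(m_by, m_common)
print(cmp)
```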
[10 Marks]
Question 3: Lake Water Quality
One of the features of interest for lakes is the water quality and hence the relationship between chlorophyll a (as an indicator of water quality) and Soluble Reactive Phosphorus (SRP, a nutrient) is very important.
Reference: Extending the Linear Model with R, Faraway, Chapter 11.7
Dataset: lldata.dat
The data are the natural logarithms of the monthly means for chlorophyll a (lchla) and SRP (lsrp) from January 1990 to December 2007. Natural log transforms of the data are used to stabilize the variance, and there are some missing values. Columns of data for year and month are also provided. Data for each variable are recorded at the same time point and hence either could act as a response here. The data can be read into R through:
lake <- read.table("lldata.dat", header = TRUE)
Read the above reference material (Ch 11.7) and use it to fit a bivariate smooth term using kernel smoothing in two dimensions, with SRP as the response and a bivariate term in chlorophyll a and month. This will provide a model that allows the relationship between the response, SRP, and the explanatory variable, chlorophyll a, to change smoothly by month. Use a very high smoothing parameter for chlorophyll a to force the relationship between SRP and chlorophyll a to be linear. Select the smoothing parameter for month subjectively by looking at plots. This type of model is referred to as a varying-coefficient model.
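To see why a very large smoothing parameter in one direction gives a linear relationship, the following hand-rolled sketch may help. It is not the software used in the reference: it assumes a local-linear smoother with a product Gaussian kernel, written in base R, and the function loclin, the bandwidth vector h and the simulated data are all hypothetical. With a huge bandwidth for lchla the kernel weights are effectively constant in that direction, so the local fit is a straight line in lchla whose slope changes smoothly with month.

```r
# Simulated stand-in for lldata.dat (hypothetical values, NOT the real data)
set.seed(1)
lake <- data.frame(year  = rep(1990:2007, each = 12),
                   month = rep(1:12, times = 18))
n <- nrow(lake)
lake$lchla <- rnorm(n)
true_slope <- cos(2 * pi * lake$month / 12)   # slope varies with month
lake$lsrp  <- 1 + true_slope * lake$lchla + rnorm(n, sd = 0.3)
lake$lsrp[sample(n, 10)] <- NA                # mimic missing values
lake <- lake[complete.cases(lake), ]

# Local-linear fit at (x0, m0): weighted least squares with a product
# Gaussian kernel; h[1] is the bandwidth for x, h[2] for month.
loclin <- function(x0, m0, x, m, y, h) {
  w <- dnorm((x - x0) / h[1]) * dnorm((m - m0) / h[2])
  b <- unname(lm.wfit(cbind(1, x - x0, m - m0), y, w)$coefficients)
  c(fitted = b[1], slope = b[2])
}

h <- c(1e4, 1.5)  # huge bandwidth for lchla, subjective choice for month
slopes <- sapply(1:12, function(m0)
  loclin(mean(lake$lchla), m0, lake$lchla, lake$month, lake$lsrp, h)[["slope"]])

plot(1:12, slopes, type = "b", xlab = "month",
     ylab = "estimated slope of lsrp on lchla")
```

Re-running the last three statements with different values of h[2] shows how the choice of bandwidth for month changes the smoothness of the estimated slope curve. Note that this sketch treats month as an ordinary numeric variable and ignores its cyclic nature.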
Write a short summary to describe initial impressions of the data, the model that has been fitted and the results. Include appropriate output and plots in your summary.
Although your smoothing parameter has been chosen subjectively, you should briefly discuss alternative approaches and their appropriateness within your summary.
[11 Marks]
2023-01-02