
MATH4/68091 Statistical Computing

Published: 2021-01-15


Coursework 3: Bootstrap Extensions


      Deadline: 14:00 Thursday 21 January 2021

      Please submit via Blackboard. Late submissions may be penalised according to the Department’s procedures. Note also that I do not have the authority to grant extensions to the deadline.

      By submitting the coursework you declare that you are its sole author. In particular, you should not collaborate with your peers.


Your submitted solutions should all be in one PDF document. You are strongly advised to produce this document using LaTeX, but you will not be penalised if you use other software. Do not include screenshots of the R console or of graphics. R commands should be shown in a typewriter font.

In LaTeX this can be done with:

\begin{verbatim}

R code

\end{verbatim}

Plots should be saved to files (one file per plot) and included in the document with suitable LaTeX commands. Examples of how to save a plot to a file are given in my guide “Rhints.pdf”, available on Blackboard and on my web page. Users of other document preparation software should use the equivalent facilities that it provides.
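A minimal sketch of saving a single plot to its own file in R follows. The helper name save_hist_pdf, the file name and the plot dimensions are illustrative choices, not taken from “Rhints.pdf”, which remains the reference for this course.

```r
# Hypothetical helper: save one histogram to its own PDF file so that it
# can be included in the LaTeX report. File name and labels are placeholders.
save_hist_pdf <- function(x, file, main = "", xlab = "") {
  pdf(file, width = 6, height = 4.5)  # open a PDF device writing to `file`
  on.exit(dev.off())                  # close the device (writing the file) on exit
  hist(x, main = main, xlab = xlab)   # draw the histogram on that device
  invisible(file)
}
```

The resulting file could then be included in the report with \includegraphics{resid_hist.pdf} (graphicx package), where resid_hist.pdf stands for whatever file name was passed to the helper.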

For each part of each question, explain how you completed what is required, show your working, and comment on the computational and/or graphical results where applicable. Aim to be concise. For computational questions, show your code.

      The total marks for this paper are 40.


1. A clinician would like to study whether the length of a new-born baby can be predicted from the heights of both the mother and the father. To this end, data are available in the file birth_length.txt. The file contains n = 42 records in four columns, which correspond to:

length (to be denoted by y): the recorded length of the baby at birth (inches).

mheight (m): the height of the mother (inches).

fheight (f): the height of the father (inches).

smoker (s): a binary variable indicating whether the mother was a smoker (0 = non-smoker, 1 = smoker).

   The clinician proposes the following linear model for a new-born baby’s length:

$$y_i = \alpha + \beta m_i + \gamma f_i + \varepsilon_i, \qquad (i = 1, \ldots, 42),$$

where the errors $\varepsilon_i$ are assumed to be observations of independent and identically distributed zero-mean random variables.

   (a) Find the least squares estimates of (α, β, γ) for the birth length data.

[2 marks]
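For part (a), the least squares estimates can be obtained with lm. The sketch below is one possible approach, assuming y, m and f are numeric vectors taken from the first three columns of birth_length.txt; the function names are hypothetical. The second function computes the same estimates from the normal equations, $(X^\top X)^{-1} X^\top y$, which can serve as a cross-check.

```r
# Least squares fit of y_i = alpha + beta * m_i + gamma * f_i + e_i.
# y, m, f: numeric vectors of equal length. Hypothetical function name.
ls_estimates <- function(y, m, f) {
  fit <- lm(y ~ m + f)
  est <- coef(fit)                          # order: (Intercept), m, f
  names(est) <- c("alpha", "beta", "gamma")
  est
}

# Same estimates from the normal equations, as a check on the lm output.
ls_estimates_matrix <- function(y, m, f) {
  X <- cbind(1, m, f)                       # design matrix with intercept column
  est <- drop(solve(crossprod(X), crossprod(X, y)))
  names(est) <- c("alpha", "beta", "gamma")
  est
}
```

For example, assuming the file has a header row and the column order listed above: dat <- read.table("birth_length.txt", header = TRUE) followed by ls_estimates(dat[[1]], dat[[2]], dat[[3]]).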

   (b) Produce a histogram of the residuals from the least squares fit for the fitted regression model of part (a) and comment on its form.

[2 marks]
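A possible way to draw the residual histogram for part (b) is sketched below. Overlaying a kernel density estimate with density is an optional extra, not a requirement of the question, and the function name is hypothetical. To save the plot for the report, the call can be wrapped between pdf(...) and dev.off().

```r
# Histogram of the least squares residuals, on the density scale, with a
# kernel density estimate overlaid to help judge symmetry, skewness and outliers.
resid_histogram <- function(y, m, f) {
  r <- residuals(lm(y ~ m + f))
  hist(r, freq = FALSE, main = "Residuals from the least squares fit",
       xlab = "Residual (inches)")
  lines(density(r))
  invisible(r)                              # return residuals for further checks
}
```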

(c) Write a single function that uses the bootstrap residuals method to estimate the sampling distribution of $\hat{\gamma}$, where $\hat{\gamma}$ denotes the estimated coefficient of the father’s height in the above regression model. Run your function and use the results to produce: a histogram estimate of the sampling distribution of $\hat{\gamma}$; estimates of the bias and standard error of $\hat{\gamma}$; and the estimated $\Pr(\hat{\gamma} > 0)$.

[10 marks]
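One way to structure the single function asked for in part (c) is sketched below, assuming the plain (unscaled) residual bootstrap: residuals from the original fit are resampled with replacement and added to the fitted values, the model is refitted to each new response vector, and the coefficient of f is stored. The function name boot_resid_gamma and the default B = 2000 are illustrative assumptions. A common variant rescales the residuals by $\sqrt{n/(n-p)}$ before resampling; that is not done here.

```r
# Residual (model-based) bootstrap for gamma, the coefficient of f in
# y = alpha + beta * m + gamma * f + error. Hypothetical function name.
boot_resid_gamma <- function(y, m, f, B = 2000) {
  fit0 <- lm(y ~ m + f)
  y_fit <- fitted(fit0)                 # fitted values from the original fit
  e <- residuals(fit0)                  # raw residuals (not rescaled)
  n <- length(y)
  gamma_hat <- coef(fit0)[["f"]]
  gamma_star <- numeric(B)
  for (b in seq_len(B)) {
    y_star <- y_fit + sample(e, n, replace = TRUE)  # new responses, same m and f
    gamma_star[b] <- coef(lm(y_star ~ m + f))[["f"]]
  }
  list(gamma_hat  = gamma_hat,
       gamma_star = gamma_star,
       bias       = mean(gamma_star) - gamma_hat,   # bootstrap bias estimate
       se         = sd(gamma_star),                  # bootstrap standard error
       prob_pos   = mean(gamma_star > 0))            # estimated Pr(gamma_hat > 0)
}
```

The bias estimate is the mean of the $\hat{\gamma}^*$ values minus $\hat{\gamma}$, the standard error is their standard deviation, and $\Pr(\hat{\gamma} > 0)$ is estimated by the proportion of $\hat{\gamma}^*$ values above zero; hist(res$gamma_star) then gives the histogram.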

(d) Produce a 95% confidence interval for the parameter β by using the bootstrap-t confidence interval methodology.

(Hint: This will involve finding, for each bootstrap sample, the estimated standard error $\hat{\sigma}_i^*$ of $\hat{\beta}_i^*$ from the $i$th bootstrap sample ($i = 1, \ldots, B$). This value can be obtained from the output objects created by both the lsfit and lm functions.)

Based on your resulting confidence interval, do you think that it is plausible that β = 0?

[10 marks]
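A sketch of the bootstrap-t interval for β follows. It assumes, as one reasonable choice, that resamples are generated by the residual bootstrap, and it takes each standard error from coef(summary(lm(...))). For each resample it forms $t_i^* = (\hat{\beta}_i^* - \hat{\beta}) / \hat{\sigma}_i^*$ and returns $[\hat{\beta} - t^*_{(0.975)}\hat{\sigma},\ \hat{\beta} - t^*_{(0.025)}\hat{\sigma}]$, where $t^*_{(q)}$ is the empirical $q$ quantile of the $t_i^*$ and $\hat{\sigma}$ is the standard error of $\hat{\beta}$ from the original fit. The function name is hypothetical.

```r
# Bootstrap-t interval for beta (coefficient of m), with resamples drawn by
# the residual bootstrap. Hypothetical function name.
boot_t_beta <- function(y, m, f, B = 2000, level = 0.95) {
  fit0 <- lm(y ~ m + f)
  beta_hat <- coef(fit0)[["m"]]
  se_hat <- coef(summary(fit0))["m", "Std. Error"]
  y_fit <- fitted(fit0)
  e <- residuals(fit0)
  n <- length(y)
  t_star <- numeric(B)
  for (b in seq_len(B)) {
    y_star <- y_fit + sample(e, n, replace = TRUE)
    cf <- coef(summary(lm(y_star ~ m + f)))   # estimates and standard errors
    t_star[b] <- (cf["m", "Estimate"] - beta_hat) / cf["m", "Std. Error"]
  }
  a <- 1 - level
  q <- quantile(t_star, c(1 - a / 2, a / 2), names = FALSE)
  # note the reversal: the upper t* quantile gives the lower limit
  c(lower = beta_hat - q[1] * se_hat, upper = beta_hat - q[2] * se_hat)
}
```

If the returned interval excludes 0, the value β = 0 is not well supported by the data.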

(e) For this part, consider just the mother’s height, M, and the father’s height, F, as two random variables, so that the second and third columns of the birth length dataset can be regarded as a single bivariate set of paired data recorded on M and F. By using the bootstrap-t confidence interval methodology, estimate a 95% confidence interval for $E(M) - E(F)$. Is it plausible that $E(M) = E(F)$?

[8 marks]
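For part (e), $E(M) - E(F)$ can be estimated by the mean $\bar{D}$ of the paired differences $D_j = M_j - F_j$, with standard error $s_D/\sqrt{n}$. Resampling (M, F) pairs is equivalent to resampling their differences, so the sketch below resamples D directly and builds the bootstrap-t interval from the quantiles of $t^* = (\bar{D}^* - \bar{D}) / (s_{D^*}/\sqrt{n})$. The function name and the default B are illustrative assumptions.

```r
# Bootstrap-t interval for E(M) - E(F) from paired data (m[j], f[j]).
# Hypothetical function name.
boot_t_mean_diff <- function(m, f, B = 2000, level = 0.95) {
  d <- m - f                                  # paired differences M - F
  n <- length(d)
  se_mean <- function(x) sd(x) / sqrt(length(x))
  theta_hat <- mean(d)
  se_hat <- se_mean(d)
  t_star <- numeric(B)
  for (b in seq_len(B)) {
    d_star <- d[sample.int(n, n, replace = TRUE)]  # resample (M, F) pairs
    t_star[b] <- (mean(d_star) - theta_hat) / se_mean(d_star)
  }
  a <- 1 - level
  q <- quantile(t_star, c(1 - a / 2, a / 2), names = FALSE)
  c(lower = theta_hat - q[1] * se_hat, upper = theta_hat - q[2] * se_hat)
}
```

Whether 0 lies inside the returned interval indicates whether $E(M) = E(F)$ is plausible.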

(f) Now consider the baby’s length, Y, and the smoking status of the mother, S, to be two random variables, so that the first and fourth columns of the birth length data matrix can be regarded as a single bivariate random sample of paired data on these variables. Define $\theta = E(Y \mid S = 0) - E(Y \mid S = 1)$. Use bootstrap methodology to estimate $\Pr(\hat{\theta} > 0)$, where $\hat{\theta}$ is an appropriately chosen estimate of $\theta$ based on the data. (You should include a histogram of your bootstrap values of $\hat{\theta}$ in your answer.) What are your conclusions?

(Note that this part involves the original response variable of the regression model that you fitted in part (a). However, the solution here does not require you to consider or fit a regression model.)

[8 marks]
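For part (f), one natural estimate is $\hat{\theta} = \bar{y}_{S=0} - \bar{y}_{S=1}$, the difference between the sample mean lengths of babies of non-smoking and smoking mothers. The sketch below resamples (Y, S) pairs jointly, recomputes $\hat{\theta}^*$ on each resample, and estimates $\Pr(\hat{\theta} > 0)$ by the proportion of positive $\hat{\theta}^*$ values. Resamples that happen to contain only one smoking group give NaN and are dropped; that handling, the function name and the default B are assumptions of the sketch.

```r
# Nonparametric (pairs) bootstrap for theta = E(Y | S = 0) - E(Y | S = 1).
# y: baby lengths; s: smoking indicator coded 0/1. Hypothetical function name.
boot_theta <- function(y, s, B = 2000) {
  theta_fun <- function(yy, ss) mean(yy[ss == 0]) - mean(yy[ss == 1])
  n <- length(y)
  theta_hat <- theta_fun(y, s)
  theta_star <- numeric(B)
  for (b in seq_len(B)) {
    idx <- sample.int(n, n, replace = TRUE)   # resample (Y, S) pairs jointly
    theta_star[b] <- theta_fun(y[idx], s[idx])
  }
  # a resample containing only one group yields NaN; drop such draws
  theta_star <- theta_star[is.finite(theta_star)]
  list(theta_hat  = theta_hat,
       theta_star = theta_star,
       prob_pos   = mean(theta_star > 0))     # estimated Pr(theta_hat > 0)
}
```

hist(res$theta_star) then gives the requested histogram of the bootstrap values.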