Statistics – Functional Data Analysis 2021
“An electronic calculator may be used provided that it is allowed under the School of Mathematics and Statistics Calculator Policy. A copy of this policy has been distributed to the class prior to the exam and is also available via the invigilator.”
NOTE: Candidates should attempt all questions.
1. Answer the following questions:
(a) State whether the following four statements are TRUE or FALSE. You do not need to provide any explanation for your answer.
i. A cubic spline basis with no internal knots is the same as an order three polynomial basis fit. [2 MARKS]
Answer
TRUE;
with no internal knots the cubic spline is a single cubic polynomial on the whole interval, so the basis spans the same functions as a third-degree polynomial fit
ii. The degrees of freedom of a penalised smoother increase as the magnitude of the penalty parameter λ increases. [2 MARKS]
Answer
FALSE;
the degrees of freedom of a penalised smoother decrease as the penalty parameter increases
iii. When performing function-on-function regression with a historical model, the slope parameter is a surface. [2 MARKS]
iv. The elements of a Fourier basis of any order are orthogonal to each other. [2 MARKS]
For parts (b)-(c), choose the correct answer(s) from the options listed. Note that in some cases more than one answer may be correct. You do not need to provide any explanation for the answer(s) that you choose.
(b) Suppose we are temporally smoothing the temperature data observed every 5 days in 2010 for 50 cities in the UK, using a B-spline basis of order 3 with a knot placed at every time point. Let Φ be the matrix of the basis functions Φ(t) evaluated at the observed time points. The dimension of Φ is: [2 MARKS]
i. 75 × 3
ii. 72 × 365
iii. 365 × 75
iv. 73 × 75
v. 365 × 367
(c) Let waterfd be a functional data object obtained by using the smooth.basis function in the fda package. The code
plot(deriv.fd(waterfd$fd,2))
will plot the
i. smooth curve
ii. smooth curve with linetype 2
iii. first derivative of the curve
iv. second derivative of the curve
Answer
(iv)
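As a small illustration of what deriv.fd returns, the sketch below smooths a noiseless sine curve and compares the second derivative of the fitted curve with the exact value -sin(t). The object name sinefd, the evaluation grid and the order-6 basis are assumptions made for this example; the waterfd data are not available here.

```r
library(fda)

# Assumed toy data: a noiseless sine curve on [0, 2*pi]
tt <- seq(0, 2 * pi, length.out = 101)
y  <- sin(tt)

# Order-6 B-splines, so that the second derivative is still smooth
basis  <- create.bspline.basis(rangeval = c(0, 2 * pi), nbasis = 15, norder = 6)
sinefd <- smooth.basis(argvals = tt, y = y, fdParobj = basis)

# Second derivative of the fitted curve, same call pattern as deriv.fd(waterfd$fd, 2)
d2fd <- deriv.fd(sinefd$fd, 2)

# Largest deviation from the exact second derivative, away from the boundaries
inner   <- tt[tt > 1 & tt < 5]
max_err <- max(abs(eval.fd(inner, d2fd) - (-sin(inner))))
```

Calling plot(d2fd) would then draw the second-derivative curve rather than the smooth itself.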
2. Suppose you have observed data at 11 equally spaced time points on a single curve. The dataset is given by $y_0, y_1, y_2, \ldots, y_{10}$, corresponding to time points $t = 0, \ldots, 10$.
(a) If you are using a cubic B-spline basis with internal knots at t = 3, 6, 8, how many basis functions do you have? [2 MARKS]
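The count can be checked with a minimal sketch, assuming the fda package and the usual convention that the number of B-spline basis functions equals the number of interior knots plus the order. The object name basis_a is introduced only for this illustration.

```r
library(fda)

# Cubic B-spline (order 4) on [0, 10] with interior knots at 3, 6 and 8.
# breaks lists the two boundary points together with the interior knots.
basis_a <- create.bspline.basis(rangeval = c(0, 10), norder = 4,
                                breaks = c(0, 3, 6, 8, 10))

n_interior <- length(basis_a$params)  # number of interior knots
n_basis    <- basis_a$nbasis          # interior knots + order = 3 + 4
```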
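(b) Using this example, or in general, prove that you cannot fit an unpenalised cubic spline if you put a knot at every time point. [3 MARKS]
Answer
With a knot at every time point there are 9 interior knots (t = 1, ..., 9), so the number of basis functions is 9 + 4 = 13, but there are only 11 data points. There are more coefficients than observations, so the least-squares fit is not unique.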
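The mismatch between 13 basis functions and 11 observations can be seen numerically. The following is a sketch, assuming the fda package, that builds the basis matrix at the observation times and inspects its rank.

```r
library(fda)

tt    <- 0:10                                   # the 11 observation times
basis <- create.bspline.basis(rangeval = c(0, 10), norder = 4, breaks = tt)
Phi   <- eval.basis(tt, basis)                  # 11 x 13 basis matrix

dim(Phi)
rank_Phi <- qr(Phi)$rank                        # cannot exceed 11 rows

# t(Phi) %*% Phi is 13 x 13 but its rank cannot exceed 11, so it is singular
# and the unpenalised normal equations have no unique solution.
PtP <- crossprod(Phi)
```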
(c) The R function from the fda library that creates a B-spline basis takes the following arguments: (rangeval=___, nbasis=___, norder=___, breaks=___)
Write the code for defining a cubic spline basis with knots at every time point using
(i) Only the arguments (rangeval=___, nbasis=___, norder=___)
[2 MARKS]
(ii) Only the arguments (rangeval=___, nbasis=___, breaks=___)
[2 MARKS]
(iii) Only the arguments (rangeval=___, norder=___, breaks=___)
[2 MARKS]
Answer
> a1 = create.bspline.basis(rangeval = c(0, 10), nbasis = 13, norder = 4)
> a2 = create.bspline.basis(rangeval = c(0, 10), nbasis = 13, breaks = 0:10)
> a3 = create.bspline.basis(rangeval = c(0, 10), norder = 4, breaks = 0:10)
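A quick check, assuming the fda package is installed, that the three definitions describe the same basis: each should report 13 basis functions and interior knots at 1, ..., 9. Reading the interior knots from the params component is an assumption about how fda stores a B-spline basis.

```r
library(fda)

a1 <- create.bspline.basis(rangeval = c(0, 10), nbasis = 13, norder = 4)
a2 <- create.bspline.basis(rangeval = c(0, 10), nbasis = 13, breaks = 0:10)
a3 <- create.bspline.basis(rangeval = c(0, 10), norder = 4, breaks = 0:10)

bases <- list(a1, a2, a3)

# Number of basis functions and interior knots for each definition
n_basis <- sapply(bases, function(b) b$nbasis)
knots   <- lapply(bases, function(b) as.numeric(b$params))
```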
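(d) What is the maximum number of adjacent intervals on which each basis function of a cubic spline can have positive support? Justify your answer. [3 MARKS]
Answer
4, the same as the order: a B-spline basis function of order m is nonzero on at most m adjacent knot intervals, and a cubic spline has order 4.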
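The support property can be checked by evaluating each basis function at the midpoint of every knot interval and counting where it is positive. This is a sketch assuming the fda package and equally spaced knots at 0, ..., 10; the tolerance 1e-12 is an arbitrary choice.

```r
library(fda)

breaks <- 0:10
basis  <- create.bspline.basis(rangeval = c(0, 10), norder = 4, breaks = breaks)

# Evaluate every basis function at the midpoint of each of the 10 knot intervals
mid     <- (breaks[-1] + breaks[-length(breaks)]) / 2
Phi_mid <- eval.basis(mid, basis)               # 10 intervals x 13 basis functions

# Number of intervals on which each basis function is positive
n_intervals <- colSums(Phi_mid > 1e-12)
max_support <- max(n_intervals)
```

The basis functions next to the boundaries are positive on fewer intervals, which is why the question asks for the maximum.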
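(e) What is the minimum order of spline you should use if you wish to calculate the third-order derivative of the curve? [2 MARKS]
Answer
Order 5: a spline of order m has continuous derivatives up to order m − 2, so order 5 is the lowest order with a continuous third derivative.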
(f) The penalised fit to the model $y_i = x(t_i) + \epsilon_i$ is given by
$\mathrm{PENSSE}_\lambda(x) = [y - x(t)]^T [y - x(t)] + \lambda R[x]$,
where y is the data, x(t) is the smooth function, R[x] is the roughness of x(t) and λ is a smoothing parameter measuring the compromise between fit and roughness. A solution of the above equation is given by
$\hat{c} = (\Phi^T \Phi + \lambda R)^{-1} \Phi^T y$
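For reference, a short derivation of the penalised least-squares solution, assuming the basis expansion x(t) = Φc at the observation times and a quadratic roughness of the form cᵀRc, where R is the penalty matrix asked for in part (i):

```latex
\begin{align*}
\mathrm{PENSSE}_\lambda(c) &= (y - \Phi c)^T (y - \Phi c) + \lambda\, c^T R\, c, \\
\frac{\partial\, \mathrm{PENSSE}_\lambda}{\partial c} &= -2\,\Phi^T (y - \Phi c) + 2\lambda R\, c = 0, \\
\hat{c} &= (\Phi^T \Phi + \lambda R)^{-1} \Phi^T y .
\end{align*}
```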
i. Write the expression of R for a fourth derivative penalty. [4 MARKS]
Answer
For the fourth derivative, with $x(t) = c^T \Phi(t)$,
$J(x) = \int [D^4 x(t)]^2\,dt = \int c^T D^4\Phi(t)\, D^4\Phi(t)^T c\,dt = c^T R c$
So
$R = \int D^4\Phi(t)\, D^4\Phi(t)^T\,dt$
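The matrix R can also be computed numerically. The sketch below assumes the fda package and uses an order-6 basis, an assumption chosen so that the fourth derivative of each basis function is not identically zero; it relies on eval.penalty to integrate the products of fourth derivatives.

```r
library(fda)

# Assumed basis: order-6 B-splines on [0, 10] with knots at 0, 1, ..., 10
basis <- create.bspline.basis(rangeval = c(0, 10), norder = 6, breaks = 0:10)

# R[j, k] = integral of D^4 phi_j(t) * D^4 phi_k(t) dt
R <- eval.penalty(basis, Lfdobj = int2Lfd(4))

dim(R)                                   # nbasis x nbasis
asym <- max(abs(R - t(R)))               # R should be symmetric up to rounding
```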
ii. What are the dimensions of c, Φ, R and λ? [2 MARKS]
iii. Name the functions in the fda library that you need to use to fit the penalised smoother to the data. [2 MARKS]
iv. State at least two approaches for determining an optimal value of λ. [2 MARKS]
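One possible workflow for parts (iii) and (iv) is sketched below, assuming the fda package: build a basis with create.bspline.basis, attach a roughness penalty with fdPar, fit with smooth.basis, and compare the generalised cross-validation (GCV) score over a grid of λ values. The data are synthetic, and the second-derivative penalty and the λ grid are assumptions made for illustration only.

```r
library(fda)
set.seed(1)

# Synthetic data (an assumption for illustration): noisy sine curve on [0, 10]
tt <- seq(0, 10, length.out = 51)
y  <- sin(tt) + rnorm(length(tt), sd = 0.2)

basis   <- create.bspline.basis(rangeval = c(0, 10), norder = 4, breaks = 0:10)
lambdas <- 10^seq(-2, 2, by = 1)

# Fit the penalised smoother for each lambda and record df and GCV
fits <- lapply(lambdas, function(lam) {
  fdParobj <- fdPar(basis, Lfdobj = int2Lfd(2), lambda = lam)
  smooth.basis(argvals = tt, y = y, fdParobj = fdParobj)
})
dfs  <- sapply(fits, function(f) as.numeric(f$df))
gcvs <- sapply(fits, function(f) as.numeric(f$gcv))

best_lambda <- lambdas[which.min(gcvs)]   # lambda minimising GCV on this grid
data.frame(lambda = lambdas, df = dfs, gcv = gcvs)
```

The df column also illustrates the earlier TRUE/FALSE statement: the degrees of freedom fall as λ grows.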
3. Suppose we have data on the daily number of individuals testing positive for Covid (over 6 months), with similar testing capacity, from 30 countries, 10 from each of the 3 continents Asia, Europe and South America. We wish to find out whether there is a difference in how the disease has progressed in the three continents. To account for differences in population and in the date of the first case of the disease, we should ideally look at the rate of change.
Describe the steps, specifying the relevant functions from the fda library, that you would need to test the null hypothesis that the rate of change of Covid cases is similar in the three continents. [5 MARKS]
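One possible route is sketched below: smooth each country's curve, take the first derivative, fit a functional ANOVA with continent as a factor, and run a permutation F-test. It assumes the fda package; the data are simulated stand-ins, and the basis sizes, the penalty, λ and the number of permutations are all assumptions rather than recommended values.

```r
library(fda)
set.seed(1)

# Synthetic stand-in data (assumption): 180 days x 30 countries of log counts
days      <- 1:180
continent <- factor(rep(c("Asia", "Europe", "SouthAmerica"), each = 10))
logcases  <- sapply(1:30, function(i) {
  shift <- 5 * as.integer(continent[i])
  log(1 + 50 * plogis((days - 90 - shift) / 15)) + rnorm(180, sd = 0.05)
})

# Step 1: smooth each country's curve with a roughness penalty
basis  <- create.bspline.basis(rangeval = c(1, 180), nbasis = 25, norder = 5)
casefd <- smooth.basis(days, logcases, fdPar(basis, int2Lfd(3), lambda = 10))$fd

# Step 2: rate of change = first derivative of each smooth curve
ratefd <- deriv.fd(casefd, 1)

# Step 3: functional ANOVA design, intercept plus two continent indicators
xlist <- list(rep(1, 30),
              as.numeric(continent == "Europe"),
              as.numeric(continent == "SouthAmerica"))
betalist <- rep(list(fdPar(create.bspline.basis(c(1, 180), nbasis = 10))), 3)

# Step 4: permutation F-test of the null hypothesis of no continent effect
Fres <- Fperm.fd(ratefd, xlist, betalist, nperm = 50, plotres = FALSE)
Fres$pval
```

With real data, the smoothing parameter would normally be chosen by GCV and the number of permutations set much higher.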
4. Continuing from question 3, in addition to the information on the daily number of individuals testing positive for Covid, $x_{ij}$ for country i at time point j, there is information on
• the daily number of Deaths due to Covid (yij )
• proportion of population above the age of 80 (zi)
Our goal is to model the number of deaths as a functional object (response $y_i(t)$, obtained by smoothing the daily number of deaths for each country). We will consider the following possible predictors for this model:
• Total monthly number of cases for each of the six months ($x^{tot}_{ik}$, k = 1, ..., 6) for each country.
• xi (t) obtained by smoothing the daily number of cases for each country.
• proportion of population above the age of 80 (zi) for each country.
(a) Propose a functional linear model (with complete notation) that one should use in each of the following situations. For each linear model, also state the structure of the response variable, the covariate(s) and the regression coefficients (one of scalar, vector, curve, surface).
i. Number of deaths as a functional object on the total monthly number of cases for each of the six months and the proportion of population above the age of 80. [3 MARKS]
ii. A concurrent model of the number of deaths as a functional object on the smoothed daily number of cases and the proportion of population above the age of 80. [4 MARKS]
iii. A full functional linear model of the number of deaths as a functional object on the smoothed daily number of cases and the proportion of population above the age of 80. [4 MARKS]
(b) Given that deaths follow cases, and that there is a lag between the number of cases and the number of deaths, do you think the concurrent model in part (ii) or the full functional model in part (iii) is appropriate? Justify your answer.
Propose an alternative model, with the same response and covariates as in part (ii), and write it out as a functional linear model. [6 MARKS]
Answer
(a.i) Deaths as a functional object on the total monthly number of cases for each of the six months and the proportion of population above the age of 80.
Functional response $y_i(t)$, scalar covariates $x^{tot}_{ik}$ and scalar covariate $z_i$:
$y_i(t) = \beta_0(t) + \sum_{k=1}^{6} \beta_k(t)\, x^{tot}_{ik} + \gamma(t)\, z_i + \epsilon_i(t)$
All β's and γ are functions (curves).
(a.ii) A concurrent model of deaths as a functional object on the smoothed daily number of cases and the proportion of population above the age of 80.
Functional response $y_i(t)$, functional covariate $x_i(t)$ and scalar covariate $z_i$:
$y_i(t) = \beta_0(t) + \beta_1(t)\, x_i(t) + \gamma(t)\, z_i + \epsilon_i(t)$
All β's and γ are functions (curves).
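(a.iii) A full functional linear model of deaths as a functional object on the smoothed daily number of cases and the proportion of population above the age of 80.
$y_i(t) = \beta_0(t) + \int \beta_1(s,t)\, x_i(s)\,ds + \gamma(t)\, z_i + \epsilon_i(t)$
$\beta_0(t)$ and $\gamma(t)$ are curves and $\beta_1(s,t)$ is a surface.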
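(b)
Concurrent model: deaths at time t depend only on the cases on the same day.
Full model: deaths at time t depend on the cases over the whole period (past, present and future).
Neither accounts for the lag, so a historical model should be used:
$y_i(t) = \beta_0(t) + \int_0^{t-6} \beta_1(s,t)\, x_i(s)\,ds + \gamma(t)\, z_i + \epsilon_i(t)$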
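To make the effect of the restricted integration range concrete, here is a minimal base R sketch that evaluates the historical term, the integral of β1(s,t) x_i(s) over s from 0 to t − 6, on a daily grid with the trapezoid rule. The function name historical_term, the unit grid spacing and the constant test surface are hypothetical choices for illustration, not part of the fda package.

```r
# x:    numeric vector holding x_i(s) on the daily grid s = 0, 1, ..., n - 1
# beta: function(s, t) returning the surface beta_1(s, t) for a vector s
# lag:  the integral runs over s in [0, t - lag]
historical_term <- function(x, beta, lag = 6) {
  tt <- seq_along(x) - 1
  sapply(tt, function(t) {
    upper <- t - lag
    if (upper <= 0) return(0)            # no cases old enough to contribute yet
    s <- 0:upper
    f <- beta(s, t) * x[s + 1]
    sum(f) - 0.5 * (f[1] + f[length(f)]) # trapezoid rule with unit spacing
  })
}

# Check with beta = 1 and x = 1, where the integral is simply max(t - 6, 0)
x_const  <- rep(1, 20)
beta_one <- function(s, t) rep(1, length(s))
h        <- historical_term(x_const, beta_one)
```

Only cases at least 6 time units old contribute to the response at time t, which is how this model builds in the lag between cases and deaths.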
2022-07-20