Model matrices
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
Model matrices
Modelling formulae in R
Background
GENSTAT was a statistical package developed at the famous Rothamsted Experimental Station in England. Amongst the many statistical innovations at Rothamsted was a powerful modelling language for GENSTAT which allowed users to compactly specifiy the model matrix from an experimental design. This became known a “Wilkinson-Rogers notation’ ’ after its authors (see reference at bottom) and was adopted (and adapted) in many statistical systems thereafter (including glim, and the § language on which R is based). In R, as in §, the modelling language is used to create a formula object that can then be passed to functions which create the corresponding model matrix as needed.
In this question, you will explore the modelling language, its connection to the vector-matrix representation of the mean response model, and the mathematical expression of the mean as a function of all values of the explanatory variates.
For the mathematical expression, it may be convenient, to use indicator functions defined as
, 1
IA (a) = .
( 〇
if a ← A
if a \← A
for any set A. For example, if separate straight line models are being fitted for each sex of some species, the single equation mathematical model can be expressed as
µ(x) = β〇I{f emale;(sex) +β1 I{f emale;(sex)|x +γ〇I{male;(sex) +γ1 I{male;(sex) .
When the sex for the ith observed response yi is female, the mean will be modelled as µ(xi ) = β〇 +β1 xi , and, when it is male as µ(xi ) = γ〇 +γ1 xi .
There are four unknown linear parameters here, which can be gathered together into a single parameter
vector 扌T = (β〇 , β 1 , γ〇 , γ1 ) to allow the mean response to be expressed in matrix and vector terms as 〉= x扌
with 〉T = (µ1 , . . . , µn ) . Note that the contents of the model matrix x must be such that when it is multiplied by the parameter vector 扌 , the correct mathematical model will appear for each µi in the mean vector 〉. This means x must have the same number of columns as there are elements in 〉and, importantly, the order of the columns is entirely determined by the order of the elements of the vector 扌 . For this example, the first column of x must be 1 whenever the sex of the ith observation is female, and 0 otherwise; the second column contains xi when the sex is female and 0 otherwise; the third column contains 1 when the sex is male and 0 otherwise; and the last column is xi whenever the sex is male and zero otherwise. Had the parameters been ordered differently in 扌 , the columns of x would have to have been rearranged to match.
In the questions which follow, R ’s the modelling language will be used to construct a model matrix and, after inspecting the resulting model matrix, you will write down the single mathematical equation which expresses µi as a function of the relevant explanatory variate values. Indicator functions may be needed for some expressions, possibly not for all.
The data set that will be used is an small artificial one called groupedData, which you should have downloaded as groupedData .Rda from the same website as this question. Once downloaded, and saved somewhere on your machine, it can be loaded into an R session as
# a4rec芒org uhere 芒he aa芒a ha①e been sa①ea∶
dataDir <- " . ./data"
load(file .path (dataDir, "groupedData .Rda"))
# f4rs芒 feu rous
head(groupedData)
## group year x z
## 1 1 1 -0 .8 1 .6
## 2 2 1 -0 .7 9 .7
## 3 3 1 1 .4 0 .5
## 4 1 2 0 .3 5 .1
## 5 2 2 -0 .4 3 .8
## 6 3 2 0 .1 3 .1
# wo芒e 芒ha芒 芒h4s aa芒ase芒 neea owLy con芒a4n e匹plana芒org ①ar4a芒es,
# no response ①ar4a芒e 4s regu4rea for 芒h4s gues芒4on:
#
Model matrices can be constructed using the model .matrix() function in R. For example, the matrix x for
a straight line model can be constructed as
model .matrix(~ x, data = groupedData)
## (Intercept) x
## 1 1 -0 .8
## 2 1 -0 .7
## 3 1 1 .4
## 4 1 0 .3
## 5 1 -0 .4
## 6 1 0 .1
## 7 1 1 .2
## 8 1 -0 .8
## 9 1 -1 .8
## 10 1 -2 .3
## 11 1 -1 .6
## 12 1 0 .8
## attr(,"assign")
## [1] 0 1
Examining the whole matrix, one can infer that the single mathematical equation describing the mean of the ith response is
µi = β〇 +β1 xi .
The corresponding parameter vector would be 扌T = (β〇 , β 1 ) where for convenience, 扌 is always used for the entire vector, and the parameters within it selected to distinguish separate parts of the model as desired. Simply using θ 1 , . . . , θp as the elements of 扌 will always work, as long as they also appear as the only parameters in the single mathematical equation. The choice of Greek letters to use for the linear parameters (on the right hand side of the equation) is yours to make – excluding using either σ or µ of course – as long as they match the vector.
Reference
G. N. Wilkinson and C. E. Rogers (1973) “Symbolic Description of Factorial Models for Analysis of Variance’’, Journal of the Royal Statistical Society. Series C (Applied Statistics), Vol. 22, No. 3 (1973), pp. 392-399
Model matrices
13 marks
a. For each of the following model matrices, write down
● the single mathematical equation for the mean of the ith response
● the vector of parameters in the correct order so as to match your equation for every µi (i.e., i = 1, . . . , 12) when multiplied by the model matrix mX .
Express each model in terms of the relevant variates of groupedData.
i. (1 mark) model .matrix(~ 0 + x, data = groupedData)
ii. (1 mark) model .matrix(~ -x, data = groupedData)
iii. (2 marks) model .matrix(~ group + x, data = groupedData)
iv. (2 marks) model .matrix(~ factor(group) + x, data = groupedData)
v. (2 marks) model .matrix(~ 0 + factor(group) + x, data = groupedData)
vi. (2 marks) model .matrix(~ group/year, data = groupedData)
b. (3 marks) Suppose the model has two straight lines, one on the first m observations, and another on the remaining n 一 m observations. However, the lines are restricted in that the model supposes that must intersect when x = a.
Write down the parameter vector and its corresponding model matrix x for this mean response model. Show your reasoning.
2023-02-18
Modelling formulae in R