STAT 3690 Lecture 07
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
STAT 3690 Lecture 07
2022
Useful properties of MVN
● X ~ MVNp (u, Σ) 兮 Z = Σ_ 1/2(X - u) ~ MVNp (0, I). So, we have a stochastic representation of arbitrary X ~ MVNp (u, Σ): X = Σ 1/2Z + u, where Z ~ MVNp (0, I).
● X ~ MVN iff, for all a e 皿p , aT X has a (univariate) normal distribution.
● If X ~ MVNp (u, Σ), then AX + b ~ MVNq (Au + b, AΣAT ) for A e 皿q×p and rk(A) = q .
● Exercise: Generate six iid samples following bivariate normal MVN2 (u, Σ) with
u = [3, 6]T , Σ = ┌ 2(10) 5(2) ┐ .
options(digits = 4)
set.seed(1)
Mu = matrix(c (3 , 6), ncol = 1 , nrow = 2)
Sigma = matrix(c ( 10 , 2 ,2 , 5), ncol = 2 , nrow = 2)
n = 1000
# Method 1: via rnorm()
spectral = eigen(Sigma)
SigmaRoot = spectral$vectors %*% diag(spectral$values ˆ .5) %*% t(spectral$vectors) A1 = matrix(0 , nrow = n, ncol = length (Mu))
for (i in 1:n) {
A1[i, ] = t(SigmaRoot %*% matrix(rnorm (2), nrow = 2 , ncol = 1) + Mu)
}
# Method 2: via MASS::mvrnorm()
A2 = MASS::mvrnorm(n, Mu, Sigma)
● Exercise:
1. Prove that (X - u)T Σ_ 1 (X - u) ~ χ2 (p) if X ~ MVNp (u, Σ).
2. Suppose X1 ~ N(0, 1). In the following two cases, verify that X2 ~ N(0, 1) as well. Does X = [X1 , X2]T follow an MVN in both cases?
a. X2 = -X1 ;
b. X2 = (2Y - 1)X1 , where Y ~ Ber (p) and Y 11 X.
- P.S.: Y 11 X 兮 fZ (:) = fX (z)fY (y), where Z = [XT , YT ]T
● Solution:
1. Y = [Y1 , . . . , Yp]T = Σ_ 1/2(X - u). Then (X - u)T Σ_ 1 (X - u) = YTY = i Yi2 ~ χ2 (p) since Yi N (0, 1).
2. a. Pr (X2 < x) = Pr (X1 > -x) = Pr (X1 < x) since pdf of N (0, 1) is symmetric with respect to x-axis; Pr(X2 = -X1 ) = 1 ÷ supp(X) = {(x, -x) I x e 皿};
b. Pr(X2 < x) = Pr(X2 < x I Y = 0) Pr(Y = 0) + Pr(X2 < x I Y = 1) Pr(Y = 1) = (1 - p)Pr(x1 > -x) + p Pr(X1 < x) = Pr(X1 < x). Pr(X2 = X1 ) = Pr(Y = 1) = p and Pr(X2 = -X1 ) = Pr(Y = 0) = 1 - p ÷ supp(X) = {(x, -x) I x e 皿} {(x, x) I x e 皿}.
options(digits = 4)
set.seed(1)
xsize = 1e4L
x1 = rnorm (xsize)
# case 1
x2 = -x1
plot3D ::hist3D(z=table(cut(x1, 100), cut (x2, 100)), border = "black") # 3d histogram of (x1, x2)
plot3D ::image2D(z=table(cut(x1, 100), cut (x2, 100)), border = "black") # plot the support of joint pdf # case 2
Y = rbinom(n = xsize, 1 , .3)
x2 = (2 * Y - 1) * x1
plot3D ::hist3D(z=table(cut(x1, 100), cut (x2, 100)), border = "black") # 3d histogram of (x1, x2) plot3D ::image2D(z=table(cut(x1, 100), cut (x2, 100)), border = "black") # plot the support of joint pdf
Joint, marginal and conditional MVN
● If X ~ MVNp (u, Σ) and
X = ┌ X(X)2(1) ┐ , u = ┌ u(u)2(1) ┐ and Σ = ┐
with Σ 11 > 0 and Σ22 > 0, then
- Xi ~ M VNpi (ui , Σii ), i.e., marginals of MVN are MVN.
- Xi I Xj = zj ~ M VNpi (ui|j , Σi|j), i.e., conditionals of MVN are MVN.
- ui|j = ui + Σij Σjj(_)1 (zj - uj )
- Σi|j = Σii - Σij Σjj(_)1 Σji
- X1 11 X2 兮 Σ 12 = 0
● Exercise: The argument X1 11 X2 兮 Σ 12 = 0 is based on the assumption that X = [X 1(T) , X2(T)]T is of MVN. That is, if X1 and X2 are both MVN BUT they are not jointly normal, a zero Σ 12 doesn’t suffice for the independence between X1 and X2 . A counter-example will be part of Assignment 1.
Checking normality (J&W Sec 4.6)
● Checking the univariate marginal distributions
- Normal Q-Q plot
* qqnorm(); car::qqPlot()
- Normality test
* shapiro.test()
● Checking the quadratic form
- χ2 Q-Q plot
* Di(2) = (Xi - )T S_ 1 (Xi - ) s χ2 (p) if Xi M VNp (u, Σ)
* qqplot(); car::qqPlot()
options(digits = 4)
install.packages(c ("car"))
library(datasets)
data(iris)
head(iris)
dataframe = iris[,1:3]
p = ncol(dataframe)
n = nrow (dataframe)
# Marginal normal Q-Q plot
car::qqPlot (rnorm (n), id = F)
car::qqPlot (dataframe[,1], id = F)
car::qqPlot (dataframe[,2], id = F)
# Univariate normality test
shapiro.test(rnorm (n))
shapiro.test(dataframe[,1])
shapiro.test(dataframe[,2])
# chiˆ2 Q-Q plot
d_square = diag(
as.matrix(sweep (dataframe, 2 , colMeans (dataframe))) %*%
solve(var (dataframe)) %*%
t(as.matrix(sweep (dataframe, 2 , colMeans (dataframe))))
)
car::qqPlot (d_square, dist="chisq" , df = p, id = F)
Detecting outliers (J&W Sec 4.7)
● Scatter plot of standardized values
● Check the points farthest from the origin in χ2 Q-Q plot
options(digits = 4)
# Scatter plot of standardized values
plot(1 :n, scale(dataframe[,1]), ylab= !Studentized! , xlab= !Label!); abline(2 , 0); abline(-2 , 0) which(abs(scale(dataframe[,1]))>2)
# six points farthest from the origin in $\chiˆ2$ Q-Q plot
tail(order(d_square, decreasing=F))
Improving normality (J&W Sec 4.8)
● Box-Cox transformation: for x > 0,
x* (λ) = ,xnλ(x-) 1)/λ
- If x < 0, change it to be positive first.
options(digits = 4)
install.packages(c ("car" , "EnvStats"))
λ 0
λ = 0
lambda = EnvStats::boxcox(dataframe[,1], optimize=T)$lambda
dataframe.new = (dataframe[,1] ˆlambda-1)/lambda
car::qqPlot (dataframe.new , id = F)
shapiro.test(dataframe.new)
● Exploratory data analysis (EDA)
- J. Tukey (1977). Exploratory Data Analysis. Addison-Wesley. ISBN 978-0-201-07616-5. R package “MVN”
2022-03-17