Statistical Machine Learning (GR5241) Homework 1
1. (3 pt) Let $y \in \mathbb{R}^{n \times 1}$, $\beta \in \mathbb{R}^{p \times 1}$, and $X \in \mathbb{R}^{n \times p}$. We use $T$ to indicate the transpose. The rows of $X$ are designated by $x_i^T \in \mathbb{R}^{1 \times p}$ for $i = 1, \dots, n$, and the columns of $X$ are designated by $\varphi_j \in \mathbb{R}^{n \times 1}$ for $j = 1, \dots, p$. In other words,
$$x_i^T = [X_{i1}, X_{i2}, \dots, X_{ip}], \qquad \varphi_j^T = [X_{1j}, X_{2j}, \dots, X_{nj}],$$
where $X_{ij}$ stands for the entry at the $i$th row and $j$th column of $X$. Finally, let $y_i$ be the $i$th entry of $y$ for $i = 1, \dots, n$. The residual sum of squares is
$$L(\beta) = \sum_{i=1}^{n} \left( y_i - x_i^T \beta \right)^2.$$
a. Show that $L(\beta)$ is equal to $\|y - X\beta\|_2^2$.
b. Show that $\frac{\partial L(\beta)}{\partial \beta_j}$ is equal to $-2\varphi_j^T (y - X\beta)$.
c. Show that $\nabla L(\beta)$ is equal to $-2X^T (y - X\beta)$, where $\nabla L(\beta)$ is the gradient of $L(\beta)$ with respect to $\beta$. Note that the gradient is the vector of partial derivatives of $L(\beta)$ with respect to $\beta_j$ for $j = 1, \dots, p$, so naturally $\nabla L(\beta) \in \mathbb{R}^{p \times 1}$.
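The gradient identity in part c. can be sanity-checked numerically before proving it. Below is a minimal NumPy sketch (a check, not a proof) comparing $-2X^T(y - X\beta)$ against a central finite-difference approximation of $\nabla L(\beta)$ on random data; all dimensions and variable names are illustrative.

```python
import numpy as np

# Numerical check that grad L(beta) = -2 X^T (y - X beta).
# Shapes follow the problem statement: y in R^n, beta in R^p, X in R^{n x p}.
rng = np.random.default_rng(0)
n, p = 8, 3
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
beta = rng.standard_normal(p)

def L(b):
    r = y - X @ b
    return r @ r  # residual sum of squares, equals ||y - X b||_2^2

analytic = -2 * X.T @ (y - X @ beta)

# Central finite differences, one coordinate of beta at a time.
eps = 1e-6
numeric = np.array([
    (L(beta + eps * e) - L(beta - eps * e)) / (2 * eps)
    for e in np.eye(p)
])
print(np.allclose(analytic, numeric, atol=1e-4))  # True
```

Checks like this only build confidence at one random point; the homework asks for the algebraic derivation.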
2. (5 pt) Consider the following least squares problem:
$$\hat{\beta} = \arg\min_{\beta} \|y - X\beta\|_2^2. \tag{1}$$
(a) Show that if $v \in \mathbb{R}^p$ is a vector such that $Xv = 0$, then $\hat{\beta} + c\,v$ is also a minimizer of the least squares problem, for any $c \in \mathbb{R}$.
(b) If the columns of $X$ are linearly independent, which vectors $v \in \mathbb{R}^p$ satisfy $Xv = 0$?
(c) If the columns of $X$ are linearly independent, which vector $\hat{\beta}$ is the minimizer of the least squares problem?
(d) Suppose that $p > n$. Show that there are infinitely many linear regression estimates, and that depending on which estimate we select, some coefficients can have positive or negative signs.
(e) Suppose that $p > n$. Show that there are infinitely many linear regression estimates, and that for all of them, $y = X\hat{\beta}$.
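The $p > n$ phenomena in parts (a), (d), and (e) are easy to observe numerically. The sketch below (illustrative, not part of the required solution) builds one least-squares solution with the pseudoinverse, perturbs it along a null-space direction of $X$ taken from the SVD, and confirms that every perturbed vector still fits $y$ exactly.

```python
import numpy as np

# Sketch for the p > n case: many distinct coefficient vectors, all
# interpolating y exactly. Dimensions and seed are arbitrary.
rng = np.random.default_rng(1)
n, p = 4, 7                      # more columns than rows
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

beta_hat = np.linalg.pinv(X) @ y         # one least-squares solution
# A null-space direction of X: with rank(X) <= n < p, the trailing
# right-singular vectors satisfy X v = 0 (numerically).
_, _, Vt = np.linalg.svd(X)
v = Vt[-1]

# Varying c sweeps out infinitely many minimizers (part a); where v is
# nonzero, large |c| of either sign flips coefficient signs (part d).
for c in (0.0, 5.0, -5.0):
    b = beta_hat + c * v
    print(np.allclose(X @ b, y))         # each one satisfies y = X b (part e)
```

Each iteration prints `True`: adding a null-space vector changes the coefficients but not the fitted values.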
3. (3 pt) Part a. of Problem 2.5 in The Elements of Statistical Learning. Assume $y = X\beta + \varepsilon$, where $y \in \mathbb{R}^{n \times 1}$, $\beta \in \mathbb{R}^{p \times 1}$, $X \in \mathbb{R}^{n \times p}$, and $\varepsilon \sim N(0, \sigma^2 I)$.
4. (3 pt) Consider the following problem:
$$\hat{\beta} = \arg\min_{\theta \in \mathbb{R}^p} \|\theta\|_2^2 \quad \text{subject to } y = X\theta,$$
where $X \in \mathbb{R}^{n \times p}$ and $p > n$. Show that $\hat{\beta} = X^+ y$, where $X^+$ denotes the Moore–Penrose pseudoinverse of $X$.
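The claim can be previewed numerically: among all solutions of $y = X\theta$, the pseudoinverse solution is the shortest. A small NumPy sketch (illustrative only; dimensions and seed are arbitrary) checks feasibility and compares norms against a solution shifted along a null-space direction. Since $X^+ y$ lies in the row space of $X$, it is orthogonal to any null-space vector $v$, so $\|X^+ y + c\,v\|_2^2 = \|X^+ y\|_2^2 + c^2\|v\|_2^2$ is strictly larger for $c \neq 0$.

```python
import numpy as np

# Sketch: with p > n, X^+ y solves y = X theta and has the smallest
# l2 norm among all solutions.
rng = np.random.default_rng(2)
n, p = 3, 6
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

beta_hat = np.linalg.pinv(X) @ y         # Moore-Penrose solution X^+ y
assert np.allclose(X @ beta_hat, y)      # feasible: y = X beta_hat

# Any other solution beta_hat + c*v with X v = 0 must be longer,
# because beta_hat is orthogonal to the null space of X.
_, _, Vt = np.linalg.svd(X)
v = Vt[-1]                               # null-space direction of X
other = beta_hat + 3.0 * v
assert np.allclose(X @ other, y)         # still feasible
print(np.linalg.norm(beta_hat) < np.linalg.norm(other))  # True
```

This is exactly the orthogonality argument the proof should formalize.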
5. (3 pt) Problem 12.10 from the following book: Bishop, C. Pattern Recognition and Machine Learning. Springer-Verlag, 2006.
2022-02-21