COMP3670/6670: Introduction to Machine Learning
Published: 2022-09-15
Semester 2, 2022
Exercise 1 Conjectures (5 credits each)
Here is a collection of conjectures. Which are true, and which are false?
● If a conjecture is true, provide a formal proof demonstrating so.
● If it is false, give a counterexample, clearly stating why your counterexample satisfies the premise but not the conclusion.
(No marks for just stating True/False.)
Hint: There are quite a few questions here, but each is relatively simple (the counterexamples aren't very complicated, and the proofs are short). Try playing around with a few examples first to get an intuitive feeling for whether a statement is true before trying to prove it.
Let V be a vector space, and let ⟨·, ·⟩ : V × V → ℝ be an inner product over V.
1. Triangle inequality for inner products: For all a, b, c ∈ V, ⟨a, c⟩ ≤ ⟨a, b⟩ + ⟨b, c⟩.
2. Transitivity of orthogonality: For all a, b, c ∈ V, if ⟨a, b⟩ = 0 and ⟨b, c⟩ = 0, then ⟨a, c⟩ = 0.
3. Orthogonality closed under addition: Suppose S = {v1, ..., vn} ⊆ V is a set of vectors, and x is orthogonal to all of them (that is, for all i = 1, 2, ..., n, ⟨x, vi⟩ = 0). Then x is orthogonal to any y ∈ Span(S).
4. Let S = {v1, v2, ..., vn} ⊆ V be an orthonormal set of vectors in V. Then for all non-zero x ∈ V, if for all 1 ≤ i ≤ n we have ⟨x, vi⟩ = 0, then x ∉ Span(S).
5. Let S = {v1, v2, ..., vn} ⊆ V be a set of vectors in V (no assumption of orthonormality). Then for all non-zero x ∈ V, if for all 1 ≤ i ≤ n we have ⟨x, vi⟩ = 0, then x ∉ Span(S).
6. Let S = {v1, ..., vn} be a set of orthonormal vectors such that Span(S) = V, and let x ∈ V. Then there is a unique set of coefficients c1, ..., cn such that
x = c1 v1 + ... + cn vn.
7. Let S = {v1, ..., vn} be a set of vectors (no assumption of orthonormality) such that Span(S) = V, and let x ∈ V. Then there is a unique set of coefficients c1, ..., cn such that
x = c1 v1 + ... + cn vn.
8. Let S = {v1, v2, ..., vn} ⊆ V be a set of vectors. If all the vectors are pairwise linearly independent (i.e., for any 1 ≤ i < j ≤ n, the only solution to ci vi + cj vj = 0 is the trivial solution ci = cj = 0), then the set S is linearly independent.
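As the hint suggests, it can help to play with concrete examples before attempting a proof. The sketch below (illustrative only, not part of the assignment) probes conjecture 2 numerically with the standard dot product on ℝ²; the particular vectors are just one candidate to test.

```python
import numpy as np

# Probe conjecture 2 (transitivity of orthogonality) using the standard
# dot product on R^2. These particular vectors are just one candidate.
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
c = np.array([1.0, 0.0])

print(a @ b)  # 0.0 -> a is orthogonal to b
print(b @ c)  # 0.0 -> b is orthogonal to c
print(a @ c)  # 1.0 -> yet a is not orthogonal to c
```

If the probe finds vectors satisfying the premise but not the conclusion, that is your counterexample; if every probe passes, that is weak evidence a proof exists, not a proof itself.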
Exercise 2 Inner Products Induce Norms (20 credits)
Let V be a vector space, and let ⟨·, ·⟩ : V × V → ℝ be an inner product on V. Define ‖x‖ := √⟨x, x⟩. Prove that ‖·‖ is a norm.
(Hint: To prove the triangle inequality holds, you may need the Cauchy-Schwarz inequality, |⟨x, y⟩| ≤ ‖x‖ ‖y‖.)
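Before writing the proof, it can be reassuring to sanity-check the norm axioms numerically. The sketch below (an illustration under the standard dot product on ℝ³, not a substitute for the proof) tests non-negativity, absolute homogeneity, the triangle inequality, and Cauchy-Schwarz on random samples; the scalar 2.5 and tolerance 1e-12 are arbitrary choices.

```python
import numpy as np

# Sanity check (not a proof): with the standard dot product on R^3,
# ||x|| := sqrt(<x, x>) should satisfy the norm axioms on random samples.
rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    s = 2.5  # arbitrary scalar for testing absolute homogeneity
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    assert nx >= 0                                         # non-negativity
    assert np.isclose(np.linalg.norm(s * x), abs(s) * nx)  # ||s x|| = |s| ||x||
    assert np.linalg.norm(x + y) <= nx + ny + 1e-12        # triangle inequality
    assert abs(x @ y) <= nx * ny + 1e-12                   # Cauchy-Schwarz
print("norm axioms hold on all samples")
```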
Exercise 3 General Linear Regression with Regularisation (10+10+10+5+5 credits)
Let A ∈ ℝ^{N×N}, B ∈ ℝ^{D×D} be symmetric, positive definite matrices. From the lectures, we can use symmetric positive definite matrices to define corresponding inner products, as shown below. We can also define norms using these inner products.
⟨x, y⟩_A := xᵀAy
‖x‖_A² := ⟨x, x⟩_A
⟨x, y⟩_B := xᵀBy
‖x‖_B² := ⟨x, x⟩_B
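A minimal sketch of these definitions, assuming NumPy: any symmetric positive definite matrix induces an inner product of the form above. The construction MᵀM + 3I used to build an SPD matrix here is an illustrative choice, not from the course.

```python
import numpy as np

# Construct a symmetric positive definite matrix A (M^T M is positive
# semi-definite; adding 3I makes it strictly positive definite).
rng = np.random.default_rng(1)
M = rng.normal(size=(3, 3))
A = M.T @ M + 3.0 * np.eye(3)

def ip_A(x, y):
    """The induced inner product <x, y>_A := x^T A y."""
    return x @ A @ y

x, y = rng.normal(size=3), rng.normal(size=3)
assert np.isclose(ip_A(x, y), ip_A(y, x))  # symmetry
assert ip_A(x, x) > 0                      # positive definiteness (x != 0)
norm_A_sq = ip_A(x, x)                     # ||x||_A^2, as defined above
print("SPD-induced inner product checks passed")
```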
Suppose we are performing linear regression, with a training set {(x1, y1), ..., (xN, yN)}, where for each i, xi ∈ ℝ^D and yi ∈ ℝ. We can define the matrix
X = [x1, ..., xN]ᵀ ∈ ℝ^{N×D}
and the vector
y = [y1, ..., yN]ᵀ ∈ ℝ^N.
We would like to find θ ∈ ℝ^D, c ∈ ℝ^N such that y ≈ Xθ + c, where the error is measured using ‖·‖_A. We avoid overfitting by adding a weighted regularisation term, measured using ‖·‖_B. We define the loss function with regulariser:
ℒ_{A,B,y,X}(θ, c) = ‖y − Xθ − c‖_A² + ‖θ‖_B² + ‖c‖_A²
For the sake of brevity, we write ℒ(θ, c) for ℒ_{A,B,y,X}(θ, c).
HINTS:
● You may use (without proof) the property that a symmetric positive definite matrix is invertible.
● We assume that there are sufficiently many non-redundant data points for X to be full rank. In particular, you may assume that the null space of X is trivial (that is, the only solution to Xz = 0 is the trivial solution, z = 0.)
● You may use identities of gradients from the lecture slides, so long as you state that you are doing so.
1. Find the gradient ∇_θ ℒ(θ, c).
2. Set ∇_θ ℒ(θ, c) = 0 and solve for θ. If you need to invert a matrix to solve for θ, you should prove the inverse exists.
3. Find the gradient ∇_c ℒ(θ, c).
4. Set ∇_c ℒ(θ, c) = 0 and solve for c. If you need to invert a matrix to solve for c, you should prove the inverse exists.
5. Show that if we set A = I, c = 0, B = λI, where λ ∈ ℝ, your answer for 3.2 agrees with the analytic solution for the standard least squares regression problem with L2 regularisation, given by
θ = (XᵀX + λI)⁻¹ Xᵀy.
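Your algebraic answer can be sanity-checked numerically: at θ = (XᵀX + λI)⁻¹Xᵀy, the gradient of the standard ridge loss ‖y − Xθ‖² + λ‖θ‖² should vanish. A sketch on illustrative synthetic data (the dimensions and λ below are arbitrary choices, not from the assignment):

```python
import numpy as np

# Synthetic data; shapes follow the conventions above (X is N x D).
rng = np.random.default_rng(2)
N, D, lam = 20, 3, 0.5
X = rng.normal(size=(N, D))
y = rng.normal(size=N)

# Closed-form ridge solution: theta = (X^T X + lam I)^{-1} X^T y.
# Solving the linear system is preferred to forming the inverse explicitly.
theta = np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

# Gradient of ||y - X theta||^2 + lam ||theta||^2 with respect to theta.
grad = 2 * X.T @ (X @ theta - y) + 2 * lam * theta
assert np.allclose(grad, 0)
print("gradient vanishes at the closed-form solution")
```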