关键词 > COMP3670/6670

COMP3670/6670: Introduction to Machine Learning

发布时间：2022-09-15

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Semester 2, 2022

COMP3670/6670: Introduction to Machine Learning

Exercise 1 Conjectures 5 credits each

Here are a collection of conjectures. Which are true, and which are false?

● If it is true, provide a formal proof demonstrating so.

● If it is false, give a counterexample, clearly stating why your counterexamples satisﬁes the premise but not the conclusion.

(No marks for just starting True/False.)

Hint: There’s quite a few questions here, but each is relatively simple (the counterexamples aren’t very complicated, and the proofs are short.) Try playing around with a few examples ﬁrst to get an intuitive feeling if the statement is true before trying to prove it.

Let V be a vector space, and let (., .) : V · V - R be an inner product over V .

1. Triangle inequality for inner products: For all a, b, c e V , (a, c) < (a, b) + (b, c) .

2. Transitivity of orthogonality: For all a, b, c e V , if (a, b) = 0 and (b, c) = 0 then (a, c) = 0.

3. Orthogonality closed under addition: Suppose S = {v1 , . . . , vn } C V is a set of vectors, and x is orthogonal to all of them (that is, for all i = 1, 2, . . . n, (x, vi ) = 0). Then x is orthogonal to any y e Span(S).

4. Let S = {v1 , v2 , . . . , vn } C V be an orthonormal set of vectors in V . Then for all non-zero x e V , if for all 1 < i < n we have (x, vi ) = 0 then x e/ Span(S).

5. Let S = {v1 , v2 , . . . , vn } C V be a set of vectors in V (no assumption of orthonormality). Then for all non-zero x e V , if for all 1 < i < n we have (x, vi ) = 0 then x e/ Span(S).

6. Let S = {v1 , . . . , vn } be a set of orthonormal vectors such that Span(S) = V , and let x e V . Then there is a unique set of coeﬃcients c1 , . . . , cn such that

x = c1 v1 + . . . + cn vn

7. Let S = {v1 , . . . , vn } be a set of vectors (no assumption of orthonormality) such that Span(S) = V , and let x e V . Then there is a unique set of coeﬃcients c1 , . . . , cn such that

x = c1 v1 + . . . + cn vn

8. Let S = {v1 , v2 , . . . , vn } C V be a set of vectors. If all the vectors are pairwise linearly independent (i.e., for any 1 < i j < n, then only solution to ci vi + cj vj = 0 is the trivial solution ci = cj = 0.) then the set S is linearly independent.

Exercise 2 Inner Products induce Norms 20 credits

Let V be a vector space, and let (., .) : V · V - R be an inner product on V . Deﬁne IIxII := ^(x, x) . Prove that II . II is a norm.

(Hint: To prove the triangle inequality holds, you may need the Cauchy-Schwartz inequality, (x, y) < IIxIIIIyII.)

Exercise 3 General Linear Regression with Regularisation (10+10+10+5+5 credits) Let A e RN ×N , B e RD×D be symmetric, positive deﬁnite matrices. From the lectures, we can use symmetric positive deﬁnite matrices to deﬁne a corresponding inner product, as shown below. We can also deﬁne a norm using the inner products.

(x, y)A := xT Ay

|x|A(2) := (x, x)A

(x, y) B := xT By

|x|B(2) := (x, x) B

Suppose we are performing linear regression, with a training set {(x1 , y1 ), . . . , (xN , yN )}, where for each i, xi e RD and yi e R. We can deﬁne the matrix

x = [x1 , . . . , xN ]T e RN ×D

and the vector

y = [y1 , . . . , yN ]T e RN .

We would like to ﬁnd 9 e RD , c e RN such that y s X9 + c, where the error is measured using | . |A . We avoid overﬁtting by adding a weighted regularization term, measured using II. IIB . We deﬁne the loss function with regularizer:

cA ,B ,y ,X (9 , c) = IIy - x9 - cIIA(2) + II9IIB(2) + |c|A(2)

For the sake of brevity we write c(9 , c) for cA ,B ,y ,X (9 , c).

HINTS:

● You may use (without proof) the property that a symmetric positive deﬁnite matrix is invertible.

● We assume that there are suﬃciently many non-redundant data points for X to be full rank. In particular, you may assume that the null space of X is trivial (that is, the only solution to Xz = 0 is the trivial solution, z = 0.)

● You may use identities of gradients from the lectures slides, so long as you mention as such.

1. Find the gradient V9 c(9 , c).

2. Let V9 c(9 , c) = 0, and solve for 9 . If you need to invert a matrix to solve for 9, you should prove the inverse exists.

3. Find the gradient Vc c(9 , c).

We now compute the gradient with respect to c.

4. Let Vc c(9) = 0, and solve for c. If you need to invert a matrix to solve for c, you should prove the inverse exists.

5. Show that if we set A = I, c = 0, B = λI, where λ e R, your answer for 3.2 agrees with the analytic solution for the standard least squares regression problem with L2 regularization, given by

9 = (xT x + λI)_ 1 xT y.