闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Math 123 Problem Set 6

Instruction: Read the homework policy. You should submit a PDF copy of the homework and any associated codes on Canvas. Your PDF must be a single ﬁle, not multiple images.

1. In this problem, we consider the least squares problem.

(a) Prove that the least squares loss f (x) = ||Ax − b||2(2) is a convex function. (b) Sketch the gradient descent algorithm to solve the least squares problem.

2. Consider the following minimization problem

min xT Cx + bT x + d

_eRn

where C is an n × n symmetric matrix, b ∈ Rn is a ﬁxed vector and d ∈ R is a constant. (a) Derive the gradient of the objective function.

(b) Under what conditions on C is the objective convex?

3. Consider the optimization problem corresponding to the soft SVM problem,

min ||w||2 + C ξi subject to yi (wT xi + b) ≥ 1

i=1

We consider the eﬀect of the regularization parameter C .

(a) What kind of problem do we obtain as C → ∞? Justify your answer.

(b) For very small C , what could you say about the margin? Brieﬂy explain your answer.

4. [Extra Credit: 10 points] Given n points and their labels (xi , yi ) for 1 ≤ i ≤ n with each xi ∈ Rd , consider the dual optimization program corresponding to the hard margin support vector machines classiﬁcation problem.

max αi>0,i=1,...,nαi − αj αk yj yk xj(T)xk subject to αiyi = 0

i=1 j,k i=1

Let {αi(*)} denote the optimal solution to the above program. Assume that there exists a point xs such that ys (w*xs + b* ) = 1 where ys is the label of xs and w* and b* denote the optimal solutions to the primal optimization program.

(a) What is the form of the classiﬁer function in terms of {αi(*)} and {(xi , yi ), 1 ≤ i ≤ n}? [Remark: The classifer function should only depend on the dual variables and the data points and their labels. It should not be given in terms of b* and w*].

(b) Suppose that we employ kernel for the support vector machines classiﬁcation problem em-

ploying the Gaussian kernel K(x, ) = exp / 、. What is the form of the classiﬁer function in terms of {αi(*)} and {(xi , yi ), 1 ≤ i ≤ n} and the Gaussian kernel?

[Remark: The classifer function should only depend on the dual variables and the data points and their labels. It should not be given in terms of b* and w*].

(c) Let n = 1000. Assume that α3(*) = 0. Suppose we remove the third training data point (x3 , y3 ). How does the classiﬁer function change? Justify your answer.

(d) Characterize the support vectors in terms of {αi(*)} .

(e) What is the computational complexity of evaluating the classiﬁer function in (b) given a test data point xtest ? The complexity should be in terms of n and d.

5. [Extra Credit: 20 points] Given n points and their labels (xi , yi ) for 1 ≤ i ≤ n with each xi ∈ Rd , consider the optimization program corresponding to the hard margin support vector machines classiﬁcation problem.

w e(i)nRd wT w subject to yj (wT xj + b) ≥ 1 for j = 1, ..., n (1)

Let d = 2. Let x1 = (0, 0), x2 = (2, 2), x3 = (2, 0) and x4 = (3, 0) be the training points with labels y1 = −1, y2 = −1, y3 = 1 and y4 = 1 respectively.

(a) Derive the optimal solution to (1) i.e. ﬁnd the equation of the optimal hyperplane. (b) Based your solution in (a), ﬁnd an explicit form for the classiﬁer function. (c) Using the classiﬁer in (b), ﬁnd the labels of the test points (6, 2) and (1, 1). (d) Compute the margin.

(e) What are the support vectors?

(f) Consider we remove the data point x4 = (3, 0). How does the solution of the optimization problem change? In particular, what is the form of the optimal hyperplane? Justify your answer.

[Remark: Show detailed work. Answers that simply guess the the equation of the optimal hyperplane receive no credit for (a). To get full credit, you need to solve the problem by considering the optimization objective and the constraints in (1)].

[Remark: For part (a) and (b), you can directly use the relations/results that can be established from the equivalence of the primal hard SVM problem and dual SVM problem. ]

6. [Extra Credit: 30 points] Given n points and their labels (xi , yi ) for 1 ≤ i ≤ n with each ξi ∈ R2 , consider the dual optimization program corresponding to the soft margin support vector machines classiﬁcation problem.

xξ(i)0 wT w + C ξi such that yi (wT xi + b) ≥ 1 − ξi ∀(xi , yi )

i=1

C is a regularization constant that controls overﬁtting.

(a) What is the Lagrangian L corresponding to the above optimization problem. L should be a function of w, b and α . α = {αi } are the penalty parameters.