
CSCI 567, Summer 2022

Theory Assignment 2

Problem 1  Kernel function                                                                                 (32 points)

Recall from lecture that there are two definitions of a kernel function, k(x, x′).

1. First, k is called a kernel function if there exists a basis function ϕ : RD → RM such that k(x, x′) = ϕ(x)T ϕ(x′).

2. Second, Mercer’s Theorem states that k is a kernel function if and only if, for any set of x1, x2, · · · , xn ∈ RD, the resulting Gram matrix is positive semi-definite (PSD).

Throughout this problem, you can use either of these definitions to check or prove that some function is a valid kernel.
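As an illustration of the Mercer-style check, the sketch below builds the Gram matrix of the RBF kernel (a standard known-valid kernel; the kernel choice, bandwidth, and sample points are assumptions made purely for the demo) on a few random points and confirms its eigenvalues are non-negative:

```python
import numpy as np

# Mercer-style numerical check (illustration only): sample a few points,
# build the Gram matrix of a candidate kernel, and confirm its eigenvalues
# are non-negative. The RBF kernel below is a standard known-valid example.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))                          # n = 5 points in R^2
k = lambda x, xp: np.exp(-np.sum((x - xp) ** 2))     # RBF kernel, gamma = 1
K = np.array([[k(xi, xj) for xj in X] for xi in X])  # Gram matrix
eigvals = np.linalg.eigvalsh(K)                      # K symmetric => real
print(np.all(eigvals >= -1e-10))                     # PSD up to tolerance
```

The same numerical check can serve as a quick sanity test for the candidate kernels in 1.1–1.3, though it cannot replace a proof.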

1.1  Consider the function k(x, x′) = xTx′ + (xTx)2 over x ∈ R2. Is this a valid kernel function? Show why or why not.                                                                                                                                   (10 points)

1.2  Consider the function k(x, x′) = (f(x) + f(x′))2 for any function f  : RD  → R. Is this a valid kernel function? Show why or why not.                                                                                                  (12 points)

1.3  Now, assume k1(x, x′) and k2(x, x′) are kernel functions. Prove, using Mercer’s Theorem (from lecture 5), that a linear combination k(x, x′) = αk1(x, x′) + βk2(x, x′) for some α, β ≥ 0 is also a kernel function.     (10 points)

Problem 2  Support Vector Machines                                                                          (32 points)

Consider the dataset consisting of points (x, y), where x is a real value, and y ∈ {−1, 1} is the class label. Let’s start with three points (x1, y1 ) = (−1, −1), (x2, y2 ) = (1, −1), (x3, y3 ) = (0, 1).

 

Figure 1: Three data points considered in Problem 2

2.1  Can the three points shown in Figure 1, in their current one-dimensional feature space, be perfectly separated with a linear separator? Why or why not?                                                                             (4 points)

2.2  Now we define a simple feature mapping ϕ(x) = [x, x2]T to transform the three points from one- to two-dimensional feature space. Plot the transformed points in the new two-dimensional feature space. Is there a linear decision boundary that can separate the points in this new feature space? Why or why not?

(4 points)
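As a quick numerical companion to 2.2, the sketch below maps the three points through ϕ(x) = [x, x2]T and tests one candidate separating line, x2 = 1/2 (this particular line is an assumption chosen for illustration, not the unique answer):

```python
# Sketch for 2.2: map the three points through phi(x) = [x, x^2] and test
# one candidate separator, the horizontal line x2 = 1/2. This particular
# line is an assumption chosen for illustration, not the unique answer.
points = [(-1, -1), (1, -1), (0, 1)]          # (x, y) pairs from the problem
phi = lambda x: (x, x * x)
transformed = [phi(x) for x, _ in points]     # (-1, 1), (1, 1), (0, 0)
pred = [1 if x2 < 0.5 else -1 for _, x2 in transformed]
labels = [y for _, y in points]
print(pred == labels)  # True: the line x2 = 1/2 classifies all three points
```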

2.3  Given the feature mapping ϕ(x) = [x, x2]T, write down the 3 × 3 kernel (or Gram) matrix K for the three data points. Show that this Gram matrix is positive semi-definite. Write down the kernel function K(x, y) (defined as K(x, y) = ϕ(x)T ϕ(y)).                                                                                                                 (8 points)
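A minimal numerical cross-check for 2.3, assuming NumPy: stack the feature vectors ϕ(xi) as rows of a matrix Φ, so that K = ΦΦT has entries K[i, j] = ϕ(xi)T ϕ(xj), and inspect its eigenvalues:

```python
import numpy as np

# Cross-check for 2.3 (a sketch): rows of Phi are phi(x_i) = [x_i, x_i^2]
# for the three points x = -1, 1, 0, so K = Phi @ Phi.T has entries
# K[i, j] = phi(x_i)^T phi(x_j).
xs = np.array([-1.0, 1.0, 0.0])
Phi = np.stack([xs, xs ** 2], axis=1)
K = Phi @ Phi.T
print(K)                         # compare against your hand-computed matrix
print(np.linalg.eigvalsh(K))     # non-negative eigenvalues => PSD
```

A hand computation should agree with the printed matrix; PSD-ness then follows from the non-negative eigenvalues (or directly from the factorization K = ΦΦT).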

2.4  Write down the dual formulation of this problem (plugging in the numerical values evaluated using the kernel function).                                                                                                                              (8 points)

2.5  Solve the dual form analytically. Then obtain the primal solution w∗, b∗ using the dual solution.      (8 points)

Problem 3  Constrained Optimization                                                               (36 points)

Machine learning problems, especially clustering problems, sometimes involve optimization over a simplex. In this exercise, you will solve two optimization problems over the simplex. Recall that a (K − 1)-dimensional simplex ∆ is defined as:

∆ = {q ∈ RK | qk ≥ 0 ∀k, and ∑k qk = 1},

which means that a point of a (K − 1)-dimensional simplex has K non-negative entries whose sum is 1. This property coincides with that of the probability distribution of a discrete random variable with K possible outcomes. Thus, the simplex often appears in clustering problems.

3.1  Given a1, ..., aK ∈ R≠0 (the set of non-zero real numbers), solve the following optimization over the simplex (find the optimal value q∗ of q):                                                                                      (18 points)

arg max_{q ∈ ∆} ∑_k a_k² ln q_k

(a) Write down the Lagrangian of this problem. (Hint: use the constraints on qk given by the simplex ∆) (4 points)

(b) Apply the KKT conditions to the Lagrangian you derived above to find q∗. (Hint: the solution can be written in the form q_k∗ = ...)                                                                                                  (12 points)

(c) The solution you acquired will not have a simple form if ak is allowed to be 0. Explain why. (Hint: point out the relevant variable; a one-sentence explanation is sufficient)                                   (2 points)

3.2  Next, given c1, ..., cK ∈ R, solve the following optimization problem following the same steps as in part 3.1. q is under the same constraints as in part 3.1:                                                                         (18 points)

arg max_{q ∈ ∆} ∑_k (q_k c_k − q_k ln q_k)
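Once you have an analytic candidate q∗ for 3.2, one way to sanity-check it numerically is to compare its objective value against random samples from the simplex. The sketch below does this (the vector c is an arbitrary example chosen for illustration; Dirichlet(1, ..., 1) sampling is uniform over the simplex):

```python
import numpy as np

# Hedged numerical aid for 3.2 (the vector c is an arbitrary example, an
# assumption for illustration): evaluate the objective on random simplex
# points, so an analytic candidate q* can be checked against the best sample.
rng = np.random.default_rng(0)
c = np.array([0.5, -1.0, 2.0])

def objective(q, c):
    # sum_k (q_k c_k - q_k ln q_k), with 0 * ln 0 taken as 0 by continuity
    return np.sum(np.where(q > 0, q * (c - np.log(np.clip(q, 1e-300, None))), 0.0))

samples = rng.dirichlet(np.ones(len(c)), size=20000)  # uniform on the simplex
best = max(objective(q, c) for q in samples)
print(best)  # any valid analytic q* must score at least this value
```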