
EE 559

Homework 6

2022

1.    This problem is to be done by hand.  In the parts below, you will set up the primal Lagrangian and derive the dual Lagrangian for a support vector machine classifier, for the case of data that is linearly separable in the expanded feature space.

For SVM learning, we stated the optimization problem as:

Minimize  $J(w) = \tfrac{1}{2}\,\|w\|^2$

s.t.  $z_i\,(w^T u_i + w_0) - 1 \ge 0 \quad \forall i$

Assume our data is linearly separable in the expanded feature space.  Also, nonaugmented notation is used throughout this problem.

(a)   If the above set of constraints (the second line of equations above) is satisfied, will all the training data be correctly classified?

(b)   Write the Lagrangian function L(w, w_0, λ) for the minimization problem stated above.  Use λ_i, i = 1, 2, …, N for the Lagrange multipliers.  Also state the KKT conditions.  (Hint: there are 3 KKT conditions.)
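For reference, the general pattern (not the answer specific to this problem): for minimization of f(x) subject to inequality constraints g_i(x) ≥ 0, the Lagrangian and the multiplier-related KKT conditions take the standard form

$L(x, \lambda) = f(x) - \sum_i \lambda_i\, g_i(x), \qquad \lambda_i \ge 0, \quad g_i(x) \ge 0, \quad \lambda_i\, g_i(x) = 0 \;\; \forall i.$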

(c)   Derive the dual Lagrangian L_D, by proceeding as follows:

(i)     Minimize L  w.r.t. the weights.

Hint: solve ∇_w L = 0 for the optimal weight w* (in terms of λ_i and other variables); also set ∂L/∂w_0 = 0.

(ii)    Substitute your expressions from part (i) into L, and use your expression from ∂L/∂w_0 = 0, to show that the result can be written:

$L_D(\lambda) = -\tfrac{1}{2}\sum_i \sum_j \lambda_i \lambda_j z_i z_j\, u_i^T u_j + \sum_i \lambda_i$

subject to the (new) constraint  $\sum_i \lambda_i z_i = 0$, which becomes a new KKT condition.

Also give the other two KKT conditions on λ_i, which carry over from the primal form.

 

2.    In this problem you will do SVM learning on a small set of given data points, using the result from Problem 1 above.  Problem 2 also uses nonaugmented notation.

Coding.  This problem involves some work by hand, and some solution by coding.  As in previous homework assignments, in this problem you are required to write the code yourself; you may use Python built-in functions, NumPy, and matplotlib; and you may use pandas only for reading and/or writing csv files.  For the plotting, we are not supplying plotting function(s) for you; you may use this as an opportunity to (learn, explore, and) use matplotlib functions as needed.  Alternatively, you may do the plots by hand if you prefer.

Throughout this problem, let the dataset have N = 3 points.  We will work entirely in the expanded feature space (u-space).

(a)   Derive by hand a set of equations, with λ_i and µ as variables, and u_i, z_i, i = 1, 2, 3 as given constants, that when solved will give the SVM decision boundary and regions.  To do this, start the Lagrangian process to optimize L_D w.r.t. λ and µ, subject to the equality constraint $\sum_i \lambda_i z_i = 0$.  Set all the derivatives equal to 0 to obtain the set of equations.

Hint: Start from

$L_D(\lambda, \mu) = \sum_i \lambda_i - \tfrac{1}{2}\sum_i \sum_j \lambda_i \lambda_j z_i z_j\, u_i^T u_j + \mu \sum_i z_i \lambda_i$,

in which the last term has been added to incorporate the equality constraint stated above.

Finally, formulate your set of equations as a matrix-vector equation of the form:

$A \rho = b$

in which ρ = [λ^T, µ]^T = [λ_1, λ_2, λ_3, µ]^T.  Give your expressions for the matrix A and the vector b.

Tip:  If you’d like a quick check to see if you’re on the right track, try plugging in (by hand) for this simple dataset:

u_1 = (0, 1)^T,  u_2 = (0, 0)^T ∈ class 1  (z_1 = z_2 = 1);     u_3 = (1, 1)^T ∈ class 2  (z_3 = −1).  The first row of A should be [1   0  −1  −1] and the first entry of b should be 1.
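If you would like to check this numerically as well, the short sketch below builds one candidate arrangement of A and b directly from the derivative equations of L_D(λ, µ) above, using the tip dataset as reconstructed here; the array names G, A, b, u, z are illustrative choices only, not something the assignment prescribes.

```python
import numpy as np

# Tip dataset (as reconstructed above): u_1, u_2 in class 1, u_3 in class 2.
u = np.array([[0.0, 1.0],
              [0.0, 0.0],
              [1.0, 1.0]])        # rows are u_i
z = np.array([1.0, 1.0, -1.0])    # labels z_i

# One possible arrangement of the part-(a) equations:
#   sum_j z_i z_j (u_i . u_j) lambda_j  -  z_i mu  =  1    (d/d lambda_i = 0)
#   sum_i z_i lambda_i                             =  0    (d/d mu = 0)
G = (z[:, None] * z[None, :]) * (u @ u.T)     # entries z_i z_j u_i^T u_j
A = np.block([[G, -z[:, None]],
              [z[None, :], np.zeros((1, 1))]])
b = np.array([1.0, 1.0, 1.0, 0.0])

print(A[0])   # compare with the tip:  [ 1.  0. -1. -1.]
print(b[0])   # compare with the tip:  1.0
```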

 

In the parts below you will use a computer to solve this matrix equation for λ and µ, then calculate, plot, and interpret the results, for 3 different datasets.

For parts (b)-(e) below, use the following dataset:

u_1 = (2, 1)^T,   u_2 = (1, 2)^T   ∈ class 1

u_3 = (0, 0)^T   ∈ class 2

For all parts below, consider class 1 to be the positive class (z_i = +1).

 

(b)   Write code in Python to solve for λ and µ, calculate w* and w_0*, and check the KKT conditions.  (A rough illustrative sketch follows the list of steps below.)

Tip: Use variables u_i, z_i, i = 1, 2, 3, such that you can input their values for different datasets.

Specifically, the code should:

(i)     Use NumPy to invert your matrix, and to calculate the resulting values for λ and µ.

(ii)    Check that the resulting λ satisfies the KKT conditions involving λ (but not involving w) that you stated in Problem 1(c)(ii).

(iii)   Calculate the optimal (nonaugmented) weight vector w* by using your result from Problem 1(c)(i).  And, find the optimal bias term w_0* using one of the KKT conditions from Problem 1(b).

(iv)   Check that the resulting w* and w_0* satisfy the KKT conditions on w and w_0 of Pr. 1(c).

(v)    Output the results of (i)-(iv) above.
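For orientation only, here is a minimal sketch of steps (i)-(v), written against the tip dataset from part (a) rather than the assigned datasets; the arrangement of A and b, the tolerances, and all variable names are my own assumptions, so adapt it to your own derivation.

```python
import numpy as np

# Replace with the u_i, z_i of whichever dataset you are running.
u = np.array([[0.0, 1.0], [0.0, 0.0], [1.0, 1.0]])   # u_i as rows
z = np.array([1.0, 1.0, -1.0])                        # labels z_i

# Build A and b as in part (a) (one possible arrangement; see the earlier sketch).
G = (z[:, None] * z[None, :]) * (u @ u.T)
A = np.block([[G, -z[:, None]], [z[None, :], np.zeros((1, 1))]])
b = np.array([1.0, 1.0, 1.0, 0.0])

# (i)  Invert the matrix and solve for lambda and mu.
rho = np.linalg.inv(A) @ b
lam, mu = rho[:3], rho[3]

# (ii) KKT checks involving lambda only.
print("lambda_i >= 0 :", np.all(lam >= -1e-9))
print("sum_i lambda_i z_i = 0 :", np.isclose(lam @ z, 0.0))

# (iii) Weight vector w* = sum_i lambda_i z_i u_i, and bias w0* from a
#       support vector s (lambda_s > 0), using z_s (w.u_s + w0) = 1.
w = (lam * z) @ u
s = int(np.argmax(lam))
w0 = z[s] - w @ u[s]

# (iv) KKT checks involving w and w0.
slack = z * (u @ w + w0) - 1.0
print("z_i (w.u_i + w0) - 1 >= 0 :", np.all(slack >= -1e-9))
print("lambda_i * slack_i = 0    :", np.allclose(lam * slack, 0.0))

# (v)  Report results.
print("lambda =", lam, " mu =", mu, " w* =", w, " w0* =", w0)
```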

(c)   Run your code on the given dataset;  give the resulting values for λ and µ , and the results of the KKT-conditions check on λ .

Also give the resulting w* and w_0*, and the results of the KKT-conditions check on w* and w_0*.

(d)   Plot in 2D nonaugmented feature (u) space:  the data points showing their class labels, the decision boundary defined by w* and w_0*, and an arrow showing which side of the boundary is class 1.  While you are encouraged to do (some or all of) the plot by computer, you can do some or all of it by hand if you prefer, as long as it is neat and reasonably accurate.
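If you choose to plot by computer, a rough matplotlib sketch along these lines may help you get started; the points shown are the tip dataset from part (a) and the weights are placeholders, so substitute your own dataset and the w*, w_0* computed in part (b).

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder inputs: substitute your dataset and the w*, w0* from part (b).
u = np.array([[0.0, 1.0], [0.0, 0.0], [1.0, 1.0]])
z = np.array([1, 1, -1])
w, w0 = np.array([1.0, 1.0]), -1.0     # NOT a solution; placeholder values only

# Data points, by class.
plt.scatter(*u[z == 1].T, marker='o', label='class 1 (z = +1)')
plt.scatter(*u[z == -1].T, marker='x', label='class 2 (z = -1)')

# Decision boundary w.x + w0 = 0 (assumes w[1] != 0).
x1 = np.linspace(-1.0, 2.0, 100)
x2 = -(w[0] * x1 + w0) / w[1]
plt.plot(x1, x2, 'k-', label='decision boundary')

# Arrow from a boundary point toward the positive (class 1) side, i.e. along w.
p = np.array([x1[50], x2[50]])
plt.annotate('class 1 side', xy=p + 0.5 * w / np.linalg.norm(w), xytext=p,
             arrowprops=dict(arrowstyle='->'))

plt.axis('equal')
plt.xlabel('u1'); plt.ylabel('u2'); plt.legend()
plt.show()
```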

(e)   Looking at the plot, does the decision boundary correctly classify the training data?

And if yes, does it look like the maximum-margin boundary that will correctly classify the data?   Briefly justify.

(f)(i)-(iii)   Repeat parts (c) – (e) except for the following dataset:

u_1 = (2, 1)^T,   u_2 = (1, 2)^T   ∈ class 1

u_3′ = (1, 1)^T   ∈ class 2

in which the third data point has been changed.

(iv)   Explain the change, or lack of change, in the decision boundary from (d) to (f).

(g)    (i)     How do you think the boundary will change (relative to (f)) if we instead use the following data:

u_1 = (2, 1)^T,   u_2 = (1, 2)^T   ∈ class 1

u_3″ = (10, 5)^T   ∈ class 2

in which the third data point has (yet again) been changed?

(ii)-(iv)  Try it by repeating (c)-(e) except with these data points.

(v)    Explain any change or lack of change in the decision boundary as compared with (d) and (f).

 

Comment:  This method of implementation (Pr. 2 using matrix inverse) is useful for working with algebraic expressions, theory, and seeing/verifying how things work.  For larger datasets, other implementation methods are typically used, such as quadratic programming or variations on gradient descent designed for this type of optimization.
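As a side illustration of the kind of alternative alluded to here (not something this assignment asks for), a plain subgradient descent on the soft-margin objective (1/2)‖w‖² + C Σ_i max(0, 1 − z_i (w·u_i + w_0)) might look like the sketch below; the dataset, C, step size, and iteration count are arbitrary choices.

```python
import numpy as np

# Arbitrary small dataset for illustration.
u = np.array([[0.0, 1.0], [0.0, 0.0], [1.0, 1.0]])
z = np.array([1.0, 1.0, -1.0])
C, eta, iters = 10.0, 0.005, 20000

w, w0 = np.zeros(2), 0.0
for _ in range(iters):
    margins = z * (u @ w + w0)
    viol = margins < 1.0                      # points violating the margin
    grad_w = w - C * (z[viol][:, None] * u[viol]).sum(axis=0)
    grad_w0 = -C * z[viol].sum()
    w -= eta * grad_w
    w0 -= eta * grad_w0

print("approximate w, w0:", w, w0)
```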