Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

EE 559

Midterm Assignment

(Week 9)

2022

Ground rules

For this assignment, you may use any of our EE 559 course materials, and you may use your computer for coding and accessing the assignment, EE 559 materials, piazza, and D2L.  You may also use the internet for information about Python, Python coding, and Python libraries that you are allowed to use for the problem you are working on (e.g., looking up information for sklearn functions that you might want to use in a problem that allows the sklearn library).

You are not allowed to use the internet to look up answers or partial answers to any of this assignment’s  problems,  or  communicate  with  other  people  (e.g.,  other  students,  people  on discussion forums, etc.) on topics related to this assignment.

For all coding in this assignment, you are allowed to use built-in Python functions, NumPy, matplotlib (and pandas only for reading/writing csv files).   Other libraries are allowed where stated so in the problem.

For questions on this assignment, when posting on piazza, please consider whether this is a question  that  is  appropriate  for  all  students  (e.g.,  clarifying  the problem  statement,  what  is allowed, suspected typos or omissions in the problem statement), or only for the instructors (e.g., something that includes your approach or solution) which should be posted as a “private” post that is visible only to the professor, TAs and you.

Please respect your classmates and follow all of these guidelines, so that everyone can be graded fairly.  Any violations that are detected will result in substantial penalties.

Uploading your solutions.   Please upload the following files before the due date and time:

1.    A single pdf file of all your solutions/answers.

2.    A single computer-readable pdf file of all your code.

Problems and points.   This Midterm Assignment has 3 problems, worth a total of 100 points possible.  There will be partial credit on most problem parts.  Good luck!

1.    Kernel nearest means.

In this problem you will write code and run a nonlinear version of a nearest means classifier, on data that we provide.  This problem has  = 2 features.

Coding:  you  must  use  Python,  and  code  the  functions  yourself.     sklearn  and  other supplementary libraries are not allowed for this problem.  Similar to Homework 4, you may use Python built-in functions, NumPy, and matplotlib.  You may also use plotting functions provided in our previous homework assignments, and modify them as you like.  Pandas may be used only for reading and writing csv files.

Datasets: there are 2 synthetic datasets provided.  Pr1_dataset1 is the synthetic dataset from HW1, but divided up to include a validation set.  Pr1_dataset2 is also synthetic but is new. Each of these 2 datasets is given as 3 sets:  training set, validation set, and test set.                  Comment:  This problem uses model selection with a validation set (covered in Lecture 13); the procedure is straightforward, and we will answer any questions you have regarding the model selection procedure.

(a)   Write an expression for the discriminant function  %  '  for a 2-class nearest-means

(b)   From (a), develop and give the dual representation of the discriminant function; call it  ! %  '.

Hint:  Lecture 12 for the multiclass (MVM) version.

(c)   Use kernel substitution to give an expression for the discriminant function  "#$ %  ' for

12).

(d)  Code a 2-class kernel nearest means classifier that can use RBF kernel or linear kernel. Use model selection with the validation set to find an optimal value of  (call it  ∗ ) in  the case of RBF kernel, separately for each dataset.

Tip:  not knowing what range is best for  a priori, you might first try a range covering a few orders of magnitude, and use a log scale for the increment:  e.g., to cover

0.01 ≤  ≤ 100, one could use a loop over the index k in the range − 2  ≤   ≤ 2 with some increment (step size) specified for k.  Then the value of  for each k can be           computed from  = 10&  .

(e)   Plot the validation-set classification error as a function ofγ , for each dataset, for RBF

kernel.  (If you used a log scale for your increments as described in the tip of (d), then use a log scale for  in your plot; or equivalently, plot vs. k instead ofvs. .)  Pick the optimal value  ∗  for each dataset based on validation-set classification error.

(f)   Compare test-set error using the linear kernel with test-set error using the RBF kernel,

for each dataset.  Comment on the results.

(g)   For the linear kernel, plot the training data, decision regions and boundary, in the

feature space, for each dataset.

(h)   For the RBF kernel with optimal γ , plot the training data and decision regions in the

original feature space, for each dataset.

Tip:  any point in the original feature space can be classified (using nearest-means with RBF kernel) from your code.

(i)   For the RBF kernel, repeat part (h) except for

 = 0.01∗ , 0.1∗ , 0.3∗ ,  3∗ ,  10∗ ,  100∗ .  Comment on the results.

 

2.    Learning algorithm from criterion function.

In this problem you will derive and interpret a learning algorithm based on the following criterion function in a 2-class problem, using augmented notation:

(

 % ' = − 5 67 %' ' ≤ 09:  '

')*

in which 7[ … ]9 denotes the indicator function.

(a)   Interpret the meaning of the criterion function; use a figure if it helps explain it.

Tip: just stating what is different from Perceptron criterion function is not sufficient.

(b)   Derive a gradient descent (GD) algorithm for each GD method given below.   Simplify

your answers algebraically.  Give a statement of the entire algorithm. Give both:

(i)        Batch GD                          (ii)       Stochastic GD (Variant 1)

(c)   In this part use nonaugmented notation.   Consider  a problem with   = 2  features. Draw a plot in 2D weight space ( . * ), for the case -  = 0, showing items (i)-(v) below.  Label your axes with numbers and tick marks, so the lines and vectors drawn are unambiguous.

Let () =  .  For the first epoch, the data has already been shuffled so take the data points in the order given ( *,  ,,   0, 1 ).

Tip: This plot can be done by hand.  If you prefer to do it by computer, that is also OK.

(i)    The following data points (as lines 2!3! %  ' = 0 ), with an arrow pointing to the

correctly classified side.

* :    *  = [ −4,1]+ ,   ,  = [ − 3, − 2]+

, :   0  = [ − 1,3]+ ,  1  = [3, − 2]+

(ii)  The weight vector at the second iteration ( = 1) :

  (1) = [ − 1,4]+ .

(iii) The vector showing the weight update at iteration  = 1 (based on data point   , ). Also state the vector’s value in numbers.

(iv) The updated weight vector  (2).  Also give the vector’s value in numbers.

(v)  The solution region; if there is no solution region, state so.

(d)  This part also uses nonaugmented notation.  For the scalar case,  =  , '  = '  , with -  = 0 :  is the criterion function convex?  Prove your answer.