Homework 3 for Math 173A - Fall 2022

1. Using the conditions of optimality, find the extreme points of the following functions and determine whether they are maxima or minima. You may use a computer to find the eigenvalues, but the eigenvalues in these questions should be easy to find by hand.

(a) $f : \mathbb{R}^2 \to \mathbb{R}$ for $f(x_1, x_2) = x_1^4 + 2x_2^4 - 4x_1 x_2$

(b) $f : \mathbb{R}^3 \to \mathbb{R}$ for $f(\vec{x}) = \vec{x}^T A \vec{x} + b^T \vec{x}$, where

A = ┌ ┐ ┌ ┐1(0)
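Since a computer may be used for the eigenvalues, the second-order test can be sketched as follows. This is a minimal illustration, assuming a symmetric Hessian evaluated at a critical point; the function name `classify_critical_point`, the tolerance `tol`, and the example matrix are hypothetical and are not taken from the problems above.

```python
import numpy as np

def classify_critical_point(hessian, tol=1e-10):
    """Classify a critical point from the eigenvalues of its symmetric Hessian."""
    # eigvalsh assumes a symmetric matrix and returns eigenvalues in ascending order.
    eigenvalues = np.linalg.eigvalsh(hessian)
    if np.all(eigenvalues > tol):
        return "local minimum"
    if np.all(eigenvalues < -tol):
        return "local maximum"
    if eigenvalues[0] < -tol and eigenvalues[-1] > tol:
        return "saddle point"
    # At least one eigenvalue is numerically zero: the test decides nothing.
    return "inconclusive"

# Hypothetical Hessian, chosen only for illustration.
H_example = np.array([[2.0, 0.0],
                      [0.0, -1.0]])
print(classify_critical_point(H_example))
```

When an eigenvalue is zero the second-order conditions are inconclusive, so the sketch reports that case rather than guessing.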

2. Determine whether each function is Lipschitz, and if so find its Lipschitz constant. For all problems, $\|\cdot\|$ represents the Euclidean norm (2-norm).

(a) $f : \mathbb{R}^n \to \mathbb{R}$ for $f(x) = \|x\|$

(b) $f : \mathbb{R}^n \to \mathbb{R}$ for $f(x) = \|x\|^2$

(d) $f : \mathbb{R}^n \to \mathbb{R}$ for $f(x) = \rho(w^T x + b)$ for some weight vector $w \in \mathbb{R}^n$, $b \in \mathbb{R}$, and $\rho$ from part (c).

3. Let $f$ be a convex, differentiable, $L$-Lipschitz function where $L = 3$. Let $x^*$ be the global minimum and suppose $x^{(0)}$ is the initialization such that $\|x^* - x^{(0)}\| \le 3$.

(a) Determine the number of steps needed to satisfy

$$f\left(x^{(k)}\right) - f(x^*) \le 10^{-5}.$$

(b) What is the associated choice of step size µ?

4. Coding Question: For this problem, you will need to download the MNIST data set. It is a data set of images of size 28x28 pixels. Each one is an image of a handwritten digit from 0-9. You may find it at the original source here:

http://yann.lecun.com/exdb/mnist/

or in convenient .csv file format here:

https://pjreddie.com/projects/mnist-in-csv/
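As a rough sketch of reading the .csv version with NumPy: the code below assumes each row holds the label followed by 784 pixel values and that there is no header row. The function name `load_mnist_csv` and the local path in the final comment are hypothetical.

```python
import io
import numpy as np

def load_mnist_csv(source):
    """Load rows of the form label,pixel_1,...,pixel_784 into (images, labels).

    `source` may be a file path or an open file-like object.
    Assumes no header row.
    """
    data = np.loadtxt(source, delimiter=",", ndmin=2)
    labels = data[:, 0].astype(int)
    images = data[:, 1:]  # one flattened 28x28 image per row
    return images, labels

# Tiny in-memory stand-in for a real file, just to show the shapes.
fake_rows = "\n".join(
    ",".join([str(label)] + ["0"] * 784) for label in (5, 0, 4)
)
images, labels = load_mnist_csv(io.StringIO(fake_rows))
print(images.shape, labels)

# With a downloaded file (hypothetical local path):
# images, labels = load_mnist_csv("mnist_train.csv")
```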

In the last assignment (Problem 4) you considered the classification problem with logistic regression

$$F(w) = \sum_{i=1}^{N} \log\left(1 + e^{-\langle w, x_i \rangle y_i}\right). \tag{1}$$

You also wrote down a gradient descent algorithm for it.
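For reference, one possible way to code the objective (1), its gradient, and a plain gradient descent loop is sketched below. It assumes the summed (not averaged) form of the loss, a data matrix `X` with one example per row, labels `y` in {-1, +1}, and a zero initialization; all function names and the synthetic demo data are hypothetical. `np.logaddexp` is used so that large margins do not overflow.

```python
import numpy as np

def logistic_loss(w, X, y):
    """F(w) = sum_i log(1 + exp(-y_i * <w, x_i>))."""
    margins = y * (X @ w)
    return np.sum(np.logaddexp(0.0, -margins))

def logistic_grad(w, X, y):
    """Gradient of F: sum_i -y_i * x_i / (1 + exp(y_i * <w, x_i>))."""
    margins = y * (X @ w)
    coeff = np.exp(-np.logaddexp(0.0, margins))  # equals 1 / (1 + e^{margin})
    return -(X.T @ (y * coeff))

def gradient_descent(X, y, step_size, num_iters):
    """Run fixed-step gradient descent from w = 0 (an assumed starting point)."""
    w = np.zeros(X.shape[1])
    history = [logistic_loss(w, X, y)]
    for _ in range(num_iters):
        w = w - step_size * logistic_grad(w, X, y)
        history.append(logistic_loss(w, X, y))
    return w, history

# Synthetic demo data, not MNIST.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = np.where(X_demo[:, 0] > 0, 1.0, -1.0)
w_demo, history_demo = gradient_descent(X_demo, y_demo, step_size=0.01, num_iters=50)
print(history_demo[0], history_demo[-1])
```

The `history` list holds the value of $F(w)$ at every iteration, which is the quantity a convergence plot would show.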

In this assignment, you’ll implement what you learned to classify MNIST digits. For all questions, you must submit your code and the requested answers. Your code must be present to receive points.

(a) Display one randomly selected image from your training data for each digit class. Provide the index number for each image.

(b) Select the first 500 examples each of 0’s and 1’s; these will form the training data $(x_i, y_i) \in \mathbb{R}^{784} \times \{-1, 1\}$, $i = 1, \ldots, 1000$. Assign label $y_i = 1$ for 1s and $y_i = -1$ for 0s.

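Note: To get from images of size 28 x 28 pixels to vectors in $\mathbb{R}^{784}$, you just need to “vectorize” the image. This means you can concatenate each of the 28 columns of the original image into one long vector of length 784. In Matlab, this is done with the command x(:). Similarly, in Python with NumPy it is x.flatten(order='F'), which stacks columns as Matlab does. The default x.flatten() stacks rows instead, which works equally well as long as every image is treated the same way.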
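A minimal sketch of the flattening step in NumPy, using a stand-in array rather than a real MNIST image (the variable names are hypothetical):

```python
import numpy as np

# Stand-in for one 28x28 image: entry (r, c) holds the value 28*r + c.
image = np.arange(28 * 28).reshape(28, 28)

x_cols = image.flatten(order="F")  # stack the columns, like Matlab's x(:)
x_rows = image.flatten()           # stack the rows (NumPy's default order)

print(x_cols.shape, x_rows.shape)
print(x_cols[1] == image[1, 0], x_rows[1] == image[0, 1])
```

Either ordering gives a vector of length 784. If the data comes from a .csv file in which each row is already a flattened image, this step may not be needed at all.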

i. Implement and run a Gradient Descent algorithm, with step-size $\mu = 10^{-6}$, to optimize the function (1) associated with this setup. You should run your algorithm for at least $T = 200$ iterations (but if your computer can handle it, do more, until a reasonable stopping criterion is satisfied), and provide a plot showing the value of $F(w)$ at each iteration. Also, feel free to adjust $\mu$ to be larger / smaller if the plot does not match your expectations.

ii. Comment on the resulting plot. In particular, does the value of $F(w)$ decrease with every iteration? Does your algorithm seem to be converging to a fixed $w^*$? Explain whether your answers to these questions are consistent with the theory we discussed in class (and in the notes). Be specific, i.e., point to a specific theorem (or theorems) and indicate why it does or does not explain the behavior of the algorithm. Would the theory dictate a different choice of $\mu$?

iii. Now, use the $w$ you found in part i. to classify the first 500 test data points associated with each of the 0 and 1 handwritten digits. Recall that you need to use the function $y = \mathrm{sign}(w^T x)$ to classify. What was the classification error rate associated with the two digits on the test data (this should be a number between 0 and 1)? What was it on the training data?

(c) Repeat parts (b)i. and (b)iii. for digits of 4s and 9s. Comment on the difference between the results and propose a reason as to why the performance did or did not change.
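The classification rule $y = \mathrm{sign}(w^T x)$ and the error rate asked for in part (b)iii. could be computed along the following lines. This is only a sketch: the function names, the toy data, and the choice to assign a score of exactly 0 to the class +1 are assumptions made here.

```python
import numpy as np

def predict(w, X):
    """Return labels in {-1, +1} from the sign of w^T x for each row x of X."""
    scores = X @ w
    # Assumption: a score of exactly 0 is assigned to class +1.
    return np.where(scores >= 0, 1, -1)

def error_rate(w, X, y):
    """Fraction of examples whose predicted label differs from y (between 0 and 1)."""
    return float(np.mean(predict(w, X) != y))

# Toy data, not MNIST.
w_toy = np.array([1.0, -1.0])
X_toy = np.array([[2.0, 1.0],
                  [0.0, 3.0],
                  [1.0, 1.0],
                  [-1.0, 0.0]])
y_toy = np.array([1, -1, -1, -1])
print(error_rate(w_toy, X_toy, y_toy))  # one of four examples is misclassified: 0.25
```

The same function can be applied to both the training and the test sets to compare the two error rates.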