闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Homework 4 for Math 173A - Fall 2022

1. (a) Find an expression for the orthogonal projection of a point x e Rn onto the convex set

B = {z e Rn : 0 s zi s 1 for each i = 1, ..., n}.

You need to show your work, and justify your answer.

Hint: It might be helpful to sketch B, when n = 2 (i.e., in 2 dimensions), and use the sketch to help you ﬁgure out what the projection should be.

(b) Let f : Rn → R be given by

f (x) = lAxl2(2) + aT x

where A e Rn*n is a positive deﬁnite matrix, and a e Rn . Write a projected gradient descent algorithm to solve

min f (x)

xeΩ

i. Ω = B, with B from part (a).

ii. Ω = B2(n) = {z e Rn : lzl2 s 1}

You do not need to specify the step size for this problem.

2. Consider the hollow sphere S in Rn , i.e., the set S := {x e Rn : lxl2(2) = 1}. Consider the function f : Rn → R given by

f (x) = xT Qx

where Q is an n x n symmetric matrix. For this problem you may use the fact that lf (x) = 2Qx.

(a) For an arbitrary point y e Rn , Π(y) be the projection of y onto S . Find an expression for Π(y) and give a short argument (i.e., proof) for why this is the correct expression. Make sure to handle the case y = 0 (i.e., the zero vector). Hint: Recall that the projection minimizes lx - yl for y e S . One approach would be to consider the reverse triangle inequality lx - yl 2 Ilxl - lylI and ﬁnd a projection formula that achieves the global lower bound (i.e. equality instead of inequality). This would prove you’ve found the projection.

(b) Is S a convex set?

min xT Qx subject to lxl2(2) = 1.

北eRn

(d) Is the projected gradient descent algorithm guaranteed to converge to the solution for small enough µ? If not, can you give an example of Q and an initialization x(0) where the algorithm won’t converge? Hint: Consider a diagonal matrix Q where not all entries are equal.

3. We will redo the MNIST coding question from HW3 but using diﬀerent versions of gradient descent

w(t+1) = w(t) - µp(t) .

Only run these questions for diﬀerentiating 4’s and 9’s.

(a) Implement and run Lo gradient descent with step size µ = 10-8 . Run your algorithm for at least 200 iterations and initialize with w(0) = 0 (i.e. the zero

vector). Recall: Lo gradient descent uses steps

p(t) = sign(lF (w(t)))llF (w(t))l1 .

(b) Implement and run L1 gradient descent (aka. coordinate descent) with step size µ = 10-4 . Run your algorithm for at least 200 iterations and initialize with w(0) = 0 (i.e. the zero vector). Recall: L1 gradient descent uses steps

p(t) = sign(lj * F (w(t)))llF (w(t)lo ej * ,

where jo is the location of the largest entry of the gradient, and ej is the zero vector with a 1 in the jth entry.

(c) Compare these two descent plots of F (w), along with the analogous plot for gradient descent from HW3. Which performs best, and do you have an argument for why? Do you think the performance would change with diﬀerent step sizes?

2022-11-03

Java

物理(Physical)

LINUX

C++

Python

Processing

sas

ios

maths

maple

C语言