MATH-UA 252/MA-UY 3204 - Fall 2022 - Homework #1

Problem 1 (computing the gradient and Hessian of the linear least squares cost function using two different methods). A quadratic form is a scalar-valued function f : Rⁿ → R of the form:

f(x) = xᵀAx + bᵀx + c,                                                            (1)

where x, b ∈ Rⁿ, A ∈ Rⁿˣⁿ, and c ∈ R.

(a) Rewrite the squared ℓ₂ norm ∥Cy + d∥₂², where C ∈ Rᵐˣⁿ, y ∈ Rⁿ, and d ∈ Rᵐ, as a quadratic form g(y). What is the dimension of the domain of g?

(b) Write g(y) out using summations instead of matrix notation. Using this form of g, compute the first and second partials of g: ∂g/∂yᵢ and ∂²g/∂yᵢ∂yⱼ for each i and j.

(c) Now, what are the gradient and Hessian of g? Simplify these expressions using matrix notation.

(d) Recall that, in general, the second-order Taylor expansion of a function g about a base point y, with increment h, is given by:

g(y + h) = g(y) + ∇g(y)ᵀh + ½hᵀ∇²g(y)h + O(∥h∥₂³),                                (2)

where ∇g(y) denotes the gradient of g at y and ∇²g(y) denotes its Hessian. Evaluate the quadratic form from part (a) at y + h. Simplify the resulting expression by collecting terms according to their power of h. You should get a constant term (of the form "D(y)"), a linear term (of the form "E(y)ᵀh"), and a quadratic term (of the form "hᵀF(y)h"). Once you have determined E(y) and F(y), match these expressions with the linear and quadratic terms of (2) in order to find ∇g(y) and ∇²g(y) indirectly. (A numerical sanity check for these expressions is sketched below.)
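
This is not part of the assignment, but one low-effort way to sanity-check the matrix expressions you derive in parts (c) and (d) is to compare them against finite-difference approximations of the gradient and Hessian. The sketch below does this with numpy; the helper names fd_gradient and fd_hessian, the test dimensions, and the random seed are arbitrary choices made here for illustration.

    import numpy as np

    def g(y, C, d):
        """Quadratic cost g(y) = ||C y + d||_2^2."""
        r = C @ y + d
        return float(r @ r)

    def fd_gradient(f, y, h=1e-6):
        """Central-difference approximation to the gradient of f at y."""
        n = y.size
        grad = np.zeros(n)
        for i in range(n):
            e = np.zeros(n); e[i] = h
            grad[i] = (f(y + e) - f(y - e)) / (2 * h)
        return grad

    def fd_hessian(f, y, h=1e-4):
        """Central-difference approximation to the Hessian of f at y."""
        n = y.size
        H = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                ei = np.zeros(n); ei[i] = h
                ej = np.zeros(n); ej[j] = h
                H[i, j] = (f(y + ei + ej) - f(y + ei - ej)
                           - f(y - ei + ej) + f(y - ei - ej)) / (4 * h * h)
        return H

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        m, n = 5, 3
        C, d, y = rng.normal(size=(m, n)), rng.normal(size=m), rng.normal(size=n)
        f = lambda y: g(y, C, d)
        print("finite-difference gradient:\n", fd_gradient(f, y))
        print("finite-difference Hessian:\n", fd_hessian(f, y))
        # Compare these against the matrix expressions you derive in parts (c)/(d).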

Problem 2 (experimenting with gradient descent). You now should know that the gradient is the direction of steepest ascent at a point. That is, for a scalar-valued function f : Rⁿ → R, ∇f(x) is the vector that points in the direction of fastest increase of f at x, and its magnitude gives the rate of that increase. This gives us an idea for a simple iterative method for minimizing functions. With an iterative method for optimization (really, the main focus of this course!), the idea is to generate a sequence of vectors x₀, x₁, x₂, . . . such that f(xₙ) → f(x*) = min f(x) as n → ∞. The idea of gradient descent is simple: let x₀ be an initial guess for x* (it could be completely arbitrary). For each n, we set:

xₙ₊₁ = xₙ − ∇f(xₙ).                                                              (3)

That is, we take a step in the direction of steepest decrease with magnitude ∥∇f(xₙ)∥. We will learn about more sophisticated variations on this idea during the course. In the meantime, we will begin to experiment with gradient descent in this problem.
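
In code, update (3) is only a few lines. The sketch below is a minimal illustration of the iteration, applied to a simple quadratic rather than to the cost function of this problem; the function name gradient_descent, the example objective, and the starting point are arbitrary choices.

    import numpy as np

    def gradient_descent(grad_f, x0, num_steps=10):
        """Iterate x_{n+1} = x_n - grad_f(x_n), as in (3); return the whole trajectory."""
        xs = [np.asarray(x0, dtype=float)]
        for _ in range(num_steps):
            xs.append(xs[-1] - grad_f(xs[-1]))
        return np.array(xs)

    # Example: f(x) = (x_1^2 + x_2^2) / 4, so grad f(x) = x / 2 and each step
    # halves the iterate, converging to the minimizer 0.
    traj = gradient_descent(lambda x: x / 2, x0=[1.0, -0.5], num_steps=10)
    print(traj)

Note that, because (3) uses no step size, whether the iteration behaves well depends entirely on the scaling of ∇f; for the example objective above, each step halves the iterate, so the sequence converges to the minimizer.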

Consider the cost function:

f(x, y) = (1 − cos(πx/2)) y²,        (x, y) ∈ B = [−1, 1] × [−1, 1].                              (4)

(a) This function does not have a unique minimizer over B. What is its set of minimizers over B, and what value of f is obtained there?

(b) Use Python with numpy and matplotlib to make a contourf plot of f over B. You may find the np.meshgrid and np.linspace functions helpful.

(c) Generate random initial iterates r₀ = (x₀, y₀) in B using np.random.uniform and take 10 gradient descent steps. Use plot to plot each of these trajectories as a piecewise linear path on top of your contourf plot. Make sure to set marker='.' and linewidth=1 (or smaller) to make it easy to see where each iterate rₙ lies on the trajectory. (A minimal plotting sketch for parts (b), (c), and (e) appears after this problem.)

(d) Take a look at several of these trajectories—give an explanation for any patterns you might observe.

(e) Generate a large number of these trajectories (1000 of them, for example), and plot them all simultaneously. What do you notice?
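
The sketch below is one possible end-to-end setup for parts (b), (c), and (e), not a prescribed solution. The grid resolution, random seed, number of trajectories, and plotting style are arbitrary choices; in particular, grad_f is spelled out by differentiating (4), a step you should carry out and verify yourself.

    import numpy as np
    import matplotlib.pyplot as plt

    def f(x, y):
        """Cost function (4) on B = [-1, 1] x [-1, 1]."""
        return (1 - np.cos(np.pi * x / 2)) * y**2

    def grad_f(r):
        """Gradient of f at r = (x, y), obtained by differentiating (4)."""
        x, y = r
        return np.array([(np.pi / 2) * np.sin(np.pi * x / 2) * y**2,
                         2 * (1 - np.cos(np.pi * x / 2)) * y])

    # Part (b): filled contour plot of f over B.
    xs = np.linspace(-1, 1, 201)
    X, Y = np.meshgrid(xs, xs)
    plt.contourf(X, Y, f(X, Y), levels=30)
    plt.colorbar(label="f(x, y)")

    # Parts (c)/(e): random starting points, 10 gradient descent steps each.
    rng = np.random.default_rng(0)
    num_trajectories = 20   # increase to ~1000 for part (e)
    for _ in range(num_trajectories):
        r = rng.uniform(-1, 1, size=2)
        traj = [r]
        for _ in range(10):
            r = r - grad_f(r)
            traj.append(r)
        traj = np.array(traj)
        plt.plot(traj[:, 0], traj[:, 1], marker='.', linewidth=0.5)

    plt.xlim(-1, 1)
    plt.ylim(-1, 1)
    plt.show()

Increasing num_trajectories to 1000 or so reproduces the experiment described in part (e).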