闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Homework 3

2022

Part 1. We will work through some details on logistic regression. We ﬁrst set some notation. Let

xi e Rp , yi e {0, 1} for i = 1, . . . , n. Let X e Rn ×p denote the matrix with xi as its ith row. Recall that the negative log-likelihood f(β) can be written as follows:

f(β) = ┌ -yixi(T)β + log(1 + exp(xi(T)β))┐ .

i=1

1. Write the gradient and Hessian of f(β).

2. What is the computational complexity for a calculating the gradient and Hessian of f(β)?

3. Under what condition is f(β) strictly convex?

4. Prove that f(β) is L-Lipschitz diﬀerentiable with L = |X|o(2)p .

5. Suppose that there is a vector w e Rp such that xi(T)w > 0 if yi = 1 and xi(T)w < 0 if yi = 0. Prove that

f(β) does not have a global minimum. In other words, when the classiﬁcation problem is completely

separable, there is no maximum likelihood estimator.

To address the completely separable situation, we can modify the negative log-likelihood by adding a ridge penalty, namely we minimize

fλ (β) = f(β) + |β|2(2) ,

where λ > 0 is a tuning parameter. By choosing large λ , we penalize solutions that have large 2-norms.

6. Write the gradient and Hessian of fλ (β).

7. What is the computational complexity for a calculating the gradient and Hessian of fλ (β)?

8. Prove that fλ (β) has a unique global minimizer for all λ > 0.

Part 2. Gradient Descent

You will next add an implementation of gradient descent to your R package. Your function will include using both a ﬁxed step-size as well as one chosen by backtracking.

Please complete the following steps.

Step 0: Make a ﬁle called homework3.R in your R package. Put it in the R subdirectory of your package.

Step 1: Write a function “gradient_step.”

# ' Gr-c3ant stap

# '

# ' op-r-m fr-ce g-ncla to eunbt3on tg-t raturns fr-c3ant oe oajabt3va eunbt3on # ' op-r-m g burrant p-r-matar ast3m-ta

# ' op-r-m t stap-s3za

# ' oagport

gradient_step <- function(gradf, t, x) {

}

Your function should return x+ = x - t5f(x).

Step 2: Write a function “gradient_descent_ﬁxed.”

# ' Gr-c3ant Dasbant (F3gac stap-s3za)

# '

# ' op-r-m eg g-ncla to eunbt3on tg-t raturns oajabt3va eunbt3on v-luas

# ' op-r-m fr-ce g-ncla to eunbt3on tg-t raturns fr-c3ant oe oajabt3va eunbt3on # ' op-r-m go 3n3t3-l p-r-matar ast3m-ta

# ' op-r-m t stap-s3za

# ' op-r-m m-g 3tar m-g3mum numaar oe 3tar-t3ons

# ' op-r-m tol bonvarfanba tolar-nba

# ' oagport

gradient_descent_fixed <- function(fx, gradf, t, x0, max_iter=1e2 , tol=1e-3) {

}

Your function should return

● The ﬁnal iterate value

● The objective function values

● The 2-norm of the gradient values

● The relative change in the function values

● The relative change in the iterate values

Step 3: Write a function “backtrack.”

# ' B-bktr-bk3nf

# '

# ' op-r-m eg g-ncla to eunbt3on tg-t raturns oajabt3va eunbt3on v-luas

# ' op-r-m g burrant p-r-matar ast3m-ta

# ' op-r-m t burrant stap-s3za

# ' op-r-m ce tga v-lua oe tga fr-c3ant oe oajabt3va eunbt3on av-lu-tac -t tga burrant g # ' op-r-m -lpg- tga a-bktr-bk3nf p-r-matar

# ' op-r-m aat- tga cabramant3nf mult3pl3ar

# ' oagport

backtrack <- function(fx, t, x, df, alpha=0.5 , beta=0.9) {

}

Your function should return the selected step-size.

Step 4: Write a function “gradient_descent_backtrack” that performs gradient descent using backtracking.

# ' Gr-c3ant Dasbant (B-bktr-bk3nf stap-s3za)

# '

# ' op-r-m eg g-ncla to eunbt3on tg-t raturns oajabt3va eunbt3on v-luas

# ' op-r-m fr-ce g-ncla to eunbt3on tg-t raturns fr-c3ant oe oajabt3va eunbt3on # ' op-r-m go 3n3t3-l p-r-matar ast3m-ta

# ' op-r-m m-g 3tar m-g3mum numaar oe 3tar-t3ons

# ' op-r-m tol bonvarfanba tolar-nba

# ' oagport

gradient_descent_backtrack <- function(fx, gradf, x0, max_iter=1e2 , tol=1e-3) {

}

Your function should return

● The ﬁnal iterate value

● The objective function values

● The 2-norm of the gradient values

● The relative change in the function values

● The relative change in the iterate values

Step 5: Write a function “gradient_descent” that is a wrapper function for “gradient_descent_ﬁxed” and “gradient_descent_backtrack.” The default should be to use the backtracking.

# ' Gr-c3ant Dasbant

# '

# ' op-r-m eg g-ncla to eunbt3on tg-t raturns oajabt3va eunbt3on v-luas

# ' op-r-m fr-ce g-ncla to eunbt3on tg-t raturns fr-c3ant oe oajabt3va eunbt3on # ' op-r-m go 3n3t3-l p-r-matar ast3m-ta

# ' op-r-m t stap-s3za

# ' op-r-m m-g 3tar m-g3mum numaar oe 3tar-t3ons

# ' op-r-m tol bonvarfanba tolar-nba

# ' oagport

gradient_descent <- function(fx, gradf, x0, t=NULL , max_iter=1e2 , tol=1e-3) {

}

Your function should return

● The ﬁnal iterate value

● The objective function values

● The 2-norm of the gradient values

● The relative change in the function values

● The relative change in the iterate values

Step 6: Write functions ‘fx_logistic’ and ‘gradf_logistic’ to perform ridge logistic regression

# ' aajabt3va Funbt3on eor Lof3st3b Rafrass3on

# '

# ' op-r-m y a3n-ry rasponsa

# ' op-r-m x cas3fn m-tr3g

# ' op-r-m aat- rafrass3on boaee3b3ant vabtor

# ' op-r-m l-mac- raful-r3z-t3on p-r-matar

# ' oagport

fx_logistic <- function(y, X, beta, lambda=0) {

}

# ' Gr-c3ant eor Lof3st3b Rafass3on

# '

# ' op-r-m y a3n-ry rasponsa

# ' op-r-m x cas3fn m-tr3g

# ' op-r-m aat- rafrass3on boaee3b3ant vabtor

# ' op-r-m l-mac- raful-r3z-t3on p-r-matar

# ' oagport

gradf_logistic <- function(y, X, beta, lambda=0) {

}

Step 7: Perform logistic regression (with λ = 0) on the following data example (y, X) using the ﬁxed step-size. Use your answers to Part 1 to choose an appropriate ﬁxed step-size. Plot the diﬀerence f(βk ) - f(β10000 ) versus the iteration k. Comment on the shape of the plot given what you know about the iteration complexity of gradient descent with a ﬁxed step size.

set.seed(12345)

n <- 100

p <- 2

X <- matrix(rnorm (n*p),n,p)

beta0 <- matrix(rnorm (p),p,1)

y <- (runif(n) <= plogis (X%*%beta0)) + 0

Step 8: Perform logistic regression (with λ = 0) on the simulated data above using backtacking. Plot the diﬀerence f(βk ) - f(β10000 ) versus the iteration k. Comment on the shape of the plot given what you know about the iteration complexity of gradient descent with backtracking.

Step 9: Perform logistic regression (with λ = 10) on the simulated data above using the ﬁxed step-size. Plot the diﬀerence fλ (βk ) - fλ (β10000 ) versus the iteration k. Comment on the shape of the plot given what you know about the iteration complexity of gradient descent applied to strongly convex functions.