Homework 3
2022
Part 1. We will work through some details on logistic regression. We first set some notation. Let x_i ∈ R^p and y_i ∈ {0, 1} for i = 1, . . . , n. Let X ∈ R^{n×p} denote the matrix with x_i as its ith row. Recall that the negative log-likelihood f(β) can be written as follows:

f(β) = ∑_{i=1}^n [ −y_i x_i^T β + log(1 + exp(x_i^T β)) ].
1. Write the gradient and Hessian of f(β).
2. What is the computational complexity of calculating the gradient and Hessian of f(β)?
3. Under what condition is f(β) strictly convex?
4. Prove that f(β) is L-Lipschitz differentiable with L = ‖X‖_op^2.
5. Suppose that there is a vector w ∈ R^p such that x_i^T w > 0 if y_i = 1 and x_i^T w < 0 if y_i = 0. Prove that f(β) does not have a global minimum. In other words, when the classification problem is completely separable, there is no maximum likelihood estimator.
To address the completely separable situation, we can modify the negative log-likelihood by adding a ridge penalty, namely we minimize
f_λ(β) = f(β) + (λ/2)‖β‖_2^2,

where λ > 0 is a tuning parameter. By choosing large λ, we penalize solutions that have large 2-norms.
6. Write the gradient and Hessian of fλ (β).
7. What is the computational complexity of calculating the gradient and Hessian of fλ (β)?
8. Prove that fλ (β) has a unique global minimizer for all λ > 0.
Part 2. Gradient Descent
You will next add an implementation of gradient descent to your R package. Your implementation will support both a fixed step-size and a step-size chosen by backtracking.
Please complete the following steps.
Step 0: Make a file called homework3.R in your R package. Put it in the R subdirectory of your package.
Step 1: Write a function “gradient_step.”
#' Gradient Step
#'
#' @param gradf handle to function that returns gradient of objective function
#' @param x current parameter estimate
#' @param t step-size
#' @export
gradient_step <- function(gradf, t, x) {
}
Your function should return x⁺ = x − t∇f(x).
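For concreteness, a minimal sketch of one possible body (the update is a single line):

gradient_step <- function(gradf, t, x) {
  # One gradient descent update: x+ = x - t * grad f(x)
  x - t * gradf(x)
}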
Step 2: Write a function “gradient_descent_fixed.”
#' Gradient Descent (Fixed step-size)
#'
#' @param fx handle to function that returns objective function values
#' @param gradf handle to function that returns gradient of objective function
#' @param x0 initial parameter estimate
#' @param t step-size
#' @param max_iter maximum number of iterations
#' @param tol convergence tolerance
#' @export
gradient_descent_fixed <- function(fx, gradf, t, x0, max_iter=1e2, tol=1e-3) {
}
Your function should return the following (one possible implementation is sketched after this list):
● The final iterate value
● The objective function values
● The 2-norm of the gradient values
● The relative change in the function values
● The relative change in the iterate values
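One way the pieces might fit together, as a sketch only; the names in the returned list (x, objective, and so on) and the choice of stopping rule are illustrative, not required:

# A sketch, not the required implementation.
gradient_descent_fixed <- function(fx, gradf, t, x0, max_iter = 1e2, tol = 1e-3) {
  x <- x0
  f_old <- fx(x)
  obj <- grad_norm <- rel_obj <- rel_x <- double(max_iter)
  for (k in seq_len(max_iter)) {
    x_old <- x
    x <- gradient_step(gradf, t, x)            # fixed step-size update
    obj[k] <- fx(x)
    grad_norm[k] <- sqrt(sum(gradf(x)^2))      # 2-norm of the gradient
    rel_obj[k] <- abs(obj[k] - f_old) / (abs(f_old) + 1)
    rel_x[k] <- sqrt(sum((x - x_old)^2)) / (sqrt(sum(x_old^2)) + 1)
    f_old <- obj[k]
    if (rel_obj[k] < tol) break                # stop on small relative change
  }
  list(x = x, objective = obj[1:k], gradient_norm = grad_norm[1:k],
       rel_fun_change = rel_obj[1:k], rel_iterate_change = rel_x[1:k])
}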
Step 3: Write a function “backtrack.”
#' Backtracking
#'
#' @param fx handle to function that returns objective function values
#' @param x current parameter estimate
#' @param t current step-size
#' @param df the value of the gradient of objective function evaluated at the current x
#' @param alpha the backtracking parameter
#' @param beta the decrementing multiplier
#' @export
backtrack <- function(fx, t, x, df, alpha=0.5, beta=0.9) {
}
Your function should return the selected step-size.
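A sketch assuming the standard sufficient-decrease (Armijo) backtracking rule, which is the usual reading of these parameters:

backtrack <- function(fx, t, x, df, alpha = 0.5, beta = 0.9) {
  g2 <- sum(df^2)                       # squared 2-norm of the gradient at x
  f0 <- fx(x)
  # Shrink t until the sufficient-decrease condition holds:
  # f(x - t * df) <= f(x) - alpha * t * ||df||^2
  while (fx(x - t * df) > f0 - alpha * t * g2) {
    t <- beta * t
  }
  t
}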
Step 4: Write a function “gradient_descent_backtrack” that performs gradient descent using backtracking.
#' Gradient Descent (Backtracking step-size)
#'
#' @param fx handle to function that returns objective function values
#' @param gradf handle to function that returns gradient of objective function
#' @param x0 initial parameter estimate
#' @param max_iter maximum number of iterations
#' @param tol convergence tolerance
#' @export
gradient_descent_backtrack <- function(fx, gradf, x0, max_iter=1e2, tol=1e-3) {
}
Your function should return the following (a sketch follows this list):
● The final iterate value
● The objective function values
● The 2-norm of the gradient values
● The relative change in the function values
● The relative change in the iterate values
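A compact sketch showing only how this differs from the fixed-step version; the per-iteration bookkeeping, elided below, is identical to gradient_descent_fixed, and the initial trial step-size of 1 is an assumption:

gradient_descent_backtrack <- function(fx, gradf, x0, max_iter = 1e2, tol = 1e-3) {
  x <- x0
  t <- 1                                 # initial trial step-size (an assumption)
  for (k in seq_len(max_iter)) {
    df <- gradf(x)
    t <- backtrack(fx, t, x, df)         # re-select the step-size each iteration
    x <- x - t * df
    # ... record the same five quantities as in gradient_descent_fixed ...
  }
  x
}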
Step 5: Write a function “gradient_descent” that is a wrapper function for “gradient_descent_fixed” and “gradient_descent_backtrack.” The default should be to use backtracking.
#' Gradient Descent
#'
#' @param fx handle to function that returns objective function values
#' @param gradf handle to function that returns gradient of objective function
#' @param x0 initial parameter estimate
#' @param t step-size
#' @param max_iter maximum number of iterations
#' @param tol convergence tolerance
#' @export
gradient_descent <- function(fx, gradf, x0, t=NULL, max_iter=1e2, tol=1e-3) {
}
Your function should return the following (a sketch follows this list):
● The final iterate value
● The objective function values
● The 2-norm of the gradient values
● The relative change in the function values
● The relative change in the iterate values
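A minimal sketch of the dispatch logic, assuming a NULL t signals that backtracking should be used:

gradient_descent <- function(fx, gradf, x0, t = NULL, max_iter = 1e2, tol = 1e-3) {
  if (is.null(t)) {
    gradient_descent_backtrack(fx, gradf, x0, max_iter, tol)    # default
  } else {
    gradient_descent_fixed(fx, gradf, t, x0, max_iter, tol)
  }
}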
Step 6: Write functions “fx_logistic” and “gradf_logistic” to perform ridge logistic regression.
#' Objective Function for Logistic Regression
#'
#' @param y binary response
#' @param X design matrix
#' @param beta regression coefficient vector
#' @param lambda regularization parameter
#' @export
fx_logistic <- function(y, X, beta, lambda=0) {
}
#' Gradient for Logistic Regression
#'
#' @param y binary response
#' @param X design matrix
#' @param beta regression coefficient vector
#' @param lambda regularization parameter
#' @export
gradf_logistic <- function(y, X, beta, lambda=0) {
}
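A sketch following the formulas from Part 1, taking the ridge term as (λ/2)‖β‖_2^2 so that its gradient is λβ:

fx_logistic <- function(y, X, beta, lambda = 0) {
  u <- X %*% beta
  # Note: log(1 + exp(u)) can overflow for very large u; fine at this example's scale.
  sum(-y * u + log(1 + exp(u))) + 0.5 * lambda * sum(beta^2)
}

gradf_logistic <- function(y, X, beta, lambda = 0) {
  p <- plogis(X %*% beta)                # p_i = 1 / (1 + exp(-x_i^T beta))
  t(X) %*% (p - y) + lambda * beta       # X^T (p - y) + lambda * beta
}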
Step 7: Perform logistic regression (with λ = 0) on the following data example (y, X) using the fixed step-size. Use your answers to Part 1 to choose an appropriate fixed step-size. Plot the difference f(β_k) − f(β_10000) versus the iteration k. Comment on the shape of the plot given what you know about the iteration complexity of gradient descent with a fixed step-size.
set.seed(12345)
n <- 100
p <- 2
X <- matrix(rnorm(n * p), n, p)
beta0 <- matrix(rnorm(p), p, 1)
y <- (runif(n) <= plogis(X %*% beta0)) + 0
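A sketch of how Step 7 might be run end-to-end, assuming the sketch implementations above (including their illustrative return names); the fixed step-size t = 1/L uses the Lipschitz constant from Part 1, question 4:

L <- norm(X, "2")^2                      # ||X||_op^2, per Part 1, question 4
fx    <- function(b) fx_logistic(y, X, b, lambda = 0)
gradf <- function(b) gradf_logistic(y, X, b, lambda = 0)
out <- gradient_descent_fixed(fx, gradf, t = 1 / L, x0 = matrix(0, p, 1),
                              max_iter = 1e4, tol = 0)
d <- out$objective - tail(out$objective, 1)    # f(beta_k) - f(beta_10000)
plot(head(d, -1), type = "l", log = "y",       # drop the final zero for the log scale
     xlab = "iteration k", ylab = "f(beta_k) - f(beta_10000)")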
Step 8: Perform logistic regression (with λ = 0) on the simulated data above using backtracking. Plot the difference f(β_k) − f(β_10000) versus the iteration k. Comment on the shape of the plot given what you know about the iteration complexity of gradient descent with backtracking.
Step 9: Perform logistic regression (with λ = 10) on the simulated data above using the fixed step-size. Plot the difference f_λ(β_k) − f_λ(β_10000) versus the iteration k. Comment on the shape of the plot given what you know about the iteration complexity of gradient descent applied to strongly convex functions.
2022-02-26