Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

STA410 Week 4 Programming Assignment (10 points)

Welcome.

Rules

0. This is a paired or individual assignment. Speciic code solutions submitted for these assignments must be created either individually or in the context of a paired effort: group efforts of three or more are students are not allowed. Please seek homework partners in-person or on the course discussion board on piazza. Paired students each separately submit their (common) work, including (agreeing) contribution of work statements for each problem.

Students choosing to work individually must work in accordance with the University Student Academic Integrity values of "honesty, trust, fairness, respect, responsibility and courage." Students working in pairs may share work without restriction within their pair, but must otherwise work in accordance with the University Student Academic Integrity values noted         above. Getting and sharing "hints" from other classmates is allowed; but, the eventual code creation work and submission must be your own individual or paired creation.

1. Do not delete or replace cells: this erases cell ids upon which automated code tests are based.

If you accidentally delete a required cell try "Edit > Undo Delete Cells" in the notebook editor; otherwise, redownload the notebook

(so it has the correct required cells ids) and repopulate it with your answers (assuming you don't overwrite them when you redownload the notebook).

You may add cells for scratch work but if required answers are not submitted through the provided cells where the answers are

requested your answers may not be marked.

You may check if cell ids are present and working by running the following command in a cell

! grep '"id": ' .ipynb

and making sure the cell ids do not change when you save your notebook.

If you are working in any environment other than UofT JupyterLab, UofT JupyterHub, or Google Colab, your

system must meet the following versioning requirements

notebook format >=4.5

jupyter notebook version >=6.2 for "classic" notebooks served by jupyterhub

jupyterlab version >=3.0.13 for "jupyterlab" notebooks

otherwise cell ids will not be supported and you will not get any credit for your submitted homework.

2. No cells may have any runtime errors because this causes subsequent automated code tests to fail and you will not get marks for tests which fail because of previous runtime errors.

Run time errors include, e.g., unassigned variables, mismatched parentheses, and any code which does not work when the notebook

cells are sequentially run, even if it was provided for you as part of the starter code. It is best to restart and re-run the cells in your notebook to ensure there are no runtime errors before submitting your work.

The try- except block syntax catches runtime errors and transforms them into exceptions which will not cause subsequent automated code tests to fail.

3. No jupyter shortcut commands such as  ! python script.py 10 or %%timeit may be included in the inal submission as they will cause subsequent automated code tests to fail.

Comment out ALL jupyter shortcut commands, e.g., # ! python script.py 10 or # %%timeit in submitted notebooks.

4. Python library imports are limited to only libraries imported in the starter code and the standard python modules. Importing additional libraries will cause subsequent automated code tests to fail.

Unless a problem instructs differently, you may use any functions available from the libraries imported in the starter code; otherwise, you are expected to create your own Python functionality based on the Python stdlib (standard libary, i.e., base Python and standard Python modules).

5. You are welcome and encouraged to adapt code you ind available online into your notebook; however, if you do so you must provide a link to the utilized resource. If failure to cite such references is identiied and conirmed, your mark will be immediately reduced to 0.

# Unless a problem instructs differently, you may use any functions available from the following library imports

import numpy as np

from scipy import stats

import matplotlib.pyplot as plot

from numpy.linalg import inv, solve, cholesky, svd, qr

from scipy.linalg import solve_triangular

import tensorflow as tf

Problem 0 (required)

Are you working with a partner to complete this assignment?

If not, assign the value of None into the variable Partner .

If so, assign the name of the person you worked with into the variable Partner .

Format the name as  " " as a str type, e.g., "Scott Schwartz".

# Required: only worth points when not completed, in which case, you'll lose points

Partner = #None

# This cell will produce a runtime error until you assign a value to this variable

What was your contribution in completing the code for this assignments problems? Assign one of the following into each of the Problem_X variables below.

"I worked alone"

"I contributed more than my partner"

"My partner and I contributed equally"

"I contributed less than my partner"

"I did not contribute"

# Required: only worth points when not completed, in which case, you'll lose points

Problem_1 = #"I worked alone"

Problem_2 = #"I worked alone"

# This cell will produce a runtime error until you assign a value to this variable

Problem 1 (5 points)

This problem considers alternative computations and topics related to condition for the least squares model it for the y = n×pp + e model.


Problem 1 Question 0 (0.25 points)

0. What is the condition of T when the following code is run?


np.random.seed(10); n,p = 100,10; noise = n/2

X = np.ones((n,p))*np.arange(n)[:,np.newaxis] + stats.norm.rvs(size=(n,p))*noise

X = (X - X.mean(axis=0)[np.newaxis,:])/X.std(axis=0)[np.newaxis,:]

X[:,0] = 1

Hint:

There is a deinition of condition, but there's also a numpy that would be easiest here.

Indirectly relevant to this question are plt.plot(X[:,j],X[:,k], '. ') and np.round(np.corrcoef(X.T),2)

# 0.25 points [format: just a plain number of type

p1q0 = # condition of the gramian of X

# This cell will produce a runtime error until the

0.9999999999999994

`float` or `np.float`]

`p2q0` variable is assigned

# Cell for scratch work

# You are welcome to add as many new cells into this notebook as you would like.

# Just do not leave in a state that will produce a runtime errors when notebook cells are run sequentially.

# Any cells included for scratch work that are no longer needed may be deleted so long as

# - all the required functions are still defined and available when called

# - no cells requiring variable assignments are deleted.

# None of this will not cause problems with `cell ids` assuming your versioning supports `cell ids`

# (as UofT JupyterHub, UofT JupyterLab, an Google Colab will).

# Cell for scratch work

Problem 1 Question 1 (0.25 points)

1. What is the condition of T when the following code is run?

np.random.seed(10); n,p = 100,10; noise = n/5

X = np.ones((n,p))*np.arange(n)[:,np.newaxis] + stats.norm.rvs(size=(n,p))*noise

X = (X - X.mean(axis=0)[np.newaxis,:])/X.std(axis=0)[np.newaxis,:]

X[:,0] = 1

There is a deinition of condition, but there's also a numpy that would be easiest here.

Indirectly relevant to this question are plt.plot(X[:,j],X[:,k], '. ') and np.round(np.corrcoef(X.T),2)

# 0.25 points [format: just a plain number of type `float` or `np.float`]

p1q1 = # condition of the gramian of X

# This cell will produce a runtime error until the `p1q1` variable is assigned

# Cell for scratch work

# You are welcome to add as many new cells into this notebook as you would like.

# Just do not leave in a state that will produce a runtime errors when notebook cells are run sequentially.

# Any cells included for scratch work that are no longer needed may be deleted so long as

# - all the required functions are still defined and available when called

# - no cells requiring variable assignments are deleted.

# None of this will not cause problems with `cell ids` assuming your versioning supports `cell ids`

# (as UofT JupyterHub, UofT JupyterLab, an Google Colab will).

# Cell for scratch work

Problem 1 Question 2-6 (1.5 points)

Using the imported inv, solve, solve_triangular, cholesky, qr, and svd functions, answer the following questions for X and y generated from the following code.

np.random.seed(10); n,p = 100,10; noise,sigma = n/20,1

X = np.ones((n,p))*np.arange(n)[:,np.newaxis] + stats.norm.rvs(size=(n,p))*noise

X = (X - X.mean(axis=0)[np.newaxis,:])/X.std(axis=0)[np.newaxis,:]

X[:,0] = 1; y = [email protected]((p,1)) + stats.norm.rvs(size=(n,1))*sigma

2. (0.25 points) What is the least squares estimate for in = based on inverting T ?

3. (0.25 points) What is the least squares estimate for in = based on solving for in the normal equations?

4. (0.25 points) What is the least squares estimate for in = on the basis of the cholesky decomposition of T and solving two rounds of systems of linear equations?

5. (0.25 points) What is the least squares estimate for in = on the basis of the QR decomposition of ?

6. (0.5 points) What is the least squares estimate for in = on the basis of the SVD decomposition of ?

When computing your answers, the following restrictions apply:

The order of operations impacts the inal numerical results; so, you must use parentheses to ensure the order of operations corresponds exactly to the natural order of steps that would follow we constructing with any of the above strategies.

You may not use inv(np.diag(D)); so, you must instead use the construct np.arange(10)[np.newaxis,:]*np.ones((100,10)) for

DM a diagonal matrix D times a rectangular matrix M.

You are not to use np.linalg.lstsq(X,y,rcond=None)[0] in place of the extracting U,d,Vt from the svd function and then computing the necessary matrix operations.

Hints:

The following code can show the differences between the different computations of the least squares estimate.

betahats_ascols = np.array([[betahat_inv, betahat_solve, betahat_cholesky, betahat_qr, betahat_svd]])

betahats_asrows = np.array([[betahat_inv],

[betahat_solve],

[betahat_cholesky],

[betahat_qr],

[betahat_svd]])

np.set_printoptions(linewidth=120)

# Absolute differences in betahat estimates

np.abs(betahats_ascols-betahats_asrows).sum(axis=2).sum(axis=2) # numpy broadcasting

# Rounded beta hat estimates estimates

np.round(np.c_[betahat_inv, betahat_solve, betahat_cholesky, betahat_qr, betahat_svd, betahat_svd2],5)

Of relevance for the problem at hand is np.round(np.corrcoef(X.T),2) and the condition number of X .

# Cell for scratch work

# You are welcome to add as many new cells into this notebook as you would like.

# Just do not leave in a state that will produce a runtime errors when notebook cells are run sequentially.

# Any cells included for scratch work that are no longer needed may be deleted so long as

# - all the required functions are still defined and available when called

# - no cells requiring variable assignments are deleted.

# None of this will not cause problems with `cell ids` assuming your versioning supports `cell ids`

# (as UofT JupyterHub, UofT JupyterLab, an Google Colab will).

# Cell for scratch work

# 0.25 points each [format: outputs of the appropriate functions calls and calculations]

p1q2 = #inv(...

p1q3 = #solve(...

#L = cholesky(...

p1q4 = #solve_triangular(...

#Q,R = qr(

p1q5 = #solve_triangular(...

# Correct L and Q and R are needed for appropriate use of `solve_triangular`

# This cell will produce a runtime error until the variables `p1q2` through `p1q5` are assigned values

# 0.5 point [format: outputs of the appropriate calculations]

#U,D,Vt = svd(...

p1q6 = #...

# Correct U and D and Vt are needed for the subsequent calculations

# This cell will produce a runtime error until the variable `p1q6` is assigned a value

Problem 1 Questions 7-8 (1 point)

7. (0.5 points) Which of the following best describes the numerical difference between the methods above?

1. Numeric differences are meaningfully different for the qr and svd approaches compared to the methods based on X.T@X

2. Numeric differences are not meaningfully different for any of the methods relative to the multicollinearity/condition of X

3. Numeric differences are meaningfully different, with a worst to best numerical precision ranking of inv, solve, cholesky, qr, svd

4. Numeric differences are not meaningfully different because the condition number and variance inlation factors of X are not very large

8. (0.5 points) What aboout this X makes it a poor choice for regression?

1. The correlations of the continuous predictors are all all too high $>0.95$

2. The correlations of the continuous predictors are all between $0.5$ and $0.95$

3. The correlations of the continuous predictors are all between $0.25$ and $0.5$

4. It's not actually a problem with the correlations of the continuous predictors

Hint: the hints from questions 2-6 above would be helpful for answering these questions

# 1 point (0.5 points each) [format: `str` either "A" or "B" or "C" or "D" based on the choices above]

p1q7 = #<"A" |"B" |"C" |"D">

p1q8 = #<"A" |"B" |"C" |"D">

# Uncomment the above and keep only either "A" or "B" or "C" or "D"

# This cell will produce a runtime error until the variables `p1q7` and `p1q8` are assigned values

# Cell for scratch work

# You are welcome to add as many new cells into this notebook as you would like.

# Just do not leave in a state that will produce a runtime errors when notebook cells are run sequentially.

# Any cells included for scratch work that are no longer needed may be deleted so long as

# - all the required functions are still defined and available when called

# - no cells requiring variable assignments are deleted.

# None of this will not cause problems with `cell ids` assuming your versioning supports `cell ids`

# (as UofT JupyterHub, UofT JupyterLab, an Google Colab will).

# Cell for scratch work

Problem 1 Questions 9-10 (1 point)

Solve for in = with the same methods in questions 2-6 above, again using the imported inv, solve, solve_triangular, cholesky, qr, and svd functions, but this time for X and y generated from the following code.

np.random.seed(10); n,p = 100,10; noise,sigma = .0000001,1

X = np.ones((n,p)) + stats.norm.rvs(size=(n,p))*noise

X[:,0] = 1; y = np.ones((n,1))

9. (0.5 points) Which of the following best describes the numerical difference between the methods above?

1. Numeric differences are meaningfully different for the qr and svd approaches compared to the methods based on X.T@X

2. Numeric differences are not meaningfully different for any of the methods compared to the multicollinearity/condition of X

3. Numeric differences are meaningfully different, with a worst to best numerical precision ranking of inv, solve, cholesky, qr, svd

4. Numeric differences are not meaningfully different because the condition number and variance inlation factors of X are not very large

10. (0.5 points) What's different about X from questions 9 and 10 compared to X from questions 2-8?

1. Their dimensions and use of intercept columns

2. They both have very high correlations but very different magnitude condition numbers

3. They have different correlation magnitudes but similar magnitude condition numbers

4. The condition number of the former is much larger than the latter despite having seemingly unconcerning correlations

Hint: the hints from questions 2-6 above would be helpfuling for answer these questions

# 1 point (0.5 points each) [format: `str` either "A" or "B" or "C" or "D" based on the choices above]

p1q9 = #<"A" |"B" |"C" |"D">

p1q10 = #<"A" |"B" |"C" |"D">

# Uncomment the above and keep only either "A" or "B" or "C" or "D"

# This cell will produce a runtime error until the variables `p1q9` and `p1q10` are assigned values

# Cell for scratch work

# You are welcome to add as many new cells into this notebook as you would like.

# Just do not leave in a state that will produce a runtime errors when notebook cells are run sequentially.

# Any cells included for scratch work that are no longer needed may be deleted so long as

# - all the required functions are still defined and available when called

# - no cells requiring variable assignments are deleted.

# None of this will not cause problems with `cell ids` assuming your versioning supports `cell ids`

# (as UofT JupyterHub, UofT JupyterLab, an Google Colab will).

# Cell for scratch work

Problem 1 Questions 11-14 (1 point)

11. (0.25 points) What objective (optimization) function is question 2 concerned with?

1. The L2 norm of the residuals ||(y )T (y )||2

2. A quadratic function pT T − yT minimized at

3. A and B (which have the same optimization solutions)

4. Solving = y

12. (0.25 points) Assuming a full rank design matrix , which of the following is a true statement?

1. Least squares is a nonlinear objective (optimization) function

2. The solution to the least squares problem is solved by a system of linear equations

3. A and B (which are both true)

4. The O(np2 ) and O(p3 ) computations of this problem are preferable to O(sp2 ) iterative methods which require the O(np2 ) grammian computation

13. (0.25 points) Which of the following is a true statement relative to iterative methods based on the grammian?

1. Using the inv, solve, and cholesky methods also requires computing the grammian

2. Using the qr and svd methods also requires computing the grammian

3. The inv, solve, and cholesky methods themselves are faster since they are O(np2 )

4. The qr and svd methods themselves are faster since they are O(p3 )

14. (0.25 points) To avoid the grammian computation, the general nonsquare Gauss-Seidel (coordinate descent) strategy

# s < p is chosen

for(j in range(s)):

for(i in range(n)):

betahat[i%p] = q(X[i,:], y[i], betahat[:(i%p)], betahat[(1+(i%p)):])

could be used. What is the Big-O computational complexity of the univariate q update function that would result in a Big-O computational complexity for this algorithm that is the same as regular Gauss-Seidel (including the grammian computation, which it requires)?

1. O(p2/s)

2. O(p)

3. O(p3/(ns))

4. O(p2/n)

# 1 point (0.25 points each) [format: `str` either "A" or "B" or "C" or "D" based on the choices above]

p1q11 = #<"A" |"B" |"C" |"D">

p1q12 = #<"A" |"B" |"C" |"D">

p1q13 = #<"A" |"B" |"C" |"D">

p1q14 = #<"A" |"B" |"C" |"D">

# Uncomment the above and keep only either "A" or "B" or "C" or "D"

# This cell will produce a runtime error until the variables `p2q11` and `p2q12` are assigned values

Problem 2 (5 points)

Complete the gradient descent optimization algorithm below for

parameters A1, b1 , A2, and b2 of the model

= fA1 ,b1 ,A2 ,b2 (x) = ((A2(A1x + b1 )+ + b2 )+)

minimizing the (loss function objective) function ||e||2 = ||y − fA1 ,b1 ,A2 ,b2 (x)||2

where (⋅)+ is the so-called ReLU activation function which sets all negative values within the object to 0.

import tensorflow as tf

np.random.seed(3)

alpha,K = 0.01,10

d,q1,q2 = 3,2,3

x = tf.constant(np.random.normal(size=(d,1)))

y = tf.constant(np.random.normal(size=(d,1)))

A1 = tf.Variable(np.random.normal(size=(q1,d)))

b1 = tf.Variable(np.random.normal(size=(q1,1)))

A2 = tf.Variable(np.random.normal(size=(q2,q1)))

b2 = tf.Variable(np.random.normal(size=(q2,1)))

arg1 = tf.TensorSpec(shape=x.shape, dtype=tf.float64)

arg2 = tf.TensorSpec(shape=A1.shape, dtype=tf.float64)

arg3 = tf.TensorSpec(shape=b1.shape, dtype=tf.float64)

arg4 = tf.TensorSpec(shape=A2.shape, dtype=tf.float64)

arg5 = tf.TensorSpec(shape=b2.shape, dtype=tf.float64)

@tf.function(input_signature=(arg1,arg2,arg3,arg4,arg5,))

def f(x0, A1, b1, A2, b2):

return None # <- fix this

@tf.function(input_signature=(arg1,arg1,))

def loss(y, yhat):

return None # <- fix this

# You are not meant to actually fit a model here,

# only set up and run the correct code that could fit a model

# and then run it for `K` steps

# and then assign `p2q0 = loss(x, y, A1, b1, A2, b2)`

# the final value of the `loss` function after the `K` steps finish

for i in range(K):

#

pass

# 2 points [format: `(A1, b1, A2, b2)` with the results of the for loop above after it

#                       implements the gradient descent algorithm for `alpha,K = 0.01,10` ]

p2q2 = #(A1, b1, A2, b2) # the `p2q2` variable name is correct -- it makes sense to assign this here

# Uncomment the line above so the result of the gradient descent algorithm for `alpha,K = 0.01,10` is saved

# This cell will produce a runtime error until `p2q2` variable is assigned a value

Problem 2 question 0 (1 point)

Your f function above will be tested for various inputs.

Problem 2 question 1 (1 point)

Your loss function above will be tested for various inputs.

Problem 2 question 2 (2 points)

Your gradient descent algorithm above will be tested for various input.

Problem 2 question 3-4 (0.5 points)

3. (0.25 points) Why is the algorithm coded in this problem different than nonlinear Gauss-Seidel?

1. Because f is a linear function of x, not a nonlinear function of x

2. Because each parameter is not cyclically optimized given the value of the others