APMA E4990.002: HOMEWORK 2
If you have questions about the homework, feel free to contact me or stop by my office hours. Try not to look up the answers; you will learn much more if you think about the problems without consulting the solutions. If you need hints, feel free to ask for them on Ed Discussion. You can work in groups, but each student must write his/her own solution based on his/her own understanding of the problem. Unless stated otherwise, justify any answers you give. Sharing of code or solutions is prohibited.
If you need to impose extra conditions on a problem to make it easier (or to consider only specific cases of the question, e.g., taking n to be 2), state explicitly that you have done so. Solutions where extra conditions were assumed, or where only special cases were treated, will also be graded (probably scored as partial answers).
If you are using LaTeX, consider using the minted or listings packages for typesetting relevant code you want to include in your PDF. For Jupyter notebooks, you can save them as LaTeX or PDF before including them. Make sure your answers to each problem are clearly stated in the submitted PDF; the graders should not have to look through your code to find the solution. Any code (Jupyter notebooks and other source files) used to compute your answers should also be uploaded to Gradescope, along with a readme file telling the graders how to run your code if they need to, and whether you are using any special packages.
(1) Suppose you are given a dataset x_1, ..., x_n ∈ R^m as the columns of a matrix X ∈ R^{m×n}. Your goal is to reduce the dimensionality of the data using PCA.
(a) Suppose you want the resulting reduced vectors to be in Rk . Explain how to obtain this using PCA.
(b) How do you determine an appropriate k?
(c) How do you determine the amount of sample variance in the first principal direction?
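As a concrete illustration of parts (a)-(c), here is a minimal numpy sketch of PCA-based dimension reduction; the matrix X and the sizes m, n, k are made up for the example:

```python
# Sketch of PCA dimensionality reduction for data stored as columns of
# an m x n matrix X (synthetic data here, purely for illustration).
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 5, 200, 2
X = rng.standard_normal((m, n))

# Center each feature (row) by subtracting its mean across the samples.
Xc = X - X.mean(axis=1, keepdims=True)

# Principal directions are the left singular vectors of the centered matrix.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Reduced k-dimensional representations: project onto the top-k directions.
Z = U[:, :k].T @ Xc            # shape (k, n)

# Fraction of sample variance captured by the first principal direction.
var_explained = s[0]**2 / np.sum(s**2)
print(Z.shape, round(var_explained, 3))
```

Plotting `s**2 / np.sum(s**2)` against the component index (a scree plot) is one common way to pick k in part (b).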
(2) Let's say we are given 6 data points (x, y) in 2 dimensions:
(-1, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), and (1, 1).
(a) Determine the first principal component of the data.
(b) Describe the result of linear regression, attempting to write y as an affine function of x. (This problem and the next one are intended to be solved by hand, but it is acceptable to submit a code output as a solution.)
(3) Repeat the preceding problem for the dataset:
(-.5, -1), (.5, -1), (-.5, 0), (.5, 0), (-.5, 1), and (.5, 1).
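These two problems are meant to be done by hand, but since code output is acceptable, here is a small numpy cross-check for the Problem (2) dataset (the Problem (3) points can be swapped in the same way):

```python
# Numerical cross-check: first principal component via the covariance
# eigendecomposition, and the least squares affine fit y ~ a*x + b.
import numpy as np

def pc1_and_fit(pts):
    pts = np.asarray(pts, dtype=float)
    centered = pts - pts.mean(axis=0)
    cov = centered.T @ centered / (len(pts) - 1)
    w, V = np.linalg.eigh(cov)           # eigenvalues in ascending order
    pc1 = V[:, -1]                       # eigenvector of the top eigenvalue
    # Affine fit y ~ a*x + b via least squares on [x, 1].
    A = np.column_stack([pts[:, 0], np.ones(len(pts))])
    coef, *_ = np.linalg.lstsq(A, pts[:, 1], rcond=None)
    return pc1, coef

data2 = [(-1, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (1, 1)]
pc1, coef = pc1_and_fit(data2)
print(pc1, coef)
```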
(4) Under what conditions, if any, will training error increase if you add a feature to your regression problem? How does the answer change if you are using ridge regression?
(5) Suppose you fit a linear regression model, but have scaled a feature by a factor of 10.
(a) Under what conditions will this change the forecasts ŷ?
(b) What impact will this have on ridge regression?
(6) Suppose you are given data b = Ax + z (all variables deterministic; x and z unknown) and compute the least squares estimator x̂ for x. Assuming ‖z‖ = η is fixed, and A has full column rank, what direction for z produces the largest error ‖x̂ − x‖, and how large is that error?
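The phenomenon in question can be checked numerically. With x̂ = pinv(A) b, the error is pinv(A) z, and it is maximized over ‖z‖ = η by pointing z along the left singular vector of A with the smallest singular value (a generic random A and an arbitrary η are used below, just for illustration):

```python
# Demonstration: worst-case noise direction for least squares.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 3))       # full column rank (generically)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
eta = 0.5

z_worst = eta * U[:, -1]               # left singular vector of smallest sigma
err_worst = np.linalg.norm(np.linalg.pinv(A) @ z_worst)
print(err_worst, eta / s[-1])          # the two quantities agree
```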
(7) Let our feature space be R^2, and suppose we have m data points organized into a matrix A ∈ R^{m×2}. Let Ac ∈ R^{m×3} be an augmented feature matrix whose first column is the all-ones vector and whose next two columns are the original matrix A. Suppose Ac is full rank, and let

    x̂ = argmin_x ‖Ac x − b‖_2^2 + λ‖x‖_2^2.
Suppose for λ = 1 the resulting estimate is (1.99, 2.10, 1.06). Now we introduce a new feature: a fourth column of Ac that is a duplicate of the third column, and we fit the model again. Determine whether or not one of the following estimates is more likely to occur than the other.
(a) (1.98, 2.09, .20, .88)
(b) (1.98, 2.09, .54, .54)
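The relevant behavior of ridge regression with duplicated columns can be seen on synthetic data (this is not the homework's matrix, just an illustration of the mechanism): the ridge solution is unique and symmetric under swapping identical columns, so the weight is shared between them.

```python
# Closed-form ridge regression, min ||Ax - b||^2 + lam * ||x||^2,
# applied before and after duplicating a column.
import numpy as np

def ridge(A, b, lam):
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 3))
b = rng.standard_normal(50)

w = ridge(A, b, lam=1.0)
A_dup = np.column_stack([A, A[:, -1]])   # duplicate the last column
w_dup = ridge(A_dup, b, lam=1.0)
print(w[-1], w_dup[-2], w_dup[-1])       # the duplicates share the weight
```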
(8) You are interested in evaluating the movie tastes of your friends, so you ask them to rate six movies from -10 to 10.
                            Bob  Molly  Mary  Larry
    Love Actually             8     10    -5     -9
    Bridget Jones's Diary    10      4    -6    -10
    ...

The singular value decomposition of the matrix A is shown below (the numerical values are rounded to ease computations, so some of the columns/rows are not completely normalized; just ignore this).

    A = U S V^T

[The remaining rows of A and the rounded factors U, S, and V^T are displayed here.]
(a) Low-rank models are often used to predict movie ratings. Compute the rank-1 approximation u_1 σ_1 v_1^T, where σ_1 is the largest singular value and u_1 and v_1 are the corresponding left and right singular vectors.
(b) Intuitively, low-rank models capture the fact that different groups of people like different types of movies. Give an interpretation of your rank-1 model in terms of the positive and negative entries of u1 and v1 . (Hint: The entries of the left singular vectors correspond to movies and the entries of the right singular vectors correspond to people.)
(c) You ask your friends about a recent movie, X-Men 7: Mutant Mosquito, but Molly has not seen it.

                              Bob  Molly  Mary  Larry
    X-Men 7: Mutant Mosquito  -10      ?     8     10

What number can you put in place of the ? above to obtain an estimate of Molly's rating by projecting the resulting vector onto a suitable vector from the SVD of A? Do you predict that she will enjoy the movie?
(d) What simple preprocessing step do you suggest if you want to apply this analysis to ratings that are between 0 and 10 instead of between -10 and 10?
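For part (a), the rank-1 truncation of an SVD can be formed as below; the matrix here is a random stand-in with the same shape (6 movies by 4 people), not the homework's ratings matrix:

```python
# Rank-1 approximation sigma_1 * u_1 * v_1^T from the SVD.
import numpy as np

rng = np.random.default_rng(3)
A = rng.integers(-10, 11, size=(6, 4)).astype(float)  # stand-in ratings

U, s, Vt = np.linalg.svd(A)
A1 = s[0] * np.outer(U[:, 0], Vt[0])     # best rank-1 approximation
print(np.linalg.matrix_rank(A1))          # 1
```

By the Eckart-Young theorem, `A1` is the closest rank-1 matrix to `A` in Frobenius norm, with error sqrt(σ_2^2 + σ_3^2 + ...).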
(9) In this exercise you will use the code in the findata folder of hw2.zip. For the data loading code to work properly, make sure you have the pandas Python package installed on your system. Throughout, we will be using the data obtained by calling load_data in findata_tools.py. This will give you the names and closing prices for a set of 18 stocks over a period of 433 days ordered chronologically. For a fixed stock (such as msft), let P_1, ..., P_433 denote its sequence of closing prices ordered in time. For that stock, define the daily returns series R_i := P_{i+1} − P_i for i = 1, ..., 432. Throughout we think of the daily stock returns as features, and each day (but the last) as a separate datapoint in R^18. That is, we have 432 datapoints, each having 18 features.
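The returns definition R_i := P_{i+1} − P_i amounts to a first difference of the price series; with hypothetical prices (the real data comes from load_data) it looks like:

```python
# Forming the daily return series from a closing-price array.
import numpy as np

prices = np.array([100.0, 101.5, 100.75, 102.0])   # hypothetical P_1..P_4
returns = np.diff(prices)                          # R_i = P_{i+1} - P_i
print(returns)                                     # values: 1.5, -0.75, 1.25
```

Applied to a 433-day price series, `np.diff` yields the 432 returns per stock; stacking the 18 stocks gives the 432 x 18 data matrix described above.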
(a) Looking at the first two principal directions of the centered data, give the two stocks with the largest coefficients (in absolute value) in each direction. Give a hypothesis why these two stocks have the largest coefficients, and confirm your hypothesis using the data. The file findata_tools.py has pretty-print functions that can help you output your results. You are not required to include the principal directions in your submission.
(b) Standardize the centered data so that each stock (feature) has variance 1, and compute the first 2 principal directions. This is equivalent to computing the principal directions of the correlation matrix (the previous part used the covariance matrix). Using the information in the comments of generate_findata.py as a guide to the stocks, give an English interpretation of the first 2 principal directions computed here. You are not required to include the principal directions in your submission.
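The claimed equivalence between PCA on standardized data and the correlation matrix is easy to verify numerically (synthetic data below; note the ddof conventions must match):

```python
# Covariance of standardized data equals the correlation matrix.
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((300, 5)) @ rng.standard_normal((5, 5))  # rows = days

Xc = X - X.mean(axis=0)
Xs = Xc / Xc.std(axis=0, ddof=1)          # unit sample variance per feature

cov_of_std = np.cov(Xs, rowvar=False)     # covariance of standardized data
corr = np.corrcoef(X, rowvar=False)       # correlation matrix of raw data
print(np.allclose(cov_of_std, corr))      # True
```

Since the two matrices coincide, so do their eigenvectors, i.e., the principal directions.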
(c) Assume the stock returns each day are drawn from a multivariate distribution x where xi corresponds to the ith stock. Assume further that you hold a portfolio with 200 shares of each of appl, amzn, msft, and goog, and 100 shares of each of the remaining
14 stocks in the dataset. Using the sample covariance matrix as an estimator for the true covariance of x, approximate the standard deviation of your 1 day portfolio returns y (this is a measure of the risk of your portfolio). Here y is given by
    y := Σ_{i=1}^{18} a_i x_i
where ai is the number of shares you hold of stock i.
(d) Assume further that x from the previous part has a multivariate Gaussian distribution. Compute the probability of losing 1000 or more dollars in a single day. That is, compute P(y ≤ −1000).
Note: The assumptions made in the previous parts are often invalid and can lead to inaccurate risk calculations in real financial situations.
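The computations in parts (c)-(d) reduce to Var(y) = a^T Σ a and a Gaussian tail probability P(y ≤ t) = Φ((t − μ)/sd). A sketch with a synthetic covariance matrix (the homework uses the sample covariance of the actual returns) is:

```python
# Portfolio risk sketch: std of y = sum_i a_i x_i and a Gaussian loss
# probability, using a stand-in covariance matrix.
import math
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((18, 18))
Sigma = B @ B.T                      # positive semidefinite stand-in
mu = np.zeros(18)                    # assume zero-mean daily returns

a = np.array([200] * 4 + [100] * 14, dtype=float)   # shares per stock

sd = math.sqrt(a @ Sigma @ a)        # std of 1-day portfolio return
z = (-1000.0 - a @ mu) / sd
p_loss = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # Phi(z)
print(sd, p_loss)
```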
(10) In this exercise you will use the code in the heights_weights folder of hw2.zip. We are interested in estimating the weight of people in a population using just their height. We would like to test two models. Model 1 is linear:

    weight ≈ x · height.

Model 2 also includes an intercept:

    weight ≈ y_1 · height + y_0.
(a) What is the least-squares estimate of x, y_1, and y_0 given two training data vectors: h containing m heights and w containing the corresponding weights? (Hint: You can use the formula for the inverse of a 2 × 2 matrix,

    [a b; c d]^{-1} = 1/(ad − bc) [d −b; −c a].)
(b) What is the point of adding an intercept? Sketch an example of a 2D data set where this could be a good idea.
(c) Complete the script heights_weights.py and report the relative errors achieved by the two models on the test dataset.
(d) Try out the models using fewer training points (for example, 100). What do you observe? What does this suggest about linear models with few parameters in terms of the number of data points needed? Explain whether you would favor them in settings where you have a lot of data or in settings where data is scarce, and why.
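The normal-equation solution from part (a) can be sanity-checked against a library solver; the heights and weights below are hypothetical, and the names y1, y0 mirror Model 2:

```python
# Solving the normal equations for Model 2 with the 2x2 inverse formula,
# compared against np.linalg.lstsq.
import numpy as np

rng = np.random.default_rng(6)
h = rng.uniform(150, 200, size=40)              # hypothetical heights (cm)
w = 0.9 * h - 100 + rng.normal(0, 2, size=40)   # hypothetical weights

# Normal equations for [y1, y0]: M [y1, y0]^T = rhs,
# with M = [[h.h, sum(h)], [sum(h), m]] and rhs = [h.w, sum(w)].
m = len(h)
M = np.array([[h @ h, h.sum()], [h.sum(), m]])
rhs = np.array([h @ w, w.sum()])

a, b, c, d = M[0, 0], M[0, 1], M[1, 0], M[1, 1]
Minv = np.array([[d, -b], [-c, a]]) / (a * d - b * c)  # 2x2 inverse formula
y1, y0 = Minv @ rhs

A = np.column_stack([h, np.ones(m)])
ref, *_ = np.linalg.lstsq(A, w, rcond=None)
print(np.allclose([y1, y0], ref))               # True
```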
(11) (Extra credit) Let b ~ N(Ax, σ^2 I) be an m-dimensional Gaussian, where A ∈ R^{m×n} has full column rank.
(a) Show that the least squares estimate x̂ ~ N(x, σ^2 (A^T A)^{-1}). (Thus E[x̂] = x, i.e., x̂ is unbiased.)
(b) Show that E[S^2] = σ^2 for

    S^2 = 1/(m − n) Σ_{i=1}^m (b_i − x̂^T a_i)^2,

where a_i are the rows of A.
(c) Let M = (A^T A)^{-1}. Show that if σ^2 is known,

    (x̂_j − x_j) / √(σ^2 M_jj) ~ N(0, 1).
(12) (Extra credit) Let (1, b_1), ..., (50, b_50) be drawn from b_i ~ N(x_0 + x_1 i, 3^2) for i = 1, ..., 50.
(a) Find r ∈ R such that P(x̂_0 ∈ (x_0 − r, x_0 + r)) = 0.95, where x̂_0 is our least squares estimate of x_0.
(b) Find r ∈ R such that P(x̂_1 ∈ (x_1 − r, x_1 + r)) = 0.95, where x̂_1 is our least squares estimate of x_1.
(c) Find r ∈ R such that P(x̂_0 + 3x̂_1 ∈ (x_0 + 3x_1 − r, x_0 + 3x_1 + r)) = 0.95.
(13) (Extra credit) Find a real dataset online or elsewhere and fit a ridge regression model. You can also use the hourly NOAA weather data in weather.zip and the data loading subroutines in weather_load.py from the weather folder of hw2.zip. Divide the data into a training and test set, and plot the error on both for different values of the regularization parameter.
Acknowledgement: Problems in this homework set are based on materials in various courses taught by Dr. Brett Bernstein and Prof. Carlos Fernandez-Granda. These materials are used with the permission of the authors.
2022-10-17