
Introduction to Statistical Learning

STA314H5F, Fall 2023

Assignment 1

Due: September 27 at 11:59 PM

Written component

Question 1 (20 points).

Let $A$ represent an $N \times N$ matrix such that $A^\top A = A^2$.

(a) Show that $\operatorname{tr}\left[(A - A^\top)^\top (A - A^\top)\right] = 0$.

(b) Show that $A$ is symmetric.

(c) Determine whether $A$ is positive semidefinite.

(d) Is it possible to conclude that any symmetric matrix is positive semidefinite? If not, what are the sufficient conditions on symmetric matrices that make them positive semidefinite?
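Positive semidefiniteness of a symmetric matrix can be explored numerically through its eigenvalues. The following is a minimal sketch, assuming NumPy is available; the helper name `is_positive_semidefinite`, the tolerance, and both example matrices are illustrative choices, not part of the assignment.

```python
import numpy as np

def is_positive_semidefinite(M, tol=1e-10):
    """Return True if the symmetric matrix M has no eigenvalue below -tol."""
    M = np.asarray(M, dtype=float)
    if not np.allclose(M, M.T):
        raise ValueError("M must be symmetric")
    # eigvalsh is intended for symmetric matrices and returns real eigenvalues
    return bool(np.all(np.linalg.eigvalsh(M) >= -tol))

# Hypothetical examples (not taken from the assignment)
B = np.array([[1.0, 2.0], [3.0, 4.0]])
gram = B.T @ B                                    # a Gram matrix B^T B
indefinite = np.array([[1.0, 2.0], [2.0, 1.0]])   # eigenvalues 3 and -1

print(is_positive_semidefinite(gram))
print(is_positive_semidefinite(indefinite))
```

A numerical check of this kind only illustrates the definition on specific matrices; it does not replace the proofs requested above.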

Question 2 (15 points).

Let $A$ represent an $N \times N$ idempotent matrix; i.e., $A^2 = A$.

(a) Show that $\operatorname{tr}(A) = \operatorname{rank}(A)$.

(b) Any idempotent matrix can be used to define a projection operator. Show that the square matrix $P = X(X^\top X)^{-1} X^\top$ is a projection.

(c) An idempotent matrix is said to be an orthogonal projection if it is also symmetric. Show that the square matrix $P = X(X^\top X)^{-1} X^\top$, also known as the hat matrix, is an orthogonal projection.
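The properties in Question 2 can be sanity-checked numerically before attempting the proofs. The sketch below assumes NumPy and uses a randomly generated, hypothetical design matrix `X` with full column rank; it is an illustration, not a substitute for the derivations.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # hypothetical full-column-rank design matrix

# Solve (X^T X) Z = X^T instead of forming the inverse explicitly
P = X @ np.linalg.solve(X.T @ X, X.T)

idempotent = np.allclose(P @ P, P)           # P^2 = P
symmetric = np.allclose(P, P.T)              # P^T = P
trace_P = np.trace(P)
rank_P = np.linalg.matrix_rank(P, tol=1e-8)  # explicit tolerance for near-zero singular values

print(idempotent, symmetric, trace_P, rank_P)
```

Using `np.linalg.solve` avoids an explicit matrix inverse, which is usually the more numerically stable choice.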

Question 3 (10 points).

Consider the following simple linear regression model

$$y = X\beta + \varepsilon$$

where

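$$
y = \begin{bmatrix} 6 \\ 8 \\ 9 \\ 7 \\ \ast \\ 10 \end{bmatrix}, \qquad
X = \begin{bmatrix} 1 & 50 \\ 1 & 52 \\ 1 & 55 \\ 1 & 75 \\ 1 & 57 \\ 1 & 58 \end{bmatrix}
$$

[Reconstructed from a garbled layout: the entry of $y$ marked $\ast$ is not recoverable (a stray "2" in the source may belong there), and the intercept column of ones in $X$ is inferred from $\beta$ being $2 \times 1$.]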

Here, $\varepsilon$ is the $6 \times 1$ vector of errors, and $\beta$ is the $2 \times 1$ vector of parameters. As covered during lectures, the estimate $\hat{\beta}$ of the parameter $\beta$ is given by

$$\hat{\beta} = (X^\top X)^{-1} X^\top y.$$

Further, we discussed that estimates $\hat{y}$ of the response variable can be obtained by

$$\hat{y} = X\hat{\beta} = X (X^\top X)^{-1} X^\top y.$$

(a) Present a step-by-step numerical evaluation of the following hat matrix

$$P = X (X^\top X)^{-1} X^\top.$$

(b) Knowing that element $(i, j)$ of $P$ measures the influence of the $j$th observation on the $i$th predicted value, discuss what effects the diagonal elements of $P$ capture.

(c) Does your computed value for $\operatorname{tr}(P)$ reveal any information about the structure of the underlying regression model?
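The estimator and hat-matrix formulas in Question 3 translate directly into a few lines of NumPy. The sketch below uses small hypothetical data (deliberately not the assignment's values) with one predictor and an intercept column; the variable names are illustrative.

```python
import numpy as np

# Hypothetical data, not the assignment's: one predictor plus an intercept
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 10.3])
X = np.column_stack([np.ones_like(x), x])   # design matrix with a column of ones

XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)    # (X^T X)^{-1} X^T y
P = X @ np.linalg.solve(XtX, X.T)           # hat matrix X (X^T X)^{-1} X^T
y_hat = P @ y                               # fitted values, equal to X @ beta_hat
leverage = np.diag(P)                       # diagonal entries P_ii

print(beta_hat)
print(leverage)
```

In this toy example the observation at x = 10 lies far from the others, which makes it a convenient case for inspecting how the diagonal entries of `P` behave.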

Question 4 (15 points).

Compute the derivatives $\frac{\mathrm{d}f}{\mathrm{d}w}$ of the following functions by using the chain rule. Describe your steps in detail.

(a)

$$f(z) = \exp(-z)$$

$$z = g(y) = y^\top S^{-1} y$$

$$y = h(w) = w - \mu$$

where $w, \mu \in \mathbb{R}^p$, $S \in \mathbb{R}^{p \times p}$.

(b)

$$f(w) = \operatorname{tr}(w w^\top + \sigma^2 I_p)$$

where $w \in \mathbb{R}^p$.

(c)

$$f(w) = \tanh(Aw + b)$$

where $w \in \mathbb{R}^p$, $A \in \mathbb{R}^{N \times p}$, $b \in \mathbb{R}^N$, and $\tanh$ is applied to every component of its $N$-dimensional input vector.
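A hand-derived chain-rule result can be checked against a finite-difference approximation. The following is a minimal sketch, assuming NumPy; the helper `numerical_jacobian` and the test function $f(w) = w^\top w$ are hypothetical and deliberately not one of the assigned functions.

```python
import numpy as np

def numerical_jacobian(f, w, h=1e-6):
    """Central-difference approximation of the Jacobian of f at w."""
    w = np.asarray(w, dtype=float)
    f0 = np.atleast_1d(f(w))
    J = np.zeros((f0.size, w.size))
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = h
        # Perturb only the j-th coordinate in both directions
        J[:, j] = (np.atleast_1d(f(w + step)) - np.atleast_1d(f(w - step))) / (2 * h)
    return J

# Hypothetical test function: f(w) = w^T w, whose derivative is 2 w^T
w0 = np.array([0.5, -1.0, 2.0])
J_num = numerical_jacobian(lambda w: w @ w, w0)
J_exact = 2 * w0.reshape(1, -1)

print(np.allclose(J_num, J_exact, atol=1e-6))
```

To check one of your own derivations, pass the function as `f` and compare `J_num` with the Jacobian you derived; the shape of `J_num` is (output dimension, input dimension).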

Programming component

Question 5 (40 points).

Our objective is to construct polynomial regression models to address the given problem. Specifically, the polynomial function is formulated as a weighted sum of monomials up to a maximum degree $K$:

$$y = f(x) = \beta_0 + \sum_{k=1}^{K} \beta_k x^k$$

Your task is to estimate the parameter vector $\beta_K = [\beta_0, \beta_1, \dots, \beta_K]$ for each $K$ ranging from 1 to 10. Subsequently, identify the optimal model by selecting the value of $K$ that yields the most accurate predictions on unseen (test) data.
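One possible way to set up the estimation is to build a design matrix of monomials and solve the least-squares problem for each degree. The sketch below assumes NumPy; the helpers `fit_polynomial` and `predict` are hypothetical names, and the data are synthetic stand-ins because the assignment's Excel sheets are not reproduced here.

```python
import numpy as np

def fit_polynomial(x, y, K):
    """Least-squares estimate of [beta_0, ..., beta_K] for a degree-K polynomial."""
    Phi = np.vander(x, K + 1, increasing=True)   # columns: 1, x, x^2, ..., x^K
    beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return beta

def predict(beta, x):
    """Evaluate the fitted polynomial at the inputs x."""
    return np.vander(x, len(beta), increasing=True) @ beta

# Synthetic stand-in data (assumed cubic signal plus Gaussian noise)
rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, size=30)
y_train = 1.0 - 2.0 * x_train + 0.5 * x_train**3 + rng.normal(scale=0.1, size=30)

# One coefficient vector per degree K = 1, ..., 10
betas = {K: fit_polynomial(x_train, y_train, K) for K in range(1, 11)}
```

`np.linalg.lstsq` is used rather than the explicit normal equations because high-degree monomial design matrices can be badly conditioned.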

Data structure

The datasets are organized in separate tabs within an Excel file for cross-platform accessibility. Each sheet contains $N$-dimensional column vectors, representing the $N$ training inputs and outputs for the regression task. Three distinct, color-coded datasets are available, each generated from a polynomial function with a scalar input $x$ and added noise. These datasets are further divided into small and large training sets, as well as a test set, enabling you to examine the impact of dataset size on model fitting.

Implementation and Analysis

Once you've confidently implemented the polynomial regressor, a topic extensively covered in your tutorials, you can proceed to investigate over-fitting, under-fitting, and model generalization. These are common pitfalls that can compromise the model's real-world performance. You'll explore the contributing factors and potential mitigations for these issues.

Reporting

To assess the influence of dataset size and model complexity, you'll prepare a succinct yet thorough report. This should include an analysis of how training and test errors are affected by the size of the training and test datasets, as well as by the polynomial degree. You may use R Markdown or Jupyter Notebook for report preparation. While there's no length requirement, aim for a report that is both brief and informative. Use figures and schematics to present key findings, relegating algorithmic details and less critical results to an appendix. Avoid cluttering your report with large data arrays that don't offer meaningful insights. Your final report should be submitted as a PDF file to Crowdmark.
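The comparison of training and test error across polynomial degrees could be organized along the following lines. This is a sketch under stated assumptions: it uses NumPy's `numpy.polynomial.polynomial` module, synthetic data in place of the assignment's sheets, and hypothetical names (`mse`, `make_data`, `errors`, `best_K`).

```python
import numpy as np
from numpy.polynomial import polynomial as poly

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return float(np.mean((y_true - y_pred) ** 2))

# Synthetic stand-in for the training and test sheets (assumed cubic truth plus noise)
rng = np.random.default_rng(1)

def make_data(n):
    x = rng.uniform(-1, 1, size=n)
    return x, 1.0 - 2.0 * x + 0.5 * x**3 + rng.normal(scale=0.2, size=n)

x_train, y_train = make_data(15)     # a "small" training set
x_test, y_test = make_data(200)

errors = {}
for K in range(1, 11):
    coef = poly.polyfit(x_train, y_train, K)   # coefficients, lowest degree first
    errors[K] = (mse(y_train, poly.polyval(x_train, coef)),
                 mse(y_test, poly.polyval(x_test, coef)))

best_K = min(errors, key=lambda K: errors[K][1])   # degree with the lowest test MSE

for K, (train_err, test_err) in errors.items():
    print(f"K={K:2d}  train MSE={train_err:.4f}  test MSE={test_err:.4f}")
```

Rerunning the loop with a larger training set is one way to see how dataset size changes the gap between the two error curves; plotting `errors` against `K` gives the kind of figure the report asks for.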