Math 156, Summer 2022 Homework Assignment 1
Problem 1 (30 points): Weight-decay Regularization and Regularization by Adding Input Noise
Consider a linear model of the form

y(x, w) = w_0 + \sum_{i=1}^{D} w_i x_i    (1)
together with a sum-of-squares error function of the form
E_D(w) = \sum_{n=1}^{N} {y(x_n, w) − t_n}^2.    (2)
Now suppose that Gaussian noise ϵ_i with zero mean and variance σ² is added independently to each of the input variables x_i. By making use of E[ϵ_i] = 0 and E[ϵ_i ϵ_j] = δ_{ij} σ², show that minimizing E_D averaged over the noise distribution is equivalent to minimizing the sum-of-squares error for noise-free input variables with the addition of a weight-decay regularization term, in which the bias parameter w_0 is omitted from the regularizer.
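The identity being asked for can be checked numerically before proving it. The sketch below uses arbitrary synthetic data and a fixed parameter vector (the dataset, σ, and all variable names are illustrative assumptions, not part of the problem): a Monte Carlo average of E_D over input-noise draws should agree with the noise-free error plus the weight-decay term N σ² \sum_i w_i², with w_0 playing no role in the penalty.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 50, 3
X = rng.normal(size=(N, D))        # noise-free inputs x_n (synthetic)
t = rng.normal(size=N)             # targets t_n (synthetic)
w0, w = 0.5, rng.normal(size=D)    # a fixed, arbitrary parameter setting

def E_D(Xmat):
    """Sum-of-squares error of the linear model on inputs Xmat, eq. (2)."""
    y = w0 + Xmat @ w
    return np.sum((y - t) ** 2)

sigma = 0.1
trials = 10000
# Monte Carlo average of E_D over independent Gaussian input noise
avg_noisy = np.mean([E_D(X + sigma * rng.normal(size=(N, D)))
                     for _ in range(trials)])

# Claimed form: noise-free error plus a weight-decay term (w0 excluded)
predicted = E_D(X) + N * sigma**2 * np.sum(w ** 2)
print(avg_noisy, predicted)   # agree up to Monte Carlo error
```

The agreement is exact in expectation because the cross terms involving ϵ_i average to zero and E[ϵ_i ϵ_j] = δ_{ij} σ² leaves only the quadratic weight penalty.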
Problem 2 (30 points): Multiple Outputs
Consider a linear basis function regression model for a multivariate target variable t having
a Gaussian distribution of the form
p(t|W, Σ) = N(t|y(x, W), Σ) (3)
where
y(x, W) = W⊤ ϕ(x) (4)
together with a training dataset comprising input basis vectors ϕ(x_n) and corresponding target vectors t_n, with n = 1, . . . , N. Show that the maximum likelihood solution W_ML for the parameter matrix W has the property that each column is given by an expression of the form w_ML = (Φ^⊤ Φ)^{-1} Φ^⊤ t, which was the solution for an isotropic noise distribution (see Section 3.1.5, page 146, of Bishop's book, Pattern Recognition and Machine Learning).
Note that this is independent of the covariance matrix Σ. Show that the maximum likelihood solution for Σ is given by
Σ = (1/N) \sum_{n=1}^{N} (t_n − W_ML^⊤ ϕ(x_n)) (t_n − W_ML^⊤ ϕ(x_n))^⊤.    (5)
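Both properties are easy to verify on synthetic data (a sketch under assumed dimensions; Φ here is a hypothetical design matrix, not tied to any particular basis): the columns of W_ML coincide with independent single-output least-squares fits, regardless of Σ, and the ML covariance is the average outer product of the residual vectors as in eq. (5).

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K = 100, 4, 2                 # data points, basis functions, target dims
Phi = rng.normal(size=(N, M))       # design matrix with rows phi(x_n)^T
T = rng.normal(size=(N, K))         # target matrix with rows t_n^T

# Maximum likelihood weights: normal equations solved once for all outputs
W_ML = np.linalg.solve(Phi.T @ Phi, Phi.T @ T)

# Column k equals the single-output solution (Phi^T Phi)^{-1} Phi^T t_k
w_col0 = np.linalg.solve(Phi.T @ Phi, Phi.T @ T[:, 0])

# ML covariance: average outer product of the residuals, eq. (5)
R = T - Phi @ W_ML                  # rows are t_n - W_ML^T phi(x_n)
Sigma_ML = (R.T @ R) / N
```

The decoupling across columns is why Σ drops out of the weight estimate: the log-likelihood separates into K independent least-squares problems sharing the same Φ.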
Problem 3 (40 points): Probabilistic Generative Classification Model for K Classes
(i) (20 points) Consider a probabilistic generative classification model for K classes defined by prior class probabilities p(C_k) = π_k and general class-conditional densities p(ϕ|C_k), where ϕ is the input feature vector. Suppose we are given a training dataset {ϕ_n, t_n} where n = 1, . . . , N, and t_n is a binary target vector of length K that uses the 1-of-K coding scheme, so that it has components t_{nj} = I_{jk} if pattern n is from class C_k (I_{jk} = 1 if j = k and 0 otherwise). Assuming that the data points are drawn independently from this model, show that the maximum-likelihood solution for the prior probabilities is given by
π_k = N_k / N    (6)
where N_k is the number of data points assigned to class C_k.
(ii) (20 points) Consider the classification model of part (i) above, and now suppose that the class-conditional densities are given by Gaussian distributions with a shared covariance matrix, so that

p(ϕ|C_k) = N(ϕ|µ_k, Σ).    (7)
Show that the maximum likelihood solution for the mean of the Gaussian distribution for class C_k is given by

µ_k = (1/N_k) \sum_{n=1}^{N} t_{nk} ϕ_n,    (8)

which represents the mean of those feature vectors assigned to class C_k. Similarly, show that the maximum likelihood solution for the shared covariance matrix is given by
Σ = \sum_{k=1}^{K} (N_k / N) S_k    (9)
where
S_k = (1/N_k) \sum_{n=1}^{N} t_{nk} (ϕ_n − µ_k)(ϕ_n − µ_k)^⊤.    (10)
Thus Σ is given by a weighted average of the covariances of the data associated with each class, in which the weighting coefficients are given by the prior probabilities of the classes.
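The three estimators can be sanity-checked on synthetic 1-of-K coded data (a sketch; the class structure and all names below are arbitrary assumptions): π_k is the class fraction, µ_k is the per-class mean of eq. (8), and the prior-weighted sum of the S_k from eq. (10) reduces to the pooled within-class covariance.

```python
import numpy as np

rng = np.random.default_rng(2)
N, D, K = 300, 2, 3
labels = rng.integers(0, K, size=N)
T = np.eye(K)[labels]                            # 1-of-K target vectors t_n
Phi = rng.normal(size=(N, D)) + labels[:, None]  # class-shifted features

Nk = T.sum(axis=0)                    # class counts N_k
pi = Nk / N                           # priors, eq. (6)
mu = (T.T @ Phi) / Nk[:, None]        # class means, eq. (8); row k is mu_k

# Per-class covariances S_k, eq. (10), combined with weights N_k / N
Sigma = np.zeros((D, D))
for k in range(K):
    diff = Phi - mu[k]
    S_k = (T[:, k, None] * diff).T @ diff / Nk[k]  # t_nk masks other classes
    Sigma += (Nk[k] / N) * S_k
```

Because t_nk selects exactly the points of class C_k, the weighted sum telescopes to (1/N) \sum_n (ϕ_n − µ_{c(n)})(ϕ_n − µ_{c(n)})^⊤, the pooled within-class covariance.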
2022-06-27