
CS365: Foundations of Data Science

HW 2

Due: Friday 10/24/2025 @ 11:59pm EST

Disclaimer

I encourage you to work together. I am a firm believer that we are at our best (and learn better) when we communicate with our peers. Perspective is incredibly important when it comes to solving problems, and sometimes it takes talking to other humans (or rubber ducks, in the case of programmers) to gain a perspective we would not be able to achieve on our own. The only thing I ask is that you report who you work with: this is not to punish anyone, but will help me figure out which topics I need to spend extra time on and whom to help. When you turn in your solution (please use some form of typesetting: do NOT turn in handwritten solutions), please note who you worked with.

Question 1: Matrix Derivatives (30 points)

Differentiating scalars is great, but what happens if our objective function is written using linear algebra? The objective may still be differentiable, and so to optimize it we would still need to differentiate it. But how can you differentiate linear algebra equations? In this question you will be deriving some of the derivative rules for matrix equations.

Remember that matrix equations are used to group together lots of smaller, scalar equations. Differentiation also follows a similar rule: until you know the matrix derivative rules as well as the scalar ones, the procedure for differentiating a matrix equation is as follows:

1. expand the matrix equation into its scalar form. This equation may be huge and will probably involve lots of summations.

2. differentiate the giant scalar equation.

3. repackage the giant derivative equation back into matrix form.
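As a hedged illustration of the three steps above, take f(x) = a^T x (an example chosen here for illustration, not taken from the assignment's list). Expanding gives f = sum_i a_i x_i; differentiating with respect to x_k gives a_k; repackaging gives the gradient vector a. The sketch below checks that result numerically with central finite differences. The names `f` and `numerical_gradient` are hypothetical helpers, not course code.

```python
def f(a, x):
    """Step 1: the scalar form of a^T x, i.e. the sum over i of a_i * x_i."""
    return sum(a_i * x_i for a_i, x_i in zip(a, x))


def numerical_gradient(a, x, eps=1e-6):
    """Central finite-difference estimate of df/dx_k for every index k."""
    grad = []
    for k in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[k] += eps
        x_minus[k] -= eps
        grad.append((f(a, x_plus) - f(a, x_minus)) / (2 * eps))
    return grad


a = [2.0, -1.0, 0.5]
x = [0.3, 1.2, -0.7]
approx = numerical_gradient(a, x)

# Steps 2 and 3 predict that the repackaged gradient is the vector a itself.
max_error = max(abs(g - a_k) for g, a_k in zip(approx, a))
print("largest deviation from a:", max_error)
```

A finite-difference check like this does not replace the derivation, but it is a cheap way to catch a sign or index mistake in a repackaged result.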

Please follow this procedure to derive the matrix derivative rules for the following matrix equations:

Question 2: Multivariate Gaussian Mixture Models (35 points)

A Gaussian mixture model is an instance of a very specific application of expectation maximization. The problem in lecture we’ve been applying expectation maximization to has been placing examples into exactly one of k distinct buckets, where each bucket is controlled by a probability distribution (potentially of a different shape from the others). A Gaussian mixture model makes a further assumption: it assumes that every bucket is controlled by a separate Gaussian distribution. The pdf for a 1-dimensional Gaussian is:
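For reference, the standard density of a 1-dimensional Gaussian with mean µ and variance σ² is given below. This is the usual textbook form and is assumed, not confirmed, to match the lecture's notation.

```latex
\mathcal{N}(x \mid \mu, \sigma^2)
  = \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```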

In your labs you have solved this problem for a 1-dimensional Gaussian distribution by first deriving the log-likelihood of a set of samples (denoted as X) as:

and then deriving:

In lecture we already applied the E-step of expectation maximization to maximize Q for the alignment (i.e. maximizing Pr[z_ij = 1 | x_i]). The solution was to calculate:
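A minimal sketch of the E-step for the 1-dimensional case follows. It assumes the standard responsibility formula, γ_ij = π_j N(x_i | µ_j, σ_j²) divided by the sum of the same quantity over all buckets, which may differ in notation from the lecture's version. The function names `gaussian_pdf` and `e_step` and the sample values are hypothetical.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def e_step(xs, pis, mus, variances):
    """Return gammas[i][j], the posterior probability that sample i belongs to bucket j."""
    gammas = []
    for x in xs:
        # Unnormalized weight of each bucket for this sample.
        weighted = [p * gaussian_pdf(x, m, v) for p, m, v in zip(pis, mus, variances)]
        total = sum(weighted)
        gammas.append([w / total for w in weighted])
    return gammas


# Illustrative data: two samples near -2 and two near +2.
xs = [-2.1, -1.9, 2.0, 2.2]
gammas = e_step(xs, pis=[0.5, 0.5], mus=[-2.0, 2.0], variances=[1.0, 1.0])
for x, row in zip(xs, gammas):
    print(x, [round(g, 4) for g in row])
```

Each row of `gammas` sums to 1, since it is a probability distribution over the buckets for one sample.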

After plugging this solution back into Q, we arrived at the M-step of expectation maximization, where we needed to figure out π, µ, and σ² for every cluster. We took our modified Q:

found the equations for its derivatives with respect to each parameter, set them equal to zero, and solved. We arrived at the following solutions:
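The closed-form updates usually quoted for the 1-dimensional case can be sketched as follows. This is the standard textbook form, assumed (not confirmed by this handout) to match the lecture's solutions: with N_j = sum_i γ_ij, the updates are π_j = N_j / n, µ_j = sum_i γ_ij x_i / N_j, and σ_j² = sum_i γ_ij (x_i - µ_j)² / N_j. The function name `m_step` and the data are hypothetical.

```python
def m_step(xs, gammas):
    """Re-estimate (pis, mus, variances) from samples xs and responsibilities gammas[i][j]."""
    n = len(xs)
    num_buckets = len(gammas[0])
    pis, mus, variances = [], [], []
    for j in range(num_buckets):
        # Effective number of samples assigned to bucket j.
        n_j = sum(g[j] for g in gammas)
        mu_j = sum(g[j] * x for g, x in zip(gammas, xs)) / n_j
        var_j = sum(g[j] * (x - mu_j) ** 2 for g, x in zip(gammas, xs)) / n_j
        pis.append(n_j / n)
        mus.append(mu_j)
        variances.append(var_j)
    return pis, mus, variances


# Illustrative hard assignments: the first two samples go to bucket 0, the rest to bucket 1.
xs = [-2.0, -1.0, 1.0, 3.0]
gammas = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
pis, mus, variances = m_step(xs, gammas)
print(pis, mus, variances)
```

With hard 0/1 responsibilities the updates reduce to the per-bucket sample fraction, sample mean, and (biased) sample variance, which makes the result easy to verify by hand.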

In this question, what if each sample is multi-dimensional instead of a scalar? In this case each sample becomes a vector in d-dimensional space and will be denoted as x_i. Gaussians exist for multi-dimensional samples, but the pdf changes to have a mean vector µ and a covariance matrix Σ.

The new pdf looks like:
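For reference, the standard multivariate Gaussian density is given below, written with k for the dimensionality of the sample (the d of the previous paragraph). The arrow notation for vectors is an assumption made here and may differ from the lecture's.

```latex
\mathcal{N}(\vec{x} \mid \vec{\mu}, \Sigma)
  = \frac{1}{\sqrt{(2\pi)^{k} \, |\Sigma|}}
    \exp\!\left( -\frac{1}{2} (\vec{x} - \vec{\mu})^{T} \Sigma^{-1} (\vec{x} - \vec{\mu}) \right)
```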

where Σ^{-1} is the inverse and |Σ| is the determinant of the covariance matrix Σ, and k is the dimensionality of x_i. Most of the math stays the same (γ_ij and the estimate of π_j stay the same) and we arrive at the following Q:

Please find the MLE estimates for:

Question 3: Singular Value Decomposition (35 points)

Consider matrix A and its SVD:
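For reference, the SVD is conventionally written as below. The symbols U, Σ, and V are the usual ones and are assumed here rather than taken from the handout; note that this Σ is the diagonal matrix of singular values, not the covariance matrix of Question 2.

```latex
A = U \Sigma V^{T},
\qquad U^{T} U = I,
\qquad V^{T} V = I,
\qquad \Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots), \quad \sigma_1 \ge \sigma_2 \ge \cdots \ge 0
```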

1. (15 points) Show that for a square, symmetric matrix M, any two eigenvectors with distinct eigenvalues λ1, λ2 are orthogonal (i.e. the inner product is 0). You will need the following lemma:

2. (10 points) Show that A^T A and A A^T are symmetric.

3. (5 points) Derive an expression that relates singular vectors and singular values of A to the eigenvectors and eigenvalues of A^T A.

4. (5 points) Derive an expression that relates singular vectors and singular values of A to the eigenvectors and eigenvalues of A A^T.
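A small numeric sanity check of the claim in part 1 is sketched below. It is not a proof and covers only one hand-picked 2x2 symmetric matrix; the matrix M, its eigenpairs, and the helper names `mat_vec` and `dot` are illustrative assumptions.

```python
def mat_vec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(m_rc * v_c for m_rc, v_c in zip(row, v)) for row in m]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


# A symmetric matrix with two distinct eigenvalues, 3 and 1.
M = [[2.0, 1.0], [1.0, 2.0]]
v1, lam1 = [1.0, 1.0], 3.0
v2, lam2 = [1.0, -1.0], 1.0

# Confirm that each pair really satisfies M v = lambda v.
is_eigenpair_1 = mat_vec(M, v1) == [lam1 * c for c in v1]
is_eigenpair_2 = mat_vec(M, v2) == [lam2 * c for c in v2]

# The claim in part 1 predicts that this inner product is zero.
inner = dot(v1, v2)
print(is_eigenpair_1, is_eigenpair_2, inner)
```

The eigenvectors here are written down by hand rather than taken from an eigen-solver, because solvers for symmetric matrices typically return orthonormal vectors by construction and would make the check circular.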