
EECS 4404B/5327B 3.0 (F) 2022-23 Introduction to Machine Learning and Pattern Recognition Assignment 1

Posted: 2022-10-08



Assignment 1.  Foundations

Please submit your assignment report electronically as a pdf file on eClass.  Your report should be brief and well organized.

All submissions should be typeset using an appropriate math editor (e.g., LaTeX, Microsoft Equation Editor, MathType).  Please make sure figures included in the report are properly formatted and captioned.  Ensure that all graphs have appropriate axis labels and that fonts are properly sized.  Embed figures as vector graphics (e.g., as .pdf files), not rasters, to preserve resolution.  You do not have to submit your code.

Part 1.  Analysis (100 marks)

Linear Algebra

1)  (10 marks) Use index notation to prove that tr(ABC) = tr(BCA) = tr(CAB).
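A quick numerical sanity check (not a proof) may help you verify the claim before writing it up.  In index notation, tr(ABC) = Σᵢⱼₖ Aᵢⱼ Bⱼₖ Cₖᵢ, which is invariant under cyclic reordering of the factors; the sketch below checks this with random rectangular matrices (shapes are arbitrary choices):

```python
import numpy as np

# Numerical check of the cyclic property of the trace:
# tr(ABC) = tr(BCA) = tr(CAB), even for non-square factors,
# as long as the products are defined and square.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((5, 3))

t1 = np.trace(A @ B @ C)   # (3, 3) product
t2 = np.trace(B @ C @ A)   # (4, 4) product
t3 = np.trace(C @ A @ B)   # (5, 5) product
assert np.allclose([t1, t2, t3], t1)
```

Note that the three products have different shapes, yet share the same trace.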

2)  (10 marks) Prove that a symmetric matrix is positive definite if and only if all of its eigenvalues are positive.
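Again, a numerical illustration (not a proof) can build intuition: a symmetric matrix constructed to be positive definite has all-positive eigenvalues, and its quadratic form is positive on random nonzero vectors.  The construction S = MᵀM + I below is one convenient way to generate such a matrix:

```python
import numpy as np

# S = M^T M + I is symmetric positive definite by construction:
# x^T S x = ||M x||^2 + ||x||^2 > 0 for any x != 0.
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
S = M.T @ M + np.eye(4)

# eigvalsh is the symmetric-matrix eigensolver; all eigenvalues positive.
eigvals = np.linalg.eigvalsh(S)
assert np.all(eigvals > 0)

# Quadratic form positive for random test vectors.
for _ in range(100):
    x = rng.standard_normal(4)
    assert x @ S @ x > 0
```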

Probability

3)  (10 marks) Show that the variance of a sum of two random variables X and Y is var[X + Y] = var[X] + var[Y] + 2cov[X, Y], where cov[X, Y] is the covariance between X and Y.  (An important implication is that variances add if the two variables are independent.)
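One useful observation: the identity holds exactly for empirical (population, ddof=0) moments, so it can be checked on any paired sample, correlated or not.  A minimal sketch:

```python
import numpy as np

# var[X+Y] = var[X] + var[Y] + 2 cov[X, Y] holds exactly for
# empirical moments (ddof=0), up to floating-point error.
rng = np.random.default_rng(2)
x = rng.standard_normal(1000)
y = 0.5 * x + rng.standard_normal(1000)   # deliberately correlated with x

cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)  # population covariance
lhs = np.var(x + y)                                 # ddof=0 by default
rhs = np.var(x) + np.var(y) + 2 * cov_xy
assert np.isclose(lhs, rhs)
```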

4)  (10 marks) Prove that the mean and variance of the univariate normal p(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)) are µ and σ², respectively.

Statistics

5)  (10 marks) Prove that the maximum likelihood estimate of the variance σ² of a univariate normal distribution over x is given by σ²ML = (1/N) Σₙ (xₙ − µML)², where the sum runs over the N training points and µML is the maximum likelihood estimate of the mean.

6)  (10 marks) Prove that the expected value of this maximum likelihood estimate is E[σ²ML] = ((N − 1)/N) σ², and thus that this estimate is biased.
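A Monte Carlo simulation can make the bias visible before you prove it analytically.  In the sketch below (sample size, variance, and trial count are arbitrary choices), the ML variance estimate averages to (N − 1)/N · σ² rather than σ²:

```python
import numpy as np

# Monte Carlo illustration of the bias of the ML variance estimate:
# for N samples from N(mu, sigma^2), dividing by N (not N-1) gives an
# estimate whose average is (N-1)/N * sigma^2.
rng = np.random.default_rng(3)
N, sigma2, trials = 5, 4.0, 200_000

data = rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
ml_var = np.mean((data - data.mean(axis=1, keepdims=True)) ** 2, axis=1)

print(ml_var.mean())   # close to (N-1)/N * sigma2 = 3.2, not 4.0
```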


7)  (10 marks) Given one-dimensional i.i.d. training data x = {x1, …, xN}, suppose you know that the data are normally distributed with a variance of σ².  Given a normal prior for the mean, µ ~ N(µ0, σ0²), the posterior distribution over µ is also normal, p(µ | x) ~ N(µN, σN²), where

µN = (σ² / (Nσ0² + σ²)) µ0 + (Nσ0² / (Nσ0² + σ²)) µML    and    1/σN² = 1/σ0² + N/σ²,

with µML = (1/N) Σₙ xₙ the sample mean.

a)   (5327 only) Prove these expressions for the posterior mean and covariance.

b)  Derive approximations for the posterior mean and variance that apply when

i)   There are few training examples and their variance is high

ii)  The prior is very weak
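To explore the limiting cases numerically, the standard conjugate-normal update can be computed directly.  The sketch below uses illustrative numbers, not assignment data; varying N and σ0² shows the behaviour asked about in parts (i) and (ii):

```python
import numpy as np

# Standard conjugate-normal posterior update for a Gaussian likelihood
# with known variance sigma^2 and Gaussian prior N(mu0, sigma0^2):
#   1/sigma_N^2 = 1/sigma0^2 + N/sigma^2
#   mu_N        = sigma_N^2 * (mu0/sigma0^2 + N*mu_ml/sigma^2)

def posterior(x, sigma2, mu0, sigma0_2):
    N = len(x)
    mu_ml = np.mean(x)                               # sample mean
    sigma_N2 = 1.0 / (1.0 / sigma0_2 + N / sigma2)   # posterior variance
    mu_N = sigma_N2 * (mu0 / sigma0_2 + N * mu_ml / sigma2)
    return mu_N, sigma_N2

rng = np.random.default_rng(4)
x = rng.normal(2.0, 1.0, size=50)
mu_N, sigma_N2 = posterior(x, sigma2=1.0, mu0=0.0, sigma0_2=10.0)
```

With a weak prior (large σ0²) the posterior mean approaches the sample mean; with few, noisy samples it stays near µ0.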

8)  (10 marks) You are designing a video-based vehicle classification system for a toll highway.  The traffic on this highway is generally 90% cars and 10% trucks.  Cars are charged $5, while trucks are charged $20.  If you overcharge a car, there is a 50% chance that this will be discovered.  Reimbursing the driver involves a lot of paperwork and time, and the net cost to your department is estimated to be $100.  If your system processes an image and computes the truck likelihood to be twenty times that of the car likelihood, what should your system do?  Use clear notation for this derivation, defining all terms.
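The general machinery here is the minimum-expected-loss decision rule: form the posterior from prior and likelihood, then pick the action minimizing posterior expected loss.  A sketch follows; the priors and likelihood ratio come from the problem statement, but the loss matrix L is a hypothetical placeholder — deriving the correct losses from the charges and reimbursement cost is part of the exercise:

```python
import numpy as np

# Minimum-expected-loss decision: posterior = prior * likelihood (normalized),
# then choose the action with smallest L @ posterior.
prior = np.array([0.9, 0.1])        # P(car), P(truck)
likelihood = np.array([1.0, 20.0])  # p(image|car), p(image|truck), up to scale

post = prior * likelihood
post = post / post.sum()            # posterior P(class | image)

# L[action, true_class]; rows = (charge as car, charge as truck).
# PLACEHOLDER values only -- derive the real ones from the problem.
L = np.array([[0.0, 15.0],
              [50.0, 0.0]])

expected_loss = L @ post            # expected loss of each action
action = int(np.argmin(expected_loss))
```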

Information Theory

9)  (10 marks) The English alphabet contains 26 letters.  If all letters were equally probable, how much information, in bits, would be transferred each time a letter is communicated?

10) (10 marks) The game of Scrabble involves 98 letter tiles (ignoring blanks) distributed as follows:

A-9, B-2, C-2, D-4, E-12, F-2, G-3, H-2, I-9, J-1, K-1, L-4, M-2, N-6, O-8, P-2, Q-1, R-6, S-4, T-6, U-4, V-2, W-2, X-1, Y-2, Z-1.

To begin the game, each player draws a letter tile, and the player that drew the lowest letter (closest to A) goes first.  Suppose you draw the first tile.  When you reveal it to your opponents, how much information will you be giving them, on average?  How does this compare to your answer to question 9 above?
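Both questions reduce to the entropy of a discrete distribution, H = −Σᵢ pᵢ log₂ pᵢ, in bits.  A small helper, checked here against the uniform 26-letter case of question 9 (plug in the normalized tile counts yourself for question 10):

```python
import numpy as np

def entropy_bits(counts):
    """Entropy in bits of a distribution given as (unnormalized) counts."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                    # 0 * log 0 is taken as 0
    return float(-np.sum(p * np.log2(p)))

print(entropy_bits(np.ones(26)))    # uniform case: log2(26), about 4.70 bits
```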

Part 2.  Coding (30 marks)

Bivariate Normal Model Estimation

Vision systems for autonomous vehicles use visual cues to identify the direction of the road.  For roads that are relatively straight, the lines projecting from the road boundaries and lane markings meet in the image at the road vanishing point, which is informative about the tilt and pan angle of the vehicle relative to the road.

In this problem, we will explore a probabilistic approach to identifying this vanishing point.  The approach uses a computer vision algorithm to identify line segments in the image, and then finds the points of intersection of all pairs of lines passing through these segments.  (Intersections outside the image boundary are ignored.)

We can model these points of intersection as generated by a bivariate Gaussian model centred at the true vanishing point.  Thus, by fitting the model to the observed intersection data, we can estimate the vanishing point as the maximum likelihood estimate of the mean of this model.


Figure 1.  Left: Detected line segments.  Right: Points of intersection of extended lines passing through these line segments.

1)  (10 marks) The provided ASCII file POIs contains the x and y coordinates of these points of intersection.  Use these data to determine maximum likelihood estimates of both the mean and covariance of a bivariate Gaussian model, assuming:

a)  An isotropic model

b)  An axis-aligned model

c)  A full-covariance model

Report the mean and covariance matrix estimated in each of these cases.
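A sketch of the three fits, assuming the data have been loaded into an (N, 2) array.  The synthetic X below is a stand-in, not the POIs file; the mean estimate is the same in all three cases, and only the covariance structure differs:

```python
import numpy as np

# Synthetic stand-in for the POIs data (replace with np.loadtxt("POIs")).
rng = np.random.default_rng(5)
X = rng.multivariate_normal([3.0, 1.0], [[2.0, 0.8], [0.8, 1.0]], size=500)

mu = X.mean(axis=0)                # ML mean, shared by all three models
D = X - mu

full = D.T @ D / len(X)            # full covariance (ML: divide by N)
axis_aligned = np.diag(np.diag(full))            # keep variances, drop correlation
isotropic = np.eye(2) * np.mean(np.diag(full))   # one shared variance
```

The isotropic ML variance is the average of the per-axis ML variances, i.e. (1/2N) Σₙ ||xₙ − µ||².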

2)  (10 marks) Compute and report the log likelihood of the data under these three models.

a)  Which model has the highest likelihood?  Which has the lowest likelihood?  Does this make sense?  Why?

b)  Moving forward, should you use the model with highest likelihood?  What is one major risk in doing so?

c)   (5327 only) Use a leave-one-out cross-validation strategy to compute the pseudo-log-likelihood of the data under each of these models.  Based on this analysis, which model would you select?
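A sketch of the two computations, shown for the full-covariance model on synthetic stand-in data (the other two models differ only in how the covariance is estimated).  The leave-one-out pseudo-log-likelihood refits on N − 1 points and scores the held-out point:

```python
import numpy as np

def gauss_loglik(X, mu, cov):
    """Log likelihood of rows of X under a d-dimensional Gaussian."""
    d = X.shape[1]
    diff = X - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(diff @ np.linalg.inv(cov) * diff, axis=1)
    return -0.5 * np.sum(maha + logdet + d * np.log(2 * np.pi))

def loo_pseudo_loglik(X):
    """Leave-one-out pseudo-log-likelihood, full-covariance model."""
    total = 0.0
    for i in range(len(X)):
        Xtr = np.delete(X, i, axis=0)
        mu = Xtr.mean(axis=0)
        cov = (Xtr - mu).T @ (Xtr - mu) / len(Xtr)   # ML fit on N-1 points
        total += gauss_loglik(X[i:i + 1], mu, cov)
    return total

rng = np.random.default_rng(6)
X = rng.multivariate_normal([0, 0], [[1, 0.3], [0.3, 1]], size=200)
```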

3)  (10 marks) Please plot the following information overlaid upon the road image provided.

a)  Plot the intersection points as blue points.

b)  On the same plot, plot the estimated mean as a yellow asterisk.

c)   On the same plot, use the eigenvector decomposition of the covariance matrix to plot the standard deviation ellipse (the isoprobability contour lying one standard deviation from the mean) for the three models in red, green, and cyan, respectively.

d)  Does the true vanishing point appear to lie within these covariance ellipses?
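For part 3c, one way to construct the ellipse is to take the eigendecomposition cov = V diag(λ) Vᵀ, scale a unit circle by √λ along each eigenvector, and shift by the mean; the resulting points can then be passed to, e.g., matplotlib's plot() over the road image.  The mean and covariance below are illustrative:

```python
import numpy as np

def std_ellipse(mu, cov, n=100):
    """Points on the one-standard-deviation ellipse of a 2-D Gaussian."""
    lam, V = np.linalg.eigh(cov)                  # eigenvalues ascending
    theta = np.linspace(0, 2 * np.pi, n)
    circle = np.stack([np.cos(theta), np.sin(theta)])   # (2, n) unit circle
    return mu[:, None] + V @ (np.sqrt(lam)[:, None] * circle)

mu = np.array([3.0, 1.0])
cov = np.array([[2.0, 0.8], [0.8, 1.0]])
pts = std_ellipse(mu, cov)

# Every ellipse point satisfies (p - mu)^T cov^{-1} (p - mu) = 1.
d = pts - mu[:, None]
maha = np.sum(d * (np.linalg.inv(cov) @ d), axis=0)
assert np.allclose(maha, 1.0)
```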