闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

School of Computer Science

Machine Learning and Intelligent Data Analysis

Mock Examination

Machine Learning and Intelligent Data Analysis

Question 1 Linear Regression

(a) The expected value of the least squares loss-function for regression is

where h is the true function underlying the data, f is an estimate of h and σ is the standard deviation of the data.

Explain the terms in this expression and their implications for learning.

(b) The graph below shows the ﬁt of a curve to a set of points generated by a noisy underlying process. The independent variable is x and the dependent variable is y . This ﬁt was obtained by solving the normal equations Φ yT = ΦT Φw for the model weights w = (w0 , . . . , w9 )T , where Φ is the basis matrix and y = (y1 , . . . , y11 )T .

Comment on this ﬁt with reference to Equation (1).

Suggest three ways by which the result could be improved, explaining your reasoning.

(c) Calculate the likelihood p(t|x) that a data point X = (x, t) = (3 , 10) was generated by a function f (x) = 2x + 3 + e, where e is drawn from a normal distribution with mean 0 and standard deviation σ = 1.

Question 2 Clustering and Dimensionality Reduction

(a) In the following 2D dataset, two clusters of data points can be visually observed. For

this dataset, brieﬂy explain your analysis of whether k-means clustering algorithm (with k = 2) will give the same clustering results as the visual observation of clusters (observable by naked eye). If your answer is yes, describe how k-means will achieve this. If your answer is no, describe why not and suggest an alternative approach to handle such kind of data for clustering. Justify your reasoning.

(b) The distance matrix below provides the distance between each pair of objects W, X,

Y, and Z. Using hierarchical agglomerative clustering with complete link, write the cluster formation process, including (i) each merge of objects and/or clusters, (ii) the height after each merge, and (iii) the ﬁnal dendrogram.

(c) Suppose a 10-dimensional dataset contains 100 measurements. From its covari- ance matrix, the following ten eigenvalues have been obtained. Based upon these eigenvalues, comment about the true dimensionality of this dataset. Justify your answer.

Further, if we wish to project this data into 2-dimensions, indicate which two eigen- values correspond to the eigenvectors we would like to choose. Explain why you made this choice.

(d) One possible approach to searching a large linked set of documents is to combine a measure of document similarity such as TF-IDF similarity with a measure of a page’s importance such as that provided by PageRank. Suggest three ways in which this could be done and discuss the advantages and disadvantages of each of them.

Question 3 Classiﬁcation

(a) The optimisation problem for Soft-Margin Support Vector Machines is deﬁned as

follows:

Subject to:

where w and b are parameters of the model, ξ are the slack variables, C is a hyper- parameter, N is the number of training examples, y(n) is the output value of example n , x(n) is the vector of input values of example n and f (.) is the function describing the model.

Suppose someone decides to change this optimisation problem to the following: