ECS708P/U/D Machine Learning 2017

Published: 2022-01-06

Question 1

a) Define the conditional probability p(A | B) in terms of the joint probability p(A, B). You

may want to use a diagram/sketch.

[3 Marks]

b) State the law of total probability, that is, express p(A) using a set of events

B_1, B_2, ..., B_N and the corresponding conditional (or joint) probabilities. What are the

conditions that need to hold?

[4 Marks]

c) Some emails received by users are spam containing viruses. You are building a

system to detect such illicit virus emails. You start by using the feature of whether or

not an email contains an executable attachment, as this is an important datum indicating

whether the email in fact contains a virus. Data analysis suggests that 95% of virus

emails contain executable attachments, 90% of legitimate emails do not contain

executable attachments, and 2% of emails overall are viruses.

i. If your classifier scans an email and finds an executable attachment, what is the

probability that the email in fact contains a virus?

ii. Comment on this value that you calculate. How does it compare with a

decision based only on the frequency of the emails that contain viruses?

iii. What is the probability that your classifier makes an error?

[12 marks]
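
The arithmetic behind part (c) can be sketched in a few lines of Python. This is only an illustration using the three figures quoted in the question; the variable names are made up here, and the decision rule used for the error rate (flag an email as a virus exactly when it has an executable attachment) is an assumption, since the question does not fix the rule.

```python
# Figures quoted in part (c); all variable names are illustrative.
p_virus = 0.02              # prior: 2% of emails are viruses
p_exe_given_virus = 0.95    # 95% of virus emails carry an executable
p_noexe_given_legit = 0.90  # 90% of legitimate emails carry none

p_legit = 1.0 - p_virus
p_exe_given_legit = 1.0 - p_noexe_given_legit

# Law of total probability: p(exe) summed over the two email classes.
p_exe = p_exe_given_virus * p_virus + p_exe_given_legit * p_legit

# Bayes' rule: probability of a virus given an executable attachment.
p_virus_given_exe = p_exe_given_virus * p_virus / p_exe

# ASSUMED rule: predict "virus" iff an executable attachment is present.
# Errors are legitimate emails with an executable plus viruses without one.
p_error_flag_rule = (p_exe_given_legit * p_legit
                     + (1.0 - p_exe_given_virus) * p_virus)

# Baseline for comparison: always predicting "legitimate" is wrong
# exactly when the email is a virus.
p_error_always_legit = p_virus

print(f"p(exe)           = {p_exe:.4f}")
print(f"p(virus | exe)   = {p_virus_given_exe:.4f}")
print(f"error, flag rule = {p_error_flag_rule:.4f}")
print(f"error, baseline  = {p_error_always_legit:.4f}")
```

Comparing the last two numbers gives one way to approach part ii: with a rare class, a rule that looks sensible can still make more errors than the prior-only baseline.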

d) Explain the difference between Maximum Likelihood (ML) and Maximum a Posteriori

(MAP) methods of learning parameters from data X.

[6 marks]

[Q1 total 25 marks]

Question 2

a) Compare and contrast the goals in Linear Regression and Logistic Regression.

[4 marks]

b) The form of a linear regression model is y = w^T x. Assuming the mean squared error

cost function, derive gradient descent updates for the weights w.

[9 marks]
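
As a rough illustration of where the derivation in part (b) ends up, the sketch below implements batch gradient descent in plain Python. It assumes the cost J(w) = (1/(2m)) * sum_i (w^T x_i - y_i)^2, whose gradient is (1/m) * sum_i (w^T x_i - y_i) x_i; the factor 1/2, the learning rate, the step count and the toy data are all choices made here, not taken from the question.

```python
def predict(w, x):
    """Linear model y = w^T x for one input vector x."""
    return sum(wj * xj for wj, xj in zip(w, x))


def gradient_step(w, X, y, lr):
    """One batch update w <- w - lr * (1/m) * sum_i (w^T x_i - y_i) x_i."""
    m = len(X)
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        err = predict(w, xi) - yi
        for j, xij in enumerate(xi):
            grad[j] += err * xij / m
    return [wj - lr * gj for wj, gj in zip(w, grad)]


def fit(X, y, lr=0.1, steps=2000):
    """Run a fixed number of gradient steps starting from w = 0."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        w = gradient_step(w, X, y, lr)
    return w


# Toy data generated from y = 1 + 2*x; the constant feature 1.0 acts as a bias.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
w = fit(X, y)
print(w)
```

On this noise-free toy set the weights should approach the generating values (bias 1, slope 2); a much larger learning rate would make the same loop diverge, which connects to pitfall (iii) in part (d).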

c) What limitation of networks without hidden layers was overcome by

Multilayer Networks? Is it essential that the activation function is non-linear?

[6 marks]

d) Practical pitfalls with training neural networks include: (i) getting stuck in local

optima, (ii) underfitting or overfitting, (iii) bad learning rate. Explain what each of

these means.

[6 marks]

[Q2 total 25 marks]

Question 3

(a) Describe the difference between supervised and unsupervised learning. Give an

example of a real world problem that requires a supervised learning algorithm and an

example of a real world problem that can be solved with an unsupervised learning

algorithm. In both cases define the inputs and the outputs.

[8 marks]

(b) Describe in detail the steps of the K-means algorithm. Make sure that you define the

input to the algorithm, the output, and the dimensionality of all the variables that you

use.

[8 marks]
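
The two alternating steps asked for in part (b) can be sketched as follows. This is a minimal plain-Python version under stated assumptions: squared Euclidean distance, initial centroids drawn from the data unless given explicitly, and an empty cluster simply keeping its previous centroid. The function and parameter names are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two D-dimensional points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, init_centroids=None, seed=0, max_iter=100):
    """Input: N points of dimension D and the number of clusters K.

    Output: K centroids (each of dimension D) and N labels in {0, ..., K-1}.
    """
    rng = random.Random(seed)
    if init_centroids:
        centroids = list(init_centroids)
    else:
        centroids = rng.sample(points, k)  # assumed initialisation scheme
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return centroids, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, 2,
                           init_centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids, labels)
```

Each step can only lower (or leave unchanged) the within-cluster sum of squared distances, which is why the loop stops, but the result depends on the initial centroids.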

(c) Explain what coordinate descent (or coordinate optimisation) is. Using a sketch, show that this

general optimisation method is guaranteed to converge.

[4 marks]

(d) The K-Means algorithm converges to a local minimum. Describe a practical method to

deal with this problem. Can this method be used to determine the optimal value of K?

[5 marks]

[Q3 total 25 Marks]

Question 4

(a) With the help of a diagram explain the main principles of the first-order Markov Model.

In your answer explain any notation that you use. Explain what is meant by the term

“first-order”.

[6 marks]

(b) What are the differences between a Markov Model and a hidden Markov model

(HMM)? What are the advantages of HMMs in comparison to Markov Models? Give an

example of an application (a toy example will suffice) where an HMM can be used but

a Markov Model cannot. In your answer, define the states s_i, the symbols v_k, and the matrices A = [a_ij] and B = [b_jk].

[6 marks]

(c) ... and emission probabilities,

[6 marks]

(d) What is the evaluation problem? Using the results of (c) present a naïve algorithm that

solves the evaluation problem. What is the computational complexity of that algorithm?

Can this algorithm be used in practice?

[7 marks]
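
One possible reading of the naive algorithm in part (d) is brute-force enumeration: sum the joint probability of the observations and every possible hidden state sequence. The sketch below assumes an initial state distribution `pi` (not named in the question), a transition matrix `A` with `A[i][j]` = a_ij, an emission matrix `B` with `B[j][k]` = b_jk, and a non-empty observation sequence given as symbol indices; the toy numbers are invented.

```python
from itertools import product


def naive_evaluation(pi, A, B, observations):
    """Probability of the observation sequence, summed over all state paths.

    Enumerates N^T paths for N states and T observations, with O(T) work
    per path, so the cost grows as O(T * N^T).
    """
    n_states = len(pi)
    total = 0.0
    for path in product(range(n_states), repeat=len(observations)):
        # Start in path[0] and emit the first symbol.
        p = pi[path[0]] * B[path[0]][observations[0]]
        # Then alternate one transition and one emission per time step.
        for t in range(1, len(observations)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][observations[t]]
        total += p
    return total


# Invented two-state, two-symbol toy model.
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.5, 0.5],
     [0.1, 0.9]]

likelihood = naive_evaluation(pi, A, B, [0, 1])
print(likelihood)
```

The exponential number of paths is what makes this approach impractical for long sequences, which is the point the last part of the question is probing.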

[Q4 total 25 Marks]