
7CCSMML1

Machine Learning

Summer 2018

1. a. Emperor Tangerine has asked you to create a classifier which can distinguish “fake news” from “the truth”. Testing your classifier on a corpus provided by the Emperor’s communications director, you find that it gives the following results:

Example   Classifier output   Correct class
E01       fake news           fake news
E02       fake news           the truth
E03       the truth           fake news
E04       fake news           the truth
E05       fake news           the truth
E06       fake news           fake news
E07       fake news           fake news
E08       the truth           fake news
E09       the truth           the truth
E10       the truth           the truth

Calculate the precision and the recall of your classifier.

[6 marks]

b. What is the advantage of using the F1 metric over using precision and recall? Which would you choose to use here, and why?

[4 marks]

2. A single-layer neural network can be used to learn a linear decision boundary. Network weights can be updated by an iterative gradient descent optimization procedure.

a. Give the perceptron training rule (also known as error correction method) for a single-layer neural network. Your answer should include a brief explanation of each variable in the rule.

[2 marks]

b. An alternative to the perceptron training rule is the delta learning rule. Give this rule for a single-layer neural network, and explain the benefits of this rule over the perceptron training rule.

[3 marks]

c. In a multi-layered neural network, the backpropagation method can be used to update network weights proportionally to the error in a unit’s output.

Give the error functions for both a network output unit and a hidden unit, when using a sigmoid threshold function. Briefly explain how the error in a hidden unit’s output is computed.

[5 marks]

3. a. Explain what function approximation is in the context of reinforcement learning. Your answer should include an example of function approximation.

[6 marks]

b. Why is function approximation used in reinforcement learning?

[4 marks]

4. Scientists have developed a blood test to identify a specific type of cancerous tumour. The test has two possible outcomes: positive and negative. This particular tumour is found in only 4% of the population. The test returns a correct positive result in 80% of the cases in which the tumour is actually present and a correct negative result in 90% of the cases in which the tumour is absent. In all other cases, the test returns the opposite result, e.g., P(Positive | Not tumour) is 10%.

P(Tumour) = 0.04
P(Not tumour) = 0.96

P(Positive | Tumour) = 0.80
P(Negative | Tumour) = 0.20

P(Positive | Not tumour) = 0.10
P(Negative | Not tumour) = 0.90
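As a sanity check before attempting the parts below, the quantities given in the question can be written down as a small table and tested for consistency. This is only a sketch: the dictionary names and the helper `is_distribution` are hypothetical, and the code deliberately stops short of computing any posterior.

```python
import math

# Values copied from the question; the variable names are illustrative only.
prior = {"tumour": 0.04, "not_tumour": 0.96}
likelihood = {
    "tumour": {"positive": 0.80, "negative": 0.20},
    "not_tumour": {"positive": 0.10, "negative": 0.90},
}

def is_distribution(probs, tol=1e-9):
    """Return True if the values lie in [0, 1] and sum to 1 (within tol)."""
    probs = list(probs)
    in_range = all(0.0 <= p <= 1.0 for p in probs)
    return in_range and math.isclose(sum(probs), 1.0, abs_tol=tol)

# The prior and each conditional distribution over test outcomes must sum to 1.
assert is_distribution(prior.values())
for state, outcomes in likelihood.items():
    assert is_distribution(outcomes.values()), state
```

Each row of `likelihood` is a distribution over test outcomes for one fixed true state, which is why the rows, and not the columns, are the ones that sum to 1.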

a. A blood sample from a new patient is tested and the results are positive. Find the maximum a posteriori (MAP) hypothesis and state whether or not the patient should be diagnosed with cancer. (Tip: converting decimal values to fractions will make the math operations easier to perform, e.g., 4% = 4/100.)

[5 marks]

b. Find the posterior probabilities: P(Tumour | Positive) and P(Not tumour | Positive)

[5 marks]

5. a. You are implementing a 3-nearest neighbour classifier that will work with the following data:

Instance   Features         Class
           x1   x2   x3
X2         2    2    1      C1
X3         1    2    2      C1
X4         2    5    3      C2
X5         1    3    2      C1
X6         3    4    1      C2

The distance metric that you are using with this classifier is the Manhattan distance. For two examples, i with features (x1(i), x2(i), x3(i)) and j with features (x1(j), x2(j), x3(j)), the Manhattan distance between them is:

distance = |x1(i) - x1(j)| + |x2(i) - x2(j)| + |x3(i) - x3(j)|
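The Manhattan distance defined above can be sketched as a short Python function. This is an illustrative sketch only: the function name `manhattan_distance` is an assumption, and the points in the usage line are made up and are not taken from the data table.

```python
def manhattan_distance(a, b):
    """Sum of absolute per-feature differences between two feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(u - v) for u, v in zip(a, b))

# Two made-up points: |0 - 1| + |0 - 2| + |0 - 3| = 6
print(manhattan_distance((0, 0, 0), (1, 2, 3)))
```

Because the metric only sums absolute differences, it is symmetric in its two arguments and is zero exactly when the two feature vectors are identical.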

Which class would your classifier allocate to an instance with: x1 = 3, x2 = 1, and x3 = 3?

For full marks you should explain why you get this answer.

[6 marks]

b. On testing, you begin to suspect that your 3-nearest neighbour classifier is overfitting. How would you modify your classifier?

[4 marks]

6. Support vector machines are a commonly used classification method.

a. Give the primal optimization problem for a support vector machine classifier, and give a brief explanation of the intuition behind it.

[4 marks]

b. Explain what a kernel function is in the context of learning non-linear decision boundaries, and explain how it can be used in combination with support vectors to make efficient classifiers.

[6 marks]

7. You are given the following training data:

Instance   x   y
E1         2   1.5
E2         1   1.5
E3         3   2

and asked to build a regression model h_w(x) that can be used to predict the value of y which corresponds to any given value of x.

a. Write down the equation for a suitable regression model, explaining why this model is suitable.

[4 marks]

b. Write down the rules for updating the model in your answer to part (a) using stochastic gradient descent. Demonstrate the use of these rules by using stochastic gradient descent in conjunction with the training data given above.

The learning rate should be set to 0.1, and any weights should be initialised to 0.

[6 marks]

8. Confidence-Based Autonomy is a form of Learning from Demonstration.

a. Explain how a learner acquires a policy using Confidence-Based Autonomy.

[6 marks]

b. What is “confidence” in the context of Confidence-Based Autonomy, and how is it used?

[4 marks]

9. a. Write down the update rule for Q-learning and explain each element of the rule.

[6 marks]

b. A key issue in reinforcement learning is the balancing of exploration and exploitation. Explain how this may be achieved in Q-learning.

[4 marks]

10. The table below shows the classification of ten rock samples. The rocks were tested for hardness, lustre and crystal structure. Of the ten samples, only four were found to be the rare earth metal terbium.

Sample   Hardness   Lustre   Crystal Structure   Terbium
1        Hard       Dull     Flat sides          Yes
2        Hard       Dull     Straight edges      Yes
3        Soft       Shiny    Straight edges      Yes
4        Hard       Shiny    Flat sides          Yes
5        Soft       Shiny    Flat sides          No
6        Hard       Dull     Straight edges      No
7        Soft       Dull     Straight edges      No
8        Soft       Shiny    Flat sides          No
9        Soft       Dull     Straight edges      No
10       Hard       Shiny    Straight edges      No

a. Show the Bayesian network that a naive Bayesian learner will learn from the data given above. Use the following abbreviations: H-Hardness, L-Lustre, C-Crystal Structure and T-Terbium.

[4 marks]

b. Assuming the attributes are independent of each other, find the probability that a new sample is Terbium given that it is soft, shiny and has straight edges.

[6 marks]