Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

DTS304TC Machine Learning

Coursework  Assessment Task 1

Submission deadline: 23:59, March 31st, 2023

Percentage in final mark: 50%

Learning outcomes assessed: A, B

Individual/Group: Individual

Length: The assessment has a total of 8 questions which gives 100 marks. The submitted file must be in pdf format.

Late policy: 5% of the total marks available for the assessment shall be deducted from the assessment mark for each working day after the submission date, up to a maximum of five  working days

Risks:

•  Please read the coursework instructions and requirements carefully. Not following these instructions and requirements may result in loss of marks.

•  The formal procedure for submitting coursework at XJTLU is strictly followed. Submission    link on Learning Mall will be provided in due course. The submission timestamp on Learning Mall will be used to check late submission.

__________________________________________________________________

Question 1- Agents (10 marks):

Consider the task of assembling a bookcase from its components.

(a) Use PEAS to specify the parameters for the task environment.

(i) Performance measure:

(ii) Environment:

(iii) Actuators:

(iv) Sensors: (4 marks)

(b) Specify three properties of the environment. Explain these properties in the context

of the task (no marks will be given for an absent or incorrect explanation). (3 marks)

(c) Determine which of the following agent types would you employ: Simple reflex, Model    based, Goal based or Utility based. Explain why in the context of the task, and list your assumptions (no marks will be given for an absent or incorrect                 explanation)(3 marks)


Question 2- Backtracking (10 marks):

Consider the missionaries and cannibals problem:

Three missionaries and three cannibals come to a river. There is a boat on their side of the river that can be used either by one or two persons. How should they use this boat to cross the river in such a way that cannibals never outnumber missionaries on either side of the river?

The diagram below shows a sequence of partial solutions produced by the Backtrack algorithm, where each state marked with X is a state where the algorithm has backtracked. In these solutions, cannibals are labeled cann”, missionaries are labeled miss”, the banks are labeled L (left) and R (right), and the position of the boat is indicated with the label BOAT. The objective is to reach side R.

Label each state marked with X with the reason for backtracking, using one or more of the following  labels:  DEADEND,  BOUND  REACHED,  PREVIOUS  STATE,  NO  MORE  APPLICABLE ACTIONS.


Question 3- DFS/BFS (10 marks):

Use the following (fully expanded) search tree to indicate the order in which nodes are          expanded for different types of search. Assume that A is the start node and G (double boxed) the only goal node. How many nodes are there in the resultant search tree?

(a) Depth-first search

List the nodes according to their order of expansion.

List the nodes in the final search tree (without the nodes deleted by the algorithm). (3 marks)

(b) Best-first greedy search

List the nodes according to their order of expansion. For each expansion, list OPEM (with the nodes in the correct order) and CLOSED.

List the nodes in the final search tree. (7 marks)

Question 4- Algorithm A/A* (15 marks):

The following (fully expanded) search tree indicates the cost of reaching each node  (g). the     estimated cost of reaching a goal node from each node  (h), and their sum (f). For example, the cost of reaching node C is 13, and the estimated cost of reaching the goal node from node C is 17. A is the start node, and nodes with 0 cost estimates  (h)are goal nodes.

(a) Draw the search tree that would be generated by Algorithm A, using roman numerals to indicate the order in which nodes are expanded. Upon completion, mark the goal node. After each expansion, provide the following information in the table below:

•    the node that was expanded, and

•    the nodes in the horizon (OPEN) in order of merit, i.e., in the order in which they will

be expanded, together with their f value. If several nodes have the same merit, order them from left to right. (13 marks)

 

(b) Did algorithm A find the optimal answer, and is the h function admissible? Why or why not? Illustrate your answer with an example from the provided search tree    (no marks will be given for an absent or incorrect explanation). (2 marks)

Question 5- Simulated Annealing/Genetics Algorithms (10 marks):

(a) In Simulated Annealing, if T2 > T1, is the probability of adopting a new worse state higher in T2 or in T1? Why? (no marks will be given for absent or incorrect explanations(4 marks)

(b) Does the Simulated Annealing algorithm given in class always terminate? Why or why not? (no marks will be given for absent or incorrect explanations(3 marks)

(c) A genetic algorithm is used to evolve a binary string of length n to one where the sum (from left to right) of the last four genes is equal to 1. The initial population is a randomly generated  set of binary strings of length n, such as those shown here:

00110001

01011101

11101111

Give a suitable fitness function for this problem. (3 marks)


Question 6- MinMax (10 marks):

Consider a hypothetical two-person fully-observable game with branching factor 2. It is MAX’s turn to play, and he is able to evaluate his position to depth 4.

The following is a list of values for the right-hand side of the game tree, should they ever need to be evaluated. The left-hand side (comprising 8 nodes) has already been evaluated and the α value of the root MAX node is 5.

(a)    In the partial game tree below, MAX is presented by a square and MIN by a circle.

Complete the game tree by conducting an α and β as appropriate), the updates performed on the backup values, and the α and β cut-offs you have performed. (8 marks)

(b)    What is the best move for MAX and what is its backed up value? (2 marks) 

Question 7: Probability & Bayesian Networks (18 marks):

1)       Creutzfeldt-Jakob is a rare disease: about 1 in 1,000 people get it. Doctors have found that 90% of the people with Creutzfeldt-Jakob disease are hamburger eaters, while 1% of people who don't have Creutzfeldt-Jakob disease eat hamburgers. These are fictitious figures.

Let CJ represent the variable corresponding to having Creutzfeldt-Jakob disease. Let HE represent the variable corresponding to being a hamburger eater.

(a)      Fill in the following probabilities:

    Pr(+CJ)

    Pr(+HE|+ CJ)

     Pr(-HE |-CJ) (6 marks)


(b)       What is the probability that a hamburger eater will have Creutzfeldt-Jakob disease?

Show the formula before you plug-in numbers. If you don't have a calculator, you can still estimate the answer. (4 marks)

(c)       Should you stop eating hamburgers? Why or why not? (no marks will be given for an absent or incorrect explanation). (2 marks)

2)        Consider a probabilistic model (Bayesian Network) for determining a patients risk of

heart disease. The model consists of the following 5 random variables:

H: Heart Disease = {present, absent}

B: Blood Pressure = {high, low}

E: Exercise = {frequent, infrequent}

G: Gender = {male, female}

S: Smoking = {smoker, non-smoker}

The model admits the following factorisation of the joint distribution for the 5 variables: P(H, B,

E, G, S) = P(H|B, E, G, S)P(B|E, G)P(E)P(G)P(S)

a)        If we don’t know anything about the patient, is Smoking (S) independent of Exercise (E) and why? (1.5 marks)

b)        Suppose you are told that the patient has high Blood Pressure (B). Is Smoking (S) independent of Exercise (E)? (1.5 marks)

c)        And if you are told that the patient has Heart Disease (H) are S and E independent? (1.5 marks)

d)       You know from your medical training that frequent exercise lowers the risk of heart  disease while smoking increases it (irrespective of gender). If you are told that a       particular patient with the disease was NOT a smoker, what can you infer about their level of exercise? (1.5 marks)

Question 8: Decision Tree (17 marks):

You regularly take the Suzhou to Taicang shuttle bus. You want to build a Decision Tree to predict whether you will arrive on time, and you have gathered the following set of six training instances:

Weather

Time

Arrival

sunny

morning

OnTime

rainy

afternoon

OnTime

cloudy

morning

Late

cloudy

afternoon

OnTime

rainy

morning

Late

sunny

afternoon

OnTime

The formula for Entropy is:

In your answers, you should use the following abbreviations:

Weather

Time

Arrival

sunny s

morning m

OnTime OT

Rainy r

afternoon a

Late L

Cloudy c

 

 

a.      What is the entropy of Arrival? Spell out the formula for this calculation before you plug-in numbers. Use the given variable names and values to indicate which parameter you are calculating. (3 marks)

b.       Calculate the Information Gain of splitting on the attribute Time at the root of the Decision Tree. Spell out the formula for this calculation before you plug-in numbers. Use  the  given  variable  names  and  values  to  indicate  which  parameter  you  are calculating. (5 marks)

c.       Given that the information gain of splitting on Weather for these training instances is 0.251, then based on the information gain of Time that you calculated in item (b), which attribute should be selected first to expand the Decision Tree? Why? (Answers with no explanations or incorrect explanations will receive no marks.) (3 marks)

d.       Draw a Decision Stump that splits only on Time, indicating the number of members of each class in each leaf. (3 marks)

e.      What is the accuracy of this Decision Stump, assuming that the majority class is used to make a prediction? (3 marks)