闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Artiﬁcial Intelligence (First Year)

Main Summer Examinations 2019

Question 1 Search and Optimisation

Assume that you are developing an algorithm to ﬁnd the lowest cost path to move an army from a starting to a goal position in a strategy game. The ﬁeld is organised as a grid, where the positions whose coordinates (x , y) are (3,2) and (3,3) have earth terrain; (1,1), (2,1), (3,1) and (2,2) have sand terrain; and (1,2), (1,3) and (2,3) have water terrain:

Each move can take the army one position up, down, left or right on the grid, so long as this does not move the army outside the grid. Independent of the type of terrain of the current position of the army, the cost of moving to an earth, sand and water position is 1, 3 and 10, respectively. For example, the cost of moving left from (3,2) is 3.

(a) Your army is in position (3,1) and you wish to reach the goal position (1,3). Assume

that you have decided to use a breadth-ﬁrst search algorithm to solve this problem, respecting the following rules:

. A state in the state space graph is identiﬁed by the (x , y) coordinates of the

current position of the army, meaning that there are 9 possible states.

. Do not place children in the frontier if their corresponding state is already in

the frontier or list of visited nodes.

. Stop when you place in the frontier a node which contains the goal state.

. When deciding which node to visit, if there is a draw, choose to visit the node

with the smallest x-coordinate ﬁrst. If this still results in a draw, choose to visit the node with the smallest y-coordinate ﬁrst. For example, if there is a draw between nodes whose states are (3,1) and (2,2), visit (2,2) ﬁrst. If there is a draw between (3,1) and (3,2), visit (3,1) ﬁrst.

Write down the following information:

. Search tree produced by breadth-ﬁrst search. Indicate which nodes are visited

nodes and which nodes are in the frontier when the algorithm terminates.

. Sequence of nodes visited by breadth-ﬁrst search. Note: you can identify a

node through its state.

. Sequence of states that compose the path retrieved as a solution by breadth-

ﬁrst search.

[8 marks]

(b) Is breadth-ﬁrst search a good algorithm for this problem? Justify your answer.

[6 marks]

(c) Consider that you wish to use Hill Climbing to solve this problem. For that, you need to provide a problem formulation.

. Specify the design variable of your problem formulation.

. Explain your design variable by giving two examples of candidate solutions. If

infeasible solutions are possible, give one example of feasible, and one example of infeasible solution, explaining the reason for them to be feasible / infeasible. If no infeasible solutions exist given your problem formulation, give two examples of feasible solutions, and explain why no infeasible solutions exist.

[6 marks]

Question 2 k-Nearest Neighbours and Na¨ıve Bayes

(a) Consider the problem of predicting the best hashtag to be associated to a tweet. To

solve this problem, you have access to incoming tweets that can be used as training examples. Each hashtagged tweet is a training example. Every second, on average, around 6,000 tweets are tweeted on Twitter, and most of them use hashtags. Each tweet is described by a set of input attributes and one output attribute. There is one categorical input attribute for each possible word that can appear in a tweet. Each input attribute can assume values true or false, representing whether or not the corresponding word appears in the tweet. The output attribute is a categorical value corresponding to the ﬁrst hashtag that appears in the tweet. Other hashtags are ignored.

. List one advantage and one disadvantage of using k-Nearest Neighbours for

this speciﬁc problem and explain your answer.

. List one advantage and one disadvantage of using Na¨ıve Bayes for this speciﬁc

problem and explain your answer.

[6 marks]

(b) Assume a much reduced ﬁctitious version of this problem where you have received

only six training examples so far and you have only two input attributes. The fre- quency tables created for Na¨ıve Bayes without Laplace Smoothing are shown below, where x1 and x2 are the input attributes and y is the output attribute:

Frequency table for x1	y=birthday	y=contest	y=health	total
x1 = true	2	2	0	4
x1 = false	0	0	2	2
total	2	2	2	6

Frequency table for x2	y=birthday	y=contest	y=health	total
x2 = true	2	0	1	3
x2 = false	0	2	1	3
total	2	2	2	6

Recall that Na¨ıve Bayes uses the following equation to make predictions:

P(c>f1 , _ _ _ , fn ) = αP(c) P(fi >c)

i=1

where

. α = 1/β

. β = c│Y ╱ P(c) P(fi >c)、

. c is an output value;

. Y is the set of possible output values;

. fi is the value of input attribute i; and

. n is the number of input attributes.

Show the step-by-step calculations of the probabilities of a new tweet with input attributes (x1 = true, x2 = true) having birthday, contest and health as best hashtags, respectively, using Laplace Smoothing. What hashtag would Na¨ıve Bayes predict and why?

[8 marks]

(c) Consider now that you wish to predict more than one possible hashtags that could be associated with a given tweet. This means that you can have more than one output value associated to each tweet. Discuss how you would use Na¨ıve Bayes to predict more than one possible hashtag. Your discussion should include the following items:

. How would you create frequency tables?

. How would you use the Na¨ıve Bayes equation to give predictions?

[6 marks]

Question 3 Linear Regression and Logistic Regression

(a) The cost of a hypothesis function parametrised by z is given by the following equa-

tion:

z2 _ 12z + 2

What is the value of the parameter z at which the cost is minimum?

[4 marks]

(b) The cost function for Logistic Regression is given by the following equation:

Cost(w) = _ ┌ i 1 y(i)l og(hw (x(i))) + (1 _ y(i))l og(1 _ hw (x(i)))┐

where w represents the weights of the hypothesis function h, and y(i) and x(i) are the input and output values of a given example.

Derive this expression from the version which shows the cost for each case y = 0 and y = 1 separately. Additionally, detail why this is a reasonable cost function. You might ﬁnd it easier to use the separated version to show this by analysing the values of _l og(x) and _l og(1 _ x).

[8 marks]

(c) The Hypothesis function for Univariate Linear Regression is y = w0 + w1x . The cost function associated with this hypothesis function h, parametrised by some w0 and w1 , is i (y(i) _ hw (x(i)))2 , where y(i) and x(i) represent the output and input values of the ith training example.

(i) Write out and provide an explanation for the general form of the hypothesis function and cost function of Linear Regression in two variables.

(ii) Similarly, write out and provide an explanation for the general form of the

hypothesis function for Univariate Non-linear Regression. Assume that the non-linear hypothesis function is a quadratic.

[8 marks]

Question 4 Neural Networks

(a) Describe the meaning of the terms “overﬁtting” and “underﬁtting” in the context

of neural networks and additionally detail the impact of increasing the number of lay- ers, increasing the regularization and increasing the dropout on these phenomenon.

[4 marks]

(b) Consider the graphical illustrations of two datasets below. You are required to train a

neural network to diﬀerentiate the two diﬀerent classes. For each dataset, describe the architecture of and provide an illustration of the neural network you will use along with how you will go about training and testing your neural network.

[8 marks]

(c) Consider the following Neural Networks each of which perform the associated logical operation shown in the Truth Table.

x1 OR x2

not(x1 ) AND x2

x1XORx2

The activation function of the above Neural Network is the sigmoid function given by the equation . You may assume that:

sigmoi d (x) < 0; ←-一 x · _4

sigmoi d (x) < 1; ←-一 x ● +4

Create a neural network to perform the non-linear separable operation XOR whose Truth Table is given above. Draw a table to illustrate the input and output for each of your nodes that clearly shows why your networks performs this operation.

[8 marks]