SPCE0038: Machine Learning with Big-Data

Alternative Assessment 2020

Question 1

(a) Draw a diagram of the basic logistic unit that is used as the core building block of artificial neural networks. For simplicity you can ignore the inclusion of a bias term. Describe the components of your diagram in words. [3 marks]

(b) Specify the equations that define the output $a$ of the logistic unit given the inputs $x_j$ and weights $\theta_j$. Again, you may ignore a bias term for simplicity. [4 marks]
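For illustration, a minimal NumPy sketch of a single logistic unit, assuming the usual sigmoid activation, so that $a = \sigma\left(\sum_j \theta_j x_j\right)$ with $\sigma(z) = 1/(1 + e^{-z})$. The function names and example values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) activation: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(x, theta):
    """Output a = sigmoid(sum_j theta_j * x_j); no bias term."""
    z = np.dot(theta, x)  # weighted sum of the inputs
    return sigmoid(z)

x = np.array([1.0, 2.0, -1.0])        # inputs x_j
theta = np.array([0.5, -0.25, 1.0])   # weights theta_j
a = logistic_unit(x, theta)           # weighted sum is -1.0 here
```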

(c) Using your logistic unit as a base building block, draw a diagram of a fully connected, feed-forward artificial neural network with three layers (one input, one hidden and one output layer), three input units, three hidden units, and one output node. Again, you may ignore a bias term for simplicity. [4 marks]

(d) Specify the equations defining the full artificial neural network of part (c), extending the equations you gave for a single logistic unit in part (b). Again, you may ignore a bias term for simplicity. [6 marks]
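A sketch of the forward pass for the 3-3-1 network of part (c), assuming sigmoid activations in every layer and writing the weights of each layer as a matrix, so that $a^{(2)} = \sigma(\Theta^{(1)} x)$ and $a^{(3)} = \sigma(\Theta^{(2)} a^{(2)})$. The matrix names Theta1 and Theta2 and the random weights are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, Theta2):
    """Forward pass of a 3-3-1 fully connected network, no bias terms.

    Theta1 has shape (3, 3): row j holds the weights into hidden unit j.
    Theta2 has shape (1, 3): weights from the hidden units to the output.
    """
    a1 = x                        # input layer activations
    a2 = sigmoid(Theta1 @ a1)     # hidden layer: a2_j = sigmoid(sum_k Theta1[j, k] * a1_k)
    a3 = sigmoid(Theta2 @ a2)     # output layer (single node)
    return a2, a3

rng = np.random.default_rng(0)
x = np.array([0.2, -0.4, 1.0])
Theta1 = rng.normal(size=(3, 3))
Theta2 = rng.normal(size=(1, 3))
a2, a3 = forward(x, Theta1, Theta2)
```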

(e) What cost functions are typically used to train neural networks for regression and classification problems? Specify the corresponding cost function equations for targets $y_j^{(i)}$ and predictions $\hat{p}_j^{(i)}$, where $i$ denotes the training instance and $j$ the output node. [6 marks]
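For orientation, a sketch of the two most common choices, assuming the mean squared error for regression and the cross entropy (with one-hot targets) for classification. Normalisation conventions vary between texts (for example $1/m$ versus $1/(2m)$), so treat the constants here as one choice among several.

```python
import numpy as np

def mse_cost(y, p):
    """Mean squared error: summed over output nodes j, averaged over instances i.
    (Some texts use 1/(2m) instead of 1/m; the minimiser is the same.)"""
    m = y.shape[0]
    return np.sum((p - y) ** 2) / m

def cross_entropy_cost(y, p, eps=1e-12):
    """Cross entropy -(1/m) sum_i sum_j y_j^(i) log p_j^(i) for one-hot targets y."""
    m = y.shape[0]
    return -np.sum(y * np.log(np.clip(p, eps, 1.0))) / m

# Regression: m = 2 instances, one output node.
y_reg = np.array([[1.0], [2.0]])
p_reg = np.array([[1.5], [2.0]])

# Classification: m = 2 instances, three output nodes (class probabilities).
y_cls = np.array([[1, 0, 0], [0, 1, 0]])
p_cls = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
```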

(f) Explain what is meant by a network being deep. [1 mark]

(g) Why do deep networks provide a powerful representation framework? Include a discussion of the universal approximation theorem. [6 marks]

Question 2

Gradient descent algorithms take a step of size proportional to $\eta$ in the direction of the negative gradient, where the update of parameter $\theta$ is given by a form similar to

$$\theta \leftarrow \theta - \eta \nabla_\theta C(\theta),$$

where $C$ denotes the cost function and $\nabla_\theta C$ the gradient of the cost function with respect to $\theta$. The variable $\eta$ is often called the learning rate. Gradient-descent-based algorithms are often used to train deep learning models.
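A minimal sketch of the update rule applied repeatedly, assuming a simple illustrative cost $C(\theta) = \lVert \theta - t \rVert^2$ with known gradient $2(\theta - t)$; the target t, learning rate and step count are arbitrary choices.

```python
import numpy as np

def gradient_descent(grad, theta0, eta=0.1, n_steps=100):
    """Repeatedly apply theta <- theta - eta * grad(theta)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_steps):
        theta = theta - eta * grad(theta)
    return theta

# Illustrative cost C(theta) = ||theta - target||^2 with gradient 2 (theta - target).
target = np.array([3.0, -1.0])
grad_C = lambda theta: 2.0 * (theta - target)
theta_star = gradient_descent(grad_C, theta0=[0.0, 0.0])
```

Here each step shrinks the distance to the minimum by a factor $1 - 2\eta = 0.8$; a learning rate above 1 would make that factor exceed 1 in magnitude and the iteration would diverge.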

(a) Briefly describe batch gradient descent and stochastic gradient descent at a conceptual level. [4 marks]
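To contrast the two at a conceptual level, here is a hedged NumPy sketch on a synthetic, noiseless linear regression problem (an assumption chosen so both methods should recover the true parameters). Batch gradient descent uses the MSE gradient over all $m$ instances at every step; stochastic gradient descent uses the gradient of one randomly chosen instance per step.

```python
import numpy as np

rng = np.random.default_rng(42)
m = 200
X = np.c_[np.ones(m), rng.uniform(-1, 1, size=m)]   # bias column + one feature
theta_true = np.array([4.0, 3.0])
y = X @ theta_true                                  # noiseless targets (illustrative)

def batch_gd(X, y, eta=0.1, n_epochs=1000):
    """Each step uses the MSE gradient over ALL m instances."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        grad = 2.0 / m * X.T @ (X @ theta - y)
        theta -= eta * grad
    return theta

def sgd(X, y, eta=0.05, n_epochs=50, seed=0):
    """Each step uses the gradient of a SINGLE randomly chosen instance."""
    rng = np.random.default_rng(seed)
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for _ in range(m):
            i = rng.integers(m)
            xi, yi = X[i], y[i]
            grad = 2.0 * xi * (xi @ theta - yi)
            theta -= eta * grad
    return theta

theta_batch = batch_gd(X, y)
theta_sgd = sgd(X, y)
```

Each SGD step is much cheaper but noisier; on noisy data a decaying learning rate is usually needed for SGD to settle, which this noiseless example sidesteps.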

(b) Although stochastic gradient descent is often very effective, why are alternative optimisation algorithms typically considered for training? [2 marks]

(d) Describe the Nesterov variant of the momentum algorithm, including the update equations. [3 marks]
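One common formulation of Nesterov momentum evaluates the gradient at the look-ahead point $\theta + \beta m$ rather than at $\theta$: $m \leftarrow \beta m - \eta \nabla_\theta C(\theta + \beta m)$, then $\theta \leftarrow \theta + m$. A minimal sketch, assuming an illustrative ill-conditioned quadratic cost $C(\theta) = \tfrac{1}{2}\theta^T A \theta$ and comparing with plain gradient descent at the same $\eta$:

```python
import numpy as np

def nesterov(grad, theta0, eta=0.01, beta=0.9, n_steps=500):
    """Nesterov accelerated gradient (momentum form):
        m     <- beta * m - eta * grad(theta + beta * m)   # gradient at look-ahead point
        theta <- theta + m
    """
    theta = np.asarray(theta0, dtype=float)
    m = np.zeros_like(theta)
    for _ in range(n_steps):
        m = beta * m - eta * grad(theta + beta * m)
        theta = theta + m
    return theta

# Illustrative ill-conditioned quadratic C(theta) = 0.5 * theta^T A theta, minimum at 0.
A = np.diag([1.0, 50.0])
grad_C = lambda theta: A @ theta
theta_final = nesterov(grad_C, theta0=[1.0, 1.0])

# Plain gradient descent with the same learning rate, for comparison.
theta_gd = np.array([1.0, 1.0])
for _ in range(500):
    theta_gd = theta_gd - 0.01 * grad_C(theta_gd)
```

Along the shallow direction plain gradient descent still lags (about $0.99^{500} \approx 0.007$ from the minimum), while the momentum iterate has essentially converged.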

(e) Explain the concept behind the AdaGrad algorithm and how this can help with training (no need to include update equations). [4 marks]
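A minimal AdaGrad sketch, assuming the common form $s \leftarrow s + g \otimes g$, $\theta \leftarrow \theta - \eta\, g \oslash \sqrt{s + \epsilon}$ with $g = \nabla_\theta C(\theta)$, on an illustrative quadratic $C(\theta) = \tfrac{1}{2}\theta^T A \theta$. Because each coordinate is divided by its own accumulated gradient magnitude, the first step moves both coordinates by roughly $\eta$ even though the cost is 50 times steeper along $\theta_2$.

```python
import numpy as np

def adagrad(grad, theta0, eta=0.1, eps=1e-10, n_steps=100):
    """AdaGrad: accumulate squared gradients per parameter and scale each
    parameter's step by 1 / sqrt(accumulated sum)."""
    theta = np.asarray(theta0, dtype=float)
    s = np.zeros_like(theta)
    history = [theta.copy()]
    for _ in range(n_steps):
        g = grad(theta)
        s += g * g                                   # s <- s + g (x) g
        theta = theta - eta * g / np.sqrt(s + eps)   # per-parameter scaled step
        history.append(theta.copy())
    return theta, history

A = np.diag([1.0, 50.0])          # shallow in theta_1, steep in theta_2
grad_C = lambda theta: A @ theta
theta_final, history = adagrad(grad_C, theta0=[1.0, 1.0])
```

With the same $\eta = 0.1$, plain gradient descent would diverge along $\theta_2$ here, since $|1 - 0.1 \cdot 50| = 4 > 1$. The downside is that $s$ only grows, so the effective learning rate keeps shrinking.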

(f) Explain the concept behind the RMSProp algorithm and how this can help with training (no need to include update equations). [4 marks]
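RMSProp replaces AdaGrad's running sum with an exponentially decaying average, $s \leftarrow \beta s + (1 - \beta)\, g \otimes g$. The sketch below is an illustration with a hypothetical constant gradient of 1, not a training run: it compares the size of the step each method takes. AdaGrad's step shrinks like $\eta/\sqrt{t}$, while RMSProp's settles at about $\eta$ because old gradients are forgotten.

```python
import numpy as np

def effective_steps(grads, eta=0.01, beta=0.9, eps=1e-10):
    """Per-step update magnitudes of AdaGrad and RMSProp for a 1-D sequence
    of gradients, to compare how their scaling evolves over time."""
    s_ada, s_rms = 0.0, 0.0
    ada, rms = [], []
    for g in grads:
        s_ada += g * g                                  # AdaGrad: running sum
        s_rms = beta * s_rms + (1 - beta) * g * g       # RMSProp: exponential decay
        ada.append(eta * abs(g) / np.sqrt(s_ada + eps))
        rms.append(eta * abs(g) / np.sqrt(s_rms + eps))
    return np.array(ada), np.array(rms)

ada, rms = effective_steps(np.ones(1000))   # constant gradient of 1
```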

(g) Adam is often the default algorithm for training deep networks. Explain which components of the algorithms considered so far are included in the Adam algorithm. [3 marks]
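A minimal sketch of Adam in its commonly used (Kingma and Ba) form, showing the two ingredients it borrows: a momentum-like decaying average of gradients and an RMSProp-like decaying average of squared gradients, each bias-corrected because both start at zero. The quadratic cost and hyperparameter values are illustrative assumptions.

```python
import numpy as np

def adam(grad, theta0, eta=0.01, beta1=0.9, beta2=0.999, eps=1e-8, n_steps=2000):
    """Adam: momentum-style first moment m plus RMSProp-style second moment s."""
    theta = np.asarray(theta0, dtype=float)
    m = np.zeros_like(theta)
    s = np.zeros_like(theta)
    first_step = None
    for t in range(1, n_steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g          # decaying mean of gradients (momentum)
        s = beta2 * s + (1 - beta2) * g * g      # decaying mean of squared gradients (RMSProp)
        m_hat = m / (1 - beta1 ** t)             # bias corrections
        s_hat = s / (1 - beta2 ** t)
        theta = theta - eta * m_hat / (np.sqrt(s_hat) + eps)
        if t == 1:
            first_step = theta.copy()
    return theta, first_step

A = np.diag([1.0, 50.0])                         # C(theta) = 0.5 * theta^T A theta
grad_C = lambda theta: A @ theta
theta_final, first_step = adam(grad_C, theta0=[1.0, 1.0])
```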

(h) Deep networks have very large numbers of parameters and so can be prone to overfitting. Explain the dropout regularisation technique for avoiding overfitting. Support your explanation with a diagram. [7 marks]
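A sketch of dropout applied to one layer's activations, assuming the widely used "inverted dropout" convention, in which surviving units are rescaled by $1/(1-p)$ during training so nothing needs to change at test time. (An equivalent alternative leaves training activations unscaled and multiplies the weights by the keep probability $1-p$ at test time.) The function name and arguments are hypothetical.

```python
import numpy as np

def dropout(a, p_drop, rng, training=True):
    """Inverted dropout on a layer's activations a.

    During training each unit is zeroed with probability p_drop and the
    survivors are scaled by 1 / (1 - p_drop), so the expected activation is
    unchanged; at test time the activations are returned untouched."""
    if not training or p_drop == 0.0:
        return a
    keep = rng.random(a.shape) >= p_drop        # Bernoulli mask: True = unit kept
    return a * keep / (1.0 - p_drop)

rng = np.random.default_rng(0)
a = np.ones((1000, 100))                        # a batch of hidden-layer activations
out = dropout(a, p_drop=0.5, rng=rng)
```

Because a fresh mask is drawn for every training step, each step effectively trains a different thinned sub-network, which discourages units from co-adapting.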

Question 3

(a) Describe the knowledge-based approach to artificial intelligence. [4 marks]

(b) Describe the machine learning approach to artiﬁcial intelligence. [2 marks]

(d) Briefly describe supervised, unsupervised and reinforcement learning. [3 marks]

(e) For supervised learning, briefly describe the difference between regression and classification problems. [2 marks]

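(f) Consider logistic regression for $K$ classes, where the predicted probability for each class $k$ is given by

$$\hat{p}_k = \frac{\exp\left(s_k(x)\right)}{\sum_{j=1}^{K} \exp\left(s_j(x)\right)}, \quad \text{with} \quad s_k(x) = \left(\theta^{(k)}\right)^T x,$$

for input $x$ and parameters $\theta^{(k)}$ (recall each $\theta^{(k)}$ includes $n$ features).

Consider the generalised cost function for logistic regression given by the cross entropy

$$C(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} y_k^{(i)} \log\left(\hat{p}_k^{(i)}\right),$$

where $i$ denotes the training instance and $m$ the total number of training instances. The target value of instance $i$ for class $k$ is denoted $y_k^{(i)}$.

Show that the derivative of the cost function is given by

$$\nabla_{\theta^{(k)}} C(\Theta) = \frac{1}{m} \sum_{i=1}^{m} \left(\hat{p}_k^{(i)} - y_k^{(i)}\right) x^{(i)}.$$

Hint: For the term $\partial \hat{p}_{k'} / \partial s_k$ it may be convenient to consider the cases $k = k'$ and $k \neq k'$ separately and then combine. Note also that $\sum_{k=1}^{K} y_k^{(i)} = 1$. [15 marks]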
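The result can be sanity-checked numerically. This sketch, under illustrative assumptions (random data, $\Theta$ stored as an $n \times K$ matrix whose column $k$ is $\theta^{(k)}$, and the rows of X being the $x^{(i)T}$), compares the stated gradient with central finite differences of the cross-entropy cost; it checks the formula rather than deriving it.

```python
import numpy as np

def softmax(S):
    """Row-wise softmax of scores S (m x K), shifted for numerical stability."""
    S = S - S.max(axis=1, keepdims=True)
    E = np.exp(S)
    return E / E.sum(axis=1, keepdims=True)

def cost(Theta, X, Y):
    """Cross entropy C(Theta) = -(1/m) sum_i sum_k y_k^(i) log p_k^(i)."""
    P = softmax(X @ Theta)
    return -np.sum(Y * np.log(P)) / X.shape[0]

def analytic_grad(Theta, X, Y):
    """Column k equals (1/m) sum_i (p_k^(i) - y_k^(i)) x^(i)."""
    P = softmax(X @ Theta)
    return X.T @ (P - Y) / X.shape[0]

def numerical_grad(Theta, X, Y, h=1e-6):
    """Central finite differences, one parameter at a time."""
    G = np.zeros_like(Theta)
    for idx in np.ndindex(*Theta.shape):
        E = np.zeros_like(Theta)
        E[idx] = h
        G[idx] = (cost(Theta + E, X, Y) - cost(Theta - E, X, Y)) / (2 * h)
    return G

rng = np.random.default_rng(1)
m, n, K = 20, 4, 3
X = rng.normal(size=(m, n))
Y = np.eye(K)[rng.integers(K, size=m)]     # one-hot targets, so sum_k y_k^(i) = 1
Theta = rng.normal(size=(n, K))
max_err = np.max(np.abs(analytic_grad(Theta, X, Y) - numerical_grad(Theta, X, Y)))
```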

Question 4

(a) Explain the computational model of TensorFlow in terms of computational graph construction and execution. [3 marks]

(b) Explain the difference between TensorFlow Variable and Constant types. [3 marks]

(d) Explain autodiff and its advantages. [4 marks]
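A compact way to see what autodiff does is forward-mode automatic differentiation with dual numbers, sketched below for $f(x, y) = x^2 y + y + 2$. This is a swapped-in illustration: TensorFlow uses reverse-mode autodiff, which obtains all partial derivatives in one backward pass through the graph, whereas forward mode, as here, needs one pass per input variable. The Dual class is hypothetical and supports only + and *.

```python
class Dual:
    """Dual number value + deriv * eps with eps**2 = 0; deriv carries the derivative."""

    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def f(x, y):
    return x * x * y + y + 2           # f(x, y) = x^2 y + y + 2

# Seed the input we differentiate with respect to with derivative 1, the other with 0.
dfdx = f(Dual(3.0, 1.0), Dual(4.0, 0.0)).deriv   # 2xy at (3, 4)
dfdy = f(Dual(3.0, 0.0), Dual(4.0, 1.0)).deriv   # x^2 + 1 at (3, 4)
```

The derivatives are exact up to floating point, unlike finite differences, and no symbolic manipulation of f is required.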

(e) Consider the following TensorFlow code, which sets up a computational graph and executes it. Assume scaled_housing_data_plus_bias is an m × (n + 1) feature matrix and housing_data_target is an m × 1 target vector, where m denotes the number of training instances and n the number of features (n + 1 is the number of features when including a bias).

(i) Set up computational graph:

    import tensorflow as tf  # TensorFlow 1.x graph-mode API

    reset_graph()  # assumed helper (not shown) that resets the default graph

    n_epochs = 1000
    learning_rate = 0.01

    # Data as constant tensors; n (number of features) is assumed defined with the data
    X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32,
                    name="X")
    y = tf.constant(housing_data_target, dtype=tf.float32, name="y")

    # Trainable parameters, initialised uniformly in [-1, 1]
    theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0),
                        name="theta")
    y_pred = tf.matmul(X, theta, name="predictions")
    error = y_pred - y
    mse = tf.reduce_mean(tf.square(error), name="mse")

    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(mse)

(ii) Execute:

    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        sess.run(init)  # initialise theta

        for epoch in range(n_epochs):
            if epoch % 100 == 0:
                print("Epoch", epoch, "MSE =", mse.eval())
            sess.run(training_op)  # one parameter update

        best_theta = theta.eval()

What machine learning problem does this TensorFlow code solve? What optimisation algorithm is used? [4 marks]

(f) Write code to solve the problem given in part (e) using mini-batch gradient descent. You may find it helpful to base your answer on the code given in part (e) and then revise it where necessary. Assume you have available a function fetch_batch to fetch each mini-batch, with the signature specified below:

    def fetch_batch(epoch, batch_index, batch_size):
        ...
        return X_batch, y_batch

[12 marks]
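As a framework-neutral sketch of the mini-batch idea (NumPy rather than TensorFlow, so not a drop-in answer in the style of part (e)), the loop below uses synthetic data in place of the housing data and a hypothetical fetch_batch that samples random rows. Each parameter update uses the MSE gradient of one mini-batch of batch_size instances, and one epoch consists of n_batches such updates.

```python
import numpy as np

# Synthetic stand-in for the housing data (illustrative assumption).
rng = np.random.default_rng(0)
m, n = 1000, 3
X_data = np.c_[np.ones(m), rng.normal(size=(m, n))]      # m x (n + 1), bias column first
theta_true = np.array([[1.0], [2.0], [-3.0], [0.5]])
y_data = X_data @ theta_true + 0.01 * rng.normal(size=(m, 1))

n_epochs = 50
learning_rate = 0.01
batch_size = 50
n_batches = int(np.ceil(m / batch_size))

def fetch_batch(epoch, batch_index, batch_size):
    """Hypothetical helper: draw a random mini-batch, seeded for reproducibility."""
    batch_rng = np.random.default_rng(epoch * n_batches + batch_index)
    indices = batch_rng.integers(m, size=batch_size)
    return X_data[indices], y_data[indices]

theta = rng.uniform(-1.0, 1.0, size=(n + 1, 1))
for epoch in range(n_epochs):
    for batch_index in range(n_batches):
        X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
        error = X_batch @ theta - y_batch
        gradients = 2.0 / batch_size * X_batch.T @ error    # gradient of the mini-batch MSE
        theta -= learning_rate * gradients
best_theta = theta
```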

Question 5

(a) Describe what Principal Component Analysis (PCA) is. [3 marks]

(b) Define the explained variance ratio. [2 marks]
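A minimal NumPy sketch of PCA via the singular value decomposition of the centred data, also computing the explained variance ratio, i.e. the fraction of the dataset's total variance lying along each principal component, $\lambda_i / \sum_j \lambda_j$. The synthetic data, which vary mostly along one direction, are an illustrative assumption.

```python
import numpy as np

def pca(X, d):
    """Project X (m x n) onto its first d principal components via SVD.
    Returns the projection and the explained variance ratio of every component."""
    X_centered = X - X.mean(axis=0)                     # PCA assumes centred data
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    W_d = Vt[:d].T                                      # n x d: first d principal axes
    X_proj = X_centered @ W_d
    explained_variance = S**2 / (len(X) - 1)            # variance along each axis
    explained_variance_ratio = explained_variance / explained_variance.sum()
    return X_proj, explained_variance_ratio

rng = np.random.default_rng(0)
t = rng.normal(size=500)
X = np.c_[t, 2 * t + 0.1 * rng.normal(size=500), 0.1 * rng.normal(size=500)]
X_proj, evr = pca(X, d=1)
```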

(d) Describe the process of Locally Linear Embedding (LLE). [4 marks]

(e) In the first step of LLE, each training instance $x_i$ is reconstructed as a linear function of its $k$ nearest neighbours. Write down an equation that describes this process, and any normalisation that is applied. [8 marks]

(f) The second step of LLE maps the training instances into a $d$-dimensional space while preserving local relationships as much as possible. If $z_i$ is the image of $x_i$ in this $d$-dimensional space, describe the condition that must be met. [8 marks]
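A minimal NumPy sketch of both LLE steps, following the standard Roweis and Saul formulation: step 1 solves a small constrained least-squares problem per instance (with a regularisation term, an assumption needed when $k$ exceeds the input dimension), and step 2 takes the bottom eigenvectors of $(I - W)^T (I - W)$, discarding the constant one. Brute-force distances make it suitable only for small datasets; the helix data are illustrative.

```python
import numpy as np

def lle(X, k, d, reg=1e-3):
    """Minimal Locally Linear Embedding sketch (dense, brute force).

    Step 1: weights W minimise sum_i ||x_i - sum_j w_ij x_j||^2, with w_ij = 0
            unless x_j is one of the k nearest neighbours of x_i, and
            sum_j w_ij = 1 for every i.
    Step 2: with W fixed, Z minimises sum_i ||z_i - sum_j w_ij z_j||^2,
            obtained here from the bottom eigenvectors of (I - W)^T (I - W).
    """
    m = X.shape[0]
    W = np.zeros((m, m))
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    for i in range(m):
        nbrs = np.argsort(dists[i])[1:k + 1]      # k nearest neighbours, skipping x_i itself
        Zi = X[nbrs] - X[i]                       # neighbours relative to x_i
        C = Zi @ Zi.T                             # local k x k Gram matrix
        C += reg * np.trace(C) * np.eye(k)        # regularise: C is singular when k > n
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs] = w / w.sum()                  # normalise so the weights sum to 1
    I_minus_W = np.eye(m) - W
    M = I_minus_W.T @ I_minus_W
    eigvals, eigvecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    # Drop the constant eigenvector (eigenvalue ~ 0, assuming a connected neighbour graph).
    Z = eigvecs[:, 1:d + 1]
    return W, Z

t = np.linspace(0, 3 * np.pi, 200)
X = np.c_[np.cos(t), np.sin(t), 0.3 * t]          # a 1-D curve (helix) embedded in 3-D
W, Z = lle(X, k=8, d=1)
```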

2023-04-26
