
SPCE0038 - Machine Learning with Big Data

27-Apr-2022

Question 1

Consider a multi-class classification problem.

(a) Describe the one-versus-rest (OvR) (also called one-versus-all; OvA) strategy to perform multiclass classification given a binary classifier. [2 marks]

(b) Describe the one-versus-one (OvO) strategy to perform multiclass classification given a binary classifier. [2 marks]

(c) Given N classes, how many binary classifiers are required for the one-versus-rest (OvR) and one-versus-one (OvO) strategies? [2 marks]

(d) Specify the advantages and disadvantages of the one-versus-rest (OvR) and one-versus-one (OvO) strategies. [4 marks]

(e) Describe how a neural network can be used to perform multi-class classification. Include a diagram of the neural network architecture to support your discussion (for the purpose of your diagram, assume a fully-connected network with 5 inputs, 2 hidden layers, and 4 classes to be classified). [6 marks]

(f) Define the softmax function mapping inputs $a_j$ to outputs $p_j$, for $j = 1, \ldots, n$. [1 mark]

(g) Show the outputs of the softmax function $p_j$ satisfy the following properties:

(i) $\sum_{j=1}^{n} p_j = 1$;

(ii) $0 \le p_j \le 1$ for all $j$. [3 marks]

Question 2

You have started a position as a Data Scientist and have been tasked with designing and building a simple neural network model using TensorFlow and Keras for an image classification problem. The model is to be built from scratch with no external data, so do not consider transfer learning.

(a) Keras includes three different core APIs.   Describe each API and discuss its advantages and disadvantages. [6 marks]

(b) Which API would you use for the problem at hand (as discussed in the beginning of the question) and why? [2 marks]

(c) Describe the steps required in this project that would need to be implemented using Keras to develop a machine learning model that can be deployed in production. Simply describe the steps conceptually; you do not need to provide any pseudo code. [12 marks]

Question 3

(a) In a decision tree, define the Gini threshold measure. Describe what a value of 0 and a value of 0.5 mean. [5 marks]

(b) What is an alternative to using the Gini measure?  Define this, and comment on what effect this may have on a decision tree compared to the Gini measure. [5 marks]

(c) Describe and define the CART algorithm,  including what the acronym CART stands for. Define the cost function that CART uses. [5 marks]

(d) List five ways in which decision tree classification may be regularised. [5 marks]

Question 4

You are working on a project involving data from a number of sources.  Before you build your model, you must prepare the data to make it easier to work with.

(a) Give 2 reasons why you may need to perform some preprocessing rather than work directly with the raw data. [2 marks]

(b) List 2 data cleaning practices that may result in having fewer data points after preprocessing. [2 marks]

Some of the data you need is located in a database and must first be extracted.

(c) Briefly describe how the data in a relational database is structured. [2 marks]

(d) What is the standard language for interacting with relational databases, and which keyword (or kind of query) is used to extract a subset of entries? [1 mark]

You decide to use DVC to automate the different steps of your analysis. With this in mind, you have written the following dvc.yaml file:

    stages:
      load_and_split:
        cmd: python prepare_all_data.py
        deps:
          - prepare_all_data.py
        outs:
          - data_train.json
          - data_test.json
      train:
        cmd: python model.py
        deps:
          - data_train.json
          - model.py
        outs:
          - fitted_parameters.json
      evaluate:
        cmd: python compute_metrics.py
        deps:
          - fitted_parameters.json
          - compute_metrics.py
          - data_test.json
        metrics:
          - results.json

(e) Describe in words the pipeline specified in the above file (you can assume that the names of stages and files are representative of what they do).  Make sure to specify which files you are referring to. [8 marks]

(f) Instead of writing the file by hand, you could create it using DVC commands. Give the command that would create and execute the train section of the above file. Include all the required options and arguments. [5 marks]