2020 Semester 1 Exam

Question 1

1 pts

Which of the following is the best example of a regression problem?


Predicting whether a customer will purchase an item at an online retailer based on their browsing history.

Predicting the likelihood that an Instagram user will like a given photo.

Predicting which sports team will win an upcoming match.

Predicting how much a customer will spend at an online retailer in the next year based on their past purchase history.


Question 2

1 pts

Which of the following is the best example of a classification problem?

Predicting whether a customer will like an item bought in an online shop based on their browsing history.

Estimating the total amount of time the average user will spend using an application on a mobile device.

Predicting the number of online views an article on the Age website will receive.

Predicting how much a customer will spend on alcohol in the next three weeks based on their credit card transaction history.



Question 3

4 pts

Consider the following regular expression meta operators:

( ) [ ] { } . * + ? ^ $ | \

For each of the following, give a couple of examples of strings which the regular expression would match. Describe (colloquially, in a manner that a non-technical person would understand) the set of strings that the pattern is designed to match. Each regular expression is enclosed in the pair of '/' characters below. [2 marks each]

(a) /^[A-Za-z][a-z]*$/

(b) /\s(\w+)\s\1/
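The two patterns can be tried against candidate strings before writing a description. The following is a minimal sketch using Python's standard `re` module; the helper names `matches_a` and `repeated_word` and the sample strings are illustrative assumptions, not part of the question.

```python
import re

# The two patterns from the question, written as Python raw strings.
PATTERN_A = re.compile(r"^[A-Za-z][a-z]*$")
PATTERN_B = re.compile(r"\s(\w+)\s\1")


def matches_a(text: str) -> bool:
    """Return True if pattern (a) matches the whole string."""
    return PATTERN_A.search(text) is not None


def repeated_word(text: str):
    """Return the text captured by group 1 of pattern (b), or None if no match."""
    match = PATTERN_B.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    for candidate in ["Hello", "hello", "HeLLo", "two words"]:
        print(repr(candidate), matches_a(candidate))
    for candidate in ["it was the the end", "no repeats here", "say the then"]:
        print(repr(candidate), repeated_word(candidate))
```

Note that pattern (b) has no word boundary after `\1`, so the backreference only needs to match the start of whatever follows the second whitespace character.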




Question 4

4 pts

We have a multidimensional dataset with 4 numerical attributes and are required to thoroughly assess correlations among the attributes with visualisation. There are several visualisation techniques we can use to assess correlations.

(1) What aspects of correlation could we assess? [1 mark]

(2) From the visualisation techniques we introduced in this subject, which two are most suitable for assessing correlations in this dataset? [1 mark]

(3) Nominate the best choice of visualisation from the two most suitable ones and justify why you chose it over the other option. [2 marks]


Question 5

5 pts

It is good practice to use n-fold cross-validation in data mining tasks, as this generally produces a less biased comparison between algorithms. Another data wrangler is working on a data set of 1,200 Australian hospital patients who need to be classified into 3 classes as follows:

Has COVID-19

Does not have COVID-19 but has a physical injury

Has a condition other than COVID-19 or a physical injury

The classification will be based on 8 features using supervised approaches. The data wrangler decides to use 24-fold cross-validation as 3 x 8 = 24.

Briefly explain what n-fold cross-validation is. [1 mark]

Next, discuss his choice of n = 24. Will this achieve a good outcome? [2 marks]
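The mechanics of the split described in this scenario can be sketched in a few lines of plain Python. The sketch assumes contiguous, unshuffled folds; the function name `k_fold_splits` is hypothetical. It illustrates only the arithmetic of partitioning 1,200 records into 24 folds and takes no position on whether 24 is a sensible choice.

```python
from typing import Iterator, List, Tuple


def k_fold_splits(n_samples: int, n_folds: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for contiguous n-fold splitting.

    Each record appears in exactly one test fold; the remaining folds form
    the training set for that round.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds take one additional record each.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


# The scenario from the question: 1,200 patients, 24 folds.
splits = list(k_fold_splits(1200, 24))

if __name__ == "__main__":
    train, test = splits[0]
    print(f"rounds: {len(splits)}, train size: {len(train)}, test size: {len(test)}")
```

In practice the records would usually be shuffled (and often stratified by class) before splitting; that step is omitted here to keep the sketch short.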