COMP20008 Semester 1 Exam 2020
Question 1 (1 pt)
Which of the following is the best example of a regression problem?
Predicting how much a customer will spend at an online retailer in the next year based on their past purchase history.
Question 2 (1 pt)
Which of the following is the best example of a classification problem?
Predicting how much a customer will spend on alcohol in the next three weeks based on their credit card transaction history.
Question 3 (4 pts)
Consider the following regular expression meta operators: ( ) [ ] { } . * + ? ^ $ | \
For each of the following, give a couple of examples of strings which the regular expression would match. Describe (colloquially, in a manner that a non-technical person would understand) the set of strings that the pattern is designed to match. Each regular expression is enclosed in a pair of '/' characters below. [2 marks each]
(a) /^[A-Za-z][a-z]*$/
(b) /\s(\w+)\s\1/
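One way to explore the two patterns is to run them against a few candidate strings. The sketch below uses Python's `re` module; the sample strings and variable names are made up for illustration and are not part of the exam, and other regex engines may differ in small details (for example, how `$` treats a trailing newline).

```python
import re

# (a) whole string: one letter of either case, then zero or more lowercase letters
pattern_a = re.compile(r"^[A-Za-z][a-z]*$")

# (b) whitespace, a captured run of word characters, whitespace,
#     then the same captured text again (backreference \1)
pattern_b = re.compile(r"\s(\w+)\s\1")

# Hypothetical sample strings, chosen only to probe the patterns
samples_a = ["Hello", "world", "A", "HeLLo", "hello world", ""]
results_a = {s: bool(pattern_a.search(s)) for s in samples_a}

samples_b = ["it was the the best", "say hello hello", "no repeats here", "is the then"]
results_b = {}
for s in samples_b:
    m = pattern_b.search(s)
    # Record the captured word, or None when the pattern does not match
    results_b[s] = m.group(1) if m else None

for s, ok in results_a.items():
    print(f"(a) {s!r}: {ok}")
for s, word in results_b.items():
    print(f"(b) {s!r}: {word}")
```

The last sample for (b) is worth noting: because nothing anchors the end of the backreference, "the" followed by "then" also matches.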
Question 4 (4 pts)
We have a multidimensional dataset with 4 numerical attributes and are required to thoroughly assess correlations among the attributes with visualisation. There are several visualisation techniques we can use to assess correlations.
(1) What aspects of correlation could we assess? [1 mark]
(2) From the visualisation techniques we introduced in this subject, which two are most suitable for assessing correlations in this dataset? [1 mark]
(3) Nominate a best choice of visualisation from the two most suitable ones and justify how you have come to the choice compared to the other option. [2 marks]
Question 5 (5 pts)
It is good practice to use n-fold cross-validation during data mining tasks, as this generally produces a less biased comparison between algorithms. Another data wrangler is working on a data set of 1,200 Australian hospital patients that need to be classified into 3 classes as follows:
- Has COVID-19
- Does not have COVID-19 but has a physical injury
- Has a condition other than COVID-19 or a physical injury
The classification will be based on 8 features using supervised approaches. The data wrangler decides to use 24-fold cross-validation as 3 x 8 = 24.
Briefly explain what n-fold cross-validation is. [1 mark]
Next, discuss the data wrangler's choice of n = 24: will it achieve a good outcome? [2 marks]
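To make the arithmetic of the scenario concrete, the following is a minimal sketch of how n-fold splitting partitions a data set, written in plain Python. The function name `k_fold_indices` and the fixed shuffle seed are assumptions made for illustration; this is not the method prescribed by the subject, and it does not stratify by class.

```python
import random

def k_fold_indices(n_samples, n_folds, seed=0):
    """Split range(n_samples) into n_folds disjoint test folds.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    Hypothetical helper for illustration only.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly

    # Fold sizes differ by at most one when n_samples is not divisible by n_folds
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size

    splits = []
    for i, test_idx in enumerate(folds):
        # Training set is every fold except the one held out for testing
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train_idx, test_idx))
    return splits

# The scenario from the question: 1,200 patients, 24 folds
splits = k_fold_indices(1200, 24)
test_sizes = [len(test) for _, test in splits]
train_sizes = [len(train) for train, _ in splits]
print(len(splits), set(test_sizes), set(train_sizes))
```

With 1,200 records and 24 folds, each held-out fold contains 50 records and each model is trained on the remaining 1,150. The number of folds is chosen independently of the number of classes or features.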
2022-07-12