
COMP90049 Introduction to Machine Learning

Final Exam

Semester 1, 2022

Total marks: 120

Section A: Short answer Questions [27 marks]

Answer each of the questions in this section as briefly as possible. Expect to answer each question in 1-3 lines, with longer responses expected for the questions with higher marks.

Question 1: [27 marks]

(a) Both Naive Bayes and logistic regression are probabilistic machine learning models, and have a model of the underlying data. Explain the difference between the respective models in the context of the task of spam classification. [4 marks]

(b) Consider the mean squared error (MSE) as an alternative to the cross-entropy loss (CE),

$$L_{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$

$$L_{CE} = -\sum_{i=1}^{N} y_i \log \hat{y}_i$$

Here, $y_i$ refers to the true label of the $i$th instance, $\hat{y}_i$ to its predicted label, and $N$ to the number of instances. Would you choose MSE or cross-entropy as a loss function when optimizing a supervised 5-way classification model? Justify your choice. [5 marks]

(c) In the following table, fill in each cell with one of {increases, decreases, same} to indicate the most typical effect (column) of different strategies (rows). For example, cell (a) should indicate the effect that increasing the training data set size will have on model bias. [4.5 marks]

	model bias	model variance	model generalization
increasing the training data set size	(a)	(b)	(c)
increasing the model complexity	(d)	(e)	(f)
	(g)	(h)	(i)

(d) Anna and Bob have developed a classification model. Anna wants to test it using 20-fold cross-validation, but Bob would rather use 3-fold cross-validation. Explain one valid reason in favor of Anna’s strategy and one reason in favor of Bob’s strategy. [2 marks]

(e) (i) Explain in your own words the problem of constrained optimization. (ii) Explain in your own words how this concept relates to evaluating classifiers for fairness in the context of a concrete example. (N.B. no formula or calculations are necessary, providing the intuitions is sufficient.) [3 marks]

(f) Feng wants to build a machine learning model to predict how attractive a country is as a travel destination. For each country he has a large list of features, including ‘average temperature’, ‘population (in millions)’, ‘size (km²)’, ‘location (longitude, latitude)’, ‘GDP’, ‘length of the national anthem (in seconds)’, and many more. In the context of Feng’s machine learning task, (i) explain both feature selection and feature normalization and (ii) explain one difference between the two. [3.5 marks]

(g) Consider a labelled dataset of 20 buildings, where for each building you want to predict a binary label: keep or tear down. For each building there are three categorical features: (1) its age (<20 years; between 20 and 50 years; >50 years); (2) its insulation quality (high, medium, low); (3) its location (in nature, near nature, downtown). (i) Describe how to build a Random Forest classifier, referring to the size and properties of the data set above (e.g., number of instances and features). (ii) Is a Random Forest a suitable model for the given data? Justify your answer by referring back to properties of Random Forests in (i). [5 marks]
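For the two losses in part (b), a minimal Python sketch may make the comparison more concrete. It assumes, purely for illustration, that the index i runs over the five entries of a one-hot target and a predicted probability vector for a single instance. The function names and the example vectors are hypothetical and not part of the exam.

```python
import math

def mse_loss(y_true, y_pred):
    # Mean of squared differences between target and prediction entries
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    # Negative sum of target * log(prediction); eps guards against log(0)
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

# Hypothetical 5-way example: the true class is the third one
target = [0, 0, 1, 0, 0]
confident_right = [0.01, 0.01, 0.96, 0.01, 0.01]
confident_wrong = [0.96, 0.01, 0.01, 0.01, 0.01]

for name, pred in [("right", confident_right), ("wrong", confident_wrong)]:
    print(name, round(mse_loss(target, pred), 4), round(cross_entropy_loss(target, pred), 4))
```

Under these assumptions the squared error of a confidently wrong prediction stays bounded, while the cross-entropy grows without bound as the probability assigned to the true class approaches zero.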

Section B: Method Questions [68 marks]

In this section you are asked to demonstrate your conceptual understanding of the methods that we have studied in this subject.

Question 2: Probability [6 marks]

The headmaster of the arts department of a university is concerned about her students’ health. The arts students share a cafeteria with the science students (but no others). On a typical day, 50% of arts students and 10% of science students want a burger for lunch. 30% of the cafeteria customers are science students. What is the percentage of arts students that eat burgers on a day? [6 marks]
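One possible reading of this question is that it asks for the share of burger eaters who are arts students, i.e. P(arts | burger). The sketch below applies Bayes' rule under that assumption, using only the percentages stated in the question; the variable names are illustrative.

```python
# Values taken from the question; reading it as P(arts | burger) is an assumption.
p_science = 0.30
p_arts = 1.0 - p_science  # only arts and science students use the cafeteria
p_burger_given_arts = 0.50
p_burger_given_science = 0.10

# Law of total probability for the marginal P(burger)
p_burger = p_burger_given_arts * p_arts + p_burger_given_science * p_science

# Bayes' rule
p_arts_given_burger = p_burger_given_arts * p_arts / p_burger

print(f"P(burger) = {p_burger:.3f}")
print(f"P(arts | burger) = {p_arts_given_burger:.3f}")
```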

Question 3: Fair classification and mutual information [15 marks]

The local high school is auditing its admission procedure for fairness. Parents have voiced a concern that admissions are impacted by the gender of the student: female students have a higher chance of being admitted than male students. Each application includes a whole range of information, including the student’s height, gender, grades, postcode of home address, primary school they graduated from, and hobbies. In 2019, the admission statistics (by gender only) were as follows:

	admitted	not admitted
male	360	240
female	260	120

(a) You want to train a classifier that fairly predicts student admission, without discriminating against any group. Describe the kind(s) of bias that you need to take care of in the context of this scenario. [2 marks]

(b) (i) In the context of the scenario above, explain the approach of fairness through unawareness. (ii) Is fairness through unawareness a valid approach to address the above problem? Justify your answer. [3 marks]

(c) Data re-weighting is one strategy for improving the fairness of an ML model. Explain in your own words the intuition behind data re-weighting. Refer to the problem given above, and draw connections to the concepts of statistical association measures typically used for feature selection. [3 marks]

(d) Apply data re-weighting to the data set in the table above. Explain the resulting weights in your own words. (N.B. Show your mathematical working. Use a precision of two or three decimal places.) [7 marks]
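The re-weighting in part (d) can be sketched in Python as follows. The sketch assumes one common formulation, in which each (group, label) cell is weighted by its expected count under independence of gender and outcome divided by its observed count; whether this matches the formulation taught in the subject is an assumption. The counts come from the admission table above, and the function name is hypothetical.

```python
# Counts from the 2019 admission table
counts = {
    ("male", "admitted"): 360,
    ("male", "not admitted"): 240,
    ("female", "admitted"): 260,
    ("female", "not admitted"): 120,
}

def reweighing_weights(counts):
    """Weight per cell = expected count under independence / observed count."""
    total = sum(counts.values())
    group_totals, label_totals = {}, {}
    for (group, label), n in counts.items():
        group_totals[group] = group_totals.get(group, 0) + n
        label_totals[label] = label_totals.get(label, 0) + n
    weights = {}
    for (group, label), n in counts.items():
        expected = group_totals[group] * label_totals[label] / total
        weights[(group, label)] = expected / n
    return weights

weights = reweighing_weights(counts)
for cell, w in weights.items():
    print(cell, round(w, 3))
```

A weight above 1 marks a cell that is under-represented relative to independence, and a weight below 1 marks an over-represented one.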

Question 4: Decision Trees and Ensembling [13 marks]

Consider the following data set of seven training instances (1–7) and one test instance (8) for a binary classification problem of predicting whether a student was happy with their exam grade. Each student is characterized by three features: whether they cheated (‘cheated’), whether they slept the night before the exam (‘slept’), and the number of hours they studied for the exam (‘#hours studied’).

	cheated	slept	#hours studied	happy with exam grade
1	F	T	10	yes
2	F	T	2	yes
3	T	F	5	no
4	F	T	7	no
5	F	F	2	no
6	T	T	10	yes
7	F	F	7	yes
8	T	T	3	no

(a) The feature ‘#hours studied’ is numeric; however, numeric features need some extra attention in order to be used in decision trees. We want to compare the feature when represented in two different ways: (i) represent its values as 4 discrete values (2, 5, 7, and 10); (ii) treat the values as numerical, and discretize them into two equal-frequency bins. For both representations, compute the Information Gain compared to the root node entropy H(R). Summarize your conclusions in terms of the utility of these two discretization methods in 1-2 sentences. (N.B. Show your mathematical working. Use a precision of two or three decimal places and logarithm of base 2.) [9 marks]

(b) Classify the test instance (8) with each of the decision stumps you built in part (a). Justify your approach. [4 marks]
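The entropy and Information Gain computation required in part (a) can be sketched in Python as below. The helper names are hypothetical, and the example covers only representation (i), where each distinct value of ‘#hours studied’ forms its own branch. The equal-frequency variant in (ii) can be evaluated with the same function once a binning has been chosen.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    # Shannon entropy (base 2) of a list of class labels
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    # Root entropy minus the weighted average entropy of the child nodes
    partitions = defaultdict(list)
    for value, label in zip(feature_values, labels):
        partitions[value].append(label)
    mean_info = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - mean_info

# Training instances 1-7 from the table above
hours = [10, 2, 5, 7, 2, 10, 7]
happy = ["yes", "yes", "no", "no", "no", "yes", "yes"]

root_entropy = entropy(happy)
ig_discrete = information_gain(hours, happy)
print(round(root_entropy, 3), round(ig_discrete, 3))
```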

Question 5: Unsupervised Learning and Anomaly Detection [15 marks]

Consider the following data set of eight instances (labelled A...H), each characterized through two features (x1 and x2), and clusters are indicated as black circles. The points are shown visually on the left and coordinates are given for your convenience in the table on the right.

Point	(x1, x2)	Point	(x1, x2)
A	(1, 3)	E	(7, 8)
B	(2, 4)	F	(5, 9)
C	(2, 1)	G	(8, 8)
D	(5, 5)	H	(9, 7)

(a) (i) Perform Agglomerative Clustering on the given data, using (1) one step of single link, and (2) one step of complete link clustering (both (1) and (2) should take the clusters in the plot above as their starting point). Use Manhattan distance. (ii) Which method do you find more reliable, and why? (N.B. Either show your mathematical working, or clearly describe how you arrived at your conclusion in a few sentences. If you show your working, use a precision of two or three decimal places.) [10 marks]

(b) You realize that points D and F are far away from the remaining points, and you want to test the hypothesis that one or both points are outliers.

(i) Compute the outlier score of F and D with respect to the inverse of the density. Use Manhattan distance and the k=2 nearest neighbors inside any cluster (i.e., consider only points A, B, C, E, G, H as nearest neighbors). (ii) What do you find? Is your result reliable? Discuss one strategy to improve it. (N.B. Show your mathematical working. Use a precision of two or three decimal places.) [5 marks]
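The distance computations behind both parts of this question can be sketched in Python as follows. Several things here are assumptions rather than facts from the exam: the labels A to H are assigned to the coordinates in the order they are listed, the two clusters used in the linkage example are hypothetical because the plot is not available, and the outlier score is taken to be the average distance to the k nearest neighbors (the inverse of a density defined as the reciprocal of that average). All function names are illustrative.

```python
# Assumption: labels A..H follow the order in which the coordinates are listed
points = {
    "A": (1, 3), "B": (2, 4), "C": (2, 1), "D": (5, 5),
    "E": (7, 8), "F": (5, 9), "G": (8, 8), "H": (9, 7),
}

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def single_link(c1, c2):
    # Distance between the closest pair of points across the two clusters
    return min(manhattan(points[a], points[b]) for a in c1 for b in c2)

def complete_link(c1, c2):
    # Distance between the farthest pair of points across the two clusters
    return max(manhattan(points[a], points[b]) for a in c1 for b in c2)

def inverse_density_score(name, candidates, k=2):
    # Average Manhattan distance to the k nearest candidate neighbors
    dists = sorted(manhattan(points[name], points[c]) for c in candidates)
    return sum(dists[:k]) / k

# Hypothetical starting clusters, used only to illustrate the two linkage criteria
cluster_1, cluster_2 = ["A", "B", "C"], ["E", "G", "H"]
print(single_link(cluster_1, cluster_2), complete_link(cluster_1, cluster_2))

clustered = ["A", "B", "C", "E", "G", "H"]
score_d = inverse_density_score("D", clustered)
score_f = inverse_density_score("F", clustered)
print(score_d, score_f)
```

In an agglomerative step, the chosen linkage function would be evaluated for every pair of current clusters and the pair with the smallest value merged.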

Question 6: Bias and Variance [19 marks]

This question explores the concepts of bias, variance and their trade-off. First, let us define some notation for relevant concepts:

c1. x = {x_1, ..., x_n} refers to the observed data points.

c2. f(x) refers to the true function which generated the data.

c3. g(x) refers to a single estimate of f(x) by the model.

c4. E[g(x)] refers to the average estimate of f(x) from many modelling runs.

c5. σ(x) refers to the noise in the data.

(a) In your own words, describe the intuition behind the bias-variance decomposition using the notation given in c1.–c5. above. (N.B. writing only the formula is not enough; you should comment on the individual factors and their significance.) [5 marks]

(b) The following three plots show the result of fitting many functions to a given data set.

(i) Label the x- and y-axes of the plots. (ii) Each plot reflects the concepts defined in c1.–c5. above. For each plot, indicate where each concept is depicted. In addition, indicate how the plots reflect the bias and variance of the underlying model. (N.B. In total you should have 7 labels explaining different elements/characteristics of the plot. You may either annotate the plots, or answer the question as written text.) [8 marks]

(c) For each of the three figures, indicate (i) whether the model has high or low bias, (ii) whether it has high or low variance, and (iii) whether the model is underfitting, overfitting, or appropriately fit. Provide reasons for the model behavior referring to properties of the model and/or data. [6 marks]
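The quantities in c1.–c5. can be illustrated with a small simulation, sketched below in Python. Everything in it is an assumption made for illustration: the true function f(x) = x², the deliberately simple constant model, the noise level, and the function names. It repeats the modelling run many times, then compares the squared bias (f(x0) - E[g(x0)])² and the variance of g(x0) with the average squared distance between g(x0) and f(x0).

```python
import random
import statistics

def true_f(x):
    return x * x  # hypothetical true function f(x)

def fit_constant(ys):
    # Deliberately simple (high-bias) model: always predict the mean training target
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y

def decompose(x0, runs=2000, n=20, noise_sd=0.5, seed=0):
    rng = random.Random(seed)
    preds = []
    for _ in range(runs):
        xs = [rng.uniform(0, 1) for _ in range(n)]            # observed data points x
        ys = [true_f(x) + rng.gauss(0, noise_sd) for x in xs]  # noisy observations
        g = fit_constant(ys)                                   # one estimate g(x)
        preds.append(g(x0))
    avg_pred = statistics.fmean(preds)                         # E[g(x0)]
    bias_sq = (true_f(x0) - avg_pred) ** 2
    variance = statistics.pvariance(preds, mu=avg_pred)
    mse_to_f = statistics.fmean([(p - true_f(x0)) ** 2 for p in preds])
    return bias_sq, variance, mse_to_f

bias_sq, variance, mse_to_f = decompose(x0=1.0)
print(round(bias_sq, 4), round(variance, 4), round(mse_to_f, 4))
```

In this setup the squared bias and the variance add up to the average squared distance from f(x0). The expected error against noisy observations would additionally include the noise variance, which no model can reduce.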

Section C: Design and Application Questions [25 marks]

In this section you are asked to demonstrate that you have gained a high-level understanding of the methods and algorithms covered in this subject, and can apply that understanding. Expect your answer to each question to be from one third of a page to one full page in length. These questions will require significantly more thought than those in Sections A–B, and should be attempted only after having completed the earlier sections.

Question 7: Identifying Bots on social media platforms [25 marks]

Professor Bird is a computational linguist who wants to develop a machine learning model that can detect messages on social media platforms that were generated by bots rather than human users. She has collected a large database of messages, and would like to automatically classify each message into one of two types: ‘bot’ or ‘human’.

Professor Bird has a data set of 1,000,000 messages, 5,000 labelled and 995,000 unlabelled. For each message she has available (1) the content mapped to a 56-dimensional embedding; (2) the name of the author; (3) the author’s number of followers on the platform; (4) the average number of posts of the author per day. For some authors she also has the following features: (5) author location; (6) author nationality. To summarize the data set
