STATS 5099 Statistics – Data Mining and Machine Learning 2022
1. Please complete the Moodle quiz on the Exam Moodle Page. Justification of answers is not required for the Moodle quiz. [20 MARKS]
2. Please complete the Moodle quiz on the Exam Moodle Page. Justification of answers is not required for the Moodle quiz. [20 MARKS]
3. Glass is a material which figures prominently in the investigation of crimes such as burglary. To study the types of glass, the UK Forensic Science Service collected 214 glass samples and carried out a chemical analysis to measure 9 chemical properties (i.e. 9 variables) for each sample:
• RI (X1): refractive index
• Na (X2): Sodium (unit of measurement: weight percent in corresponding oxide; the same applies to variables X3-X9)
• Mg (X3): Magnesium
• Al (X4): Aluminium
• Si (X5): Silicon
• K (X6): Potassium
• Ca (X7): Calcium
• Ba (X8): Barium
• Fe (X9): Iron
A sample observation looks as follows:
RI    Na   Mg   Al   Si   K     Ca   Ba   Fe   Class
1.5   14   4.5  1.1  72   0.06  8.8  0    0    1
An important task in forensic science is to identify if the glass sample is a window glass (class 1) or non-window glass (class −1). For this purpose, a support vector machine (SVM) with a radial basis function (RBF) kernel is applied.
(a) The following code (on the next page) is implemented to tune two hyperparameters of an SVM with an RBF kernel, namely gamma and cost. Explain how they affect the fitting of an SVM. Based on the summary result, which values would you select as the optimal set of hyperparameters?
> C.val <- c(0.01, 0.1, 1, 10)
> Gamma <- c(0.01, 0.1, 1, 10)
> glass.cv <- tune.svm(Class ~ ., data = Glass.train, type = "C-classification",
+                      kernel = "radial", gamma = Gamma, cost = C.val,
+                      tunecontrol = tune.control(cross = 10))
> summary(glass.cv)
- Detailed performance results:
   gamma  cost    error dispersion
1 0.01 0.01 0.23750 0.10944938
2 0.10 0.01 0.23750 0.10944938
3 1.00 0.01 0.23750 0.10944938
4 10.00 0.01 0.23750 0.10944938
5 0.01 0.10 0.23750 0.10944938
6 0.10 0.10 0.11875 0.06215181
7 1.00 0.10 0.23750 0.10944938
8 10.00 0.10 0.23750 0.10944938
9 0.01 1.00 0.06875 0.04611655
10 0.10 1.00 0.06250 0.05103104
11 1.00 1.00 0.10625 0.04218428
12 10.00 1.00 0.23750 0.10944938
13 0.01 10.00 0.06875 0.06878156
14 0.10 10.00 0.07500 0.05743354
15 1.00 10.00 0.10625 0.04218428
16 10.00 10.00 0.22500 0.09860133
[3 MARKS]
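As a quick illustration of what the two hyperparameters control (a sketch on simulated data, assuming the e1071 package is available; Glass.train itself is not reproduced here): gamma sets how local the RBF kernel is, so a large gamma yields a very flexible, wiggly decision boundary that can overfit, while cost sets the penalty for margin violations, trading margin width against training errors.

```r
library(e1071)

set.seed(1)
# Hypothetical two-class toy data standing in for Glass.train:
# class depends on distance from the origin, so the boundary is non-linear
dat <- data.frame(X1 = rnorm(100), X2 = rnorm(100))
dat$Class <- factor(ifelse(dat$X1^2 + dat$X2^2 > 1.5, -1, 1))

# Small gamma: smooth, almost global decision boundary (can underfit)
fit.smooth <- svm(Class ~ ., data = dat, type = "C-classification",
                  kernel = "radial", gamma = 0.01, cost = 1)
# Large gamma: very local, wiggly boundary (can overfit)
fit.wiggly <- svm(Class ~ ., data = dat, type = "C-classification",
                  kernel = "radial", gamma = 100, cost = 1)

# The flexible fit reproduces the training labels far more closely
acc.smooth <- mean(fitted(fit.smooth) == dat$Class)
acc.wiggly <- mean(fitted(fit.wiggly) == dat$Class)
c(smooth = acc.smooth, wiggly = acc.wiggly)
```

High training accuracy at large gamma is exactly the overfitting risk that the cross-validation in tune.svm is meant to guard against.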
(b) The previous tune.svm command includes the argument tune.control(cross=10). List one advantage and one disadvantage of changing the value from 10 to the number of training samples.
[2 MARKS]
(c) After training the SVM with the optimal hyperparameters, it is used to make predictions on the training and test data. Comment on the training and test performance of the SVM based on the following R output.
> table(Glass.train$Class, glass.pred)  # training performance
     glass.pred
       -1   1
   -1  36   2
    1   1 121
> table(Glass.test$Class, glass.pred)   # test performance
     glass.pred
       -1   1
   -1  10   3
    1   2  39
[3 MARKS]
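The overall accuracies implied by such confusion tables can be read off directly in base R; a minimal sketch using the training counts quoted above (rows are true classes, columns are predictions):

```r
# Training confusion matrix from the output above
train.tab <- matrix(c(36, 1, 2, 121), nrow = 2,
                    dimnames = list(true = c("-1", "1"),
                                    pred = c("-1", "1")))

# Accuracy = correctly classified (diagonal) / total observations
train.acc <- sum(diag(train.tab)) / sum(train.tab)
round(train.acc, 3)  # 0.981: 3 errors out of 160 training samples
```

The same two lines applied to the test table (10, 3, 2, 39) give the test accuracy, so training and test performance can be compared on a common scale.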
(d) The polynomial kernel is another widely used kernel function in SVM. Write down a piece of R code to tune the hyperparameters of an SVM with a polynomial kernel based on the leave-one-out cross-validation accuracy. In other words, you should perform leave-one-out cross-validation on the training data (Glass.train), use accuracy as the evaluation criterion, and select the optimal degree from the range {1, 2, 3, 4} and cost from the range {0.01, 0.1, 1, 10, 100}. Note that you CANNOT use any built-in tuning function, such as tune.svm() or tune(). You can either handwrite the code or append the typed code to your script. [6 MARKS]
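For reference, one possible shape of such a hand-rolled grid search is sketched below. It assumes e1071 is installed, and a small simulated data frame (toy.train) stands in for Glass.train so the sketch is self-contained; in an actual answer the loop would run over Glass.train instead.

```r
library(e1071)

set.seed(1)
# Hypothetical stand-in for Glass.train: two noisy classes in two dimensions
n <- 40
toy.train <- data.frame(
  X1 = c(rnorm(n / 2, -1), rnorm(n / 2, 1)),
  X2 = c(rnorm(n / 2, -1), rnorm(n / 2, 1)),
  Class = factor(rep(c(-1, 1), each = n / 2))
)

degrees <- 1:4
costs <- c(0.01, 0.1, 1, 10, 100)
acc <- matrix(NA, length(degrees), length(costs),
              dimnames = list(degree = degrees, cost = costs))

for (i in seq_along(degrees)) {
  for (j in seq_along(costs)) {
    correct <- 0
    # Leave-one-out CV: hold out each observation in turn
    for (k in seq_len(nrow(toy.train))) {
      fit <- svm(Class ~ ., data = toy.train[-k, ],
                 type = "C-classification", kernel = "polynomial",
                 degree = degrees[i], cost = costs[j])
      pred <- predict(fit, newdata = toy.train[k, , drop = FALSE])
      correct <- correct + (pred == toy.train$Class[k])
    }
    acc[i, j] <- correct / nrow(toy.train)  # LOOCV accuracy
  }
}

# Hyperparameter pair with the highest LOOCV accuracy
best <- which(acc == max(acc), arr.ind = TRUE)[1, ]
c(degree = degrees[best[1]], cost = costs[best[2]])
```

Any equivalent triple loop (degree, cost, held-out observation) that fits svm() on n-1 rows and predicts the remaining row would earn the marks; only the use of a built-in tuner is disallowed.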
(e) A scientist suspected that there are sub-classes within the window glass but was
unsure about the class labels. To identify potential groups, he decided to apply the K-means clustering algorithm. List two scenarios which may limit the effectiveness
of K-means. [2 MARKS]
(f) Figure 1 plots the average silhouette width against different numbers of clusters. Based on the figure, suggest the optimal number of clusters and comment on the clustering performance at this optimal value. [2 MARKS]
Figure 1: Plot of average silhouette width against different numbers of clusters.
(g) Suggest another way of determining the optimal number of clusters. [2 MARKS]
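One common alternative to the silhouette plot is the elbow method: plot the total within-cluster sum of squares against K and pick the K where the decrease levels off. A minimal base-R sketch on simulated data (a hypothetical stand-in for the glass measurements):

```r
set.seed(1)
# Hypothetical stand-in for the scaled glass measurements: three loose groups
x <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
           matrix(rnorm(60, mean = 3), ncol = 2),
           matrix(rnorm(60, mean = 6), ncol = 2))

k.range <- 1:6
# Total within-cluster sum of squares for each candidate K;
# nstart = 20 restarts guard against poor random initialisations
wss <- sapply(k.range, function(k)
  kmeans(x, centers = k, nstart = 20)$tot.withinss)

# The within-cluster SS always decreases with K; the "elbow" where
# the curve flattens suggests the number of clusters
plot(k.range, wss, type = "b", xlab = "Number of clusters K",
     ylab = "Total within-cluster sum of squares")
```

Other acceptable answers include the gap statistic or an information criterion; the elbow method is simply the most commonly taught option.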