
COMP3308/COMP3608 Artificial Intelligence

Week 10 Tutorial exercises

Support Vector Machines. Ensembles of Classifiers.

This week we have a smaller number of tutorial exercises. We will use the remaining time for questions about Assignment 2.

Regarding Assignment 2: Please do not underestimate the report! It is worth 12/24 marks = 50% of your mark for this assignment. Sometimes students spend too much time on the code and then there is not enough time left to write a good quality report.

Exercise 1. (Homework)

This is a set of 5 multiple choice questions to be answered online on Canvas. To complete the homework, please go to Canvas -> Quizzes -> w10-homework-submission. Remember to press the "Submit" button at the end. Only 1 attempt is allowed.

Here are the questions:

Q1. The problem of finding a decision boundary in SVM can be formulated as an optimisation problem using Lagrange multipliers. What is the goal?

a)   To maximize the margin of the decision boundary.

b)  To minimize the margin of the decision boundary.

Q2. After SVM learning, each Lagrange multiplier lambda_i takes either zero or non-zero value. Which one is correct:

a) A non-zero lambda_i indicates that example i is a support vector.

b) A zero lambda_i indicates that example i is a support vector.

c) A non-zero lambda_i indicates that the learning has not yet converged to a global minimum.

d) A zero lambda_i indicates that the learning process has identified support for example i.

Q3. In linear SVM, during training we compute dot products between:

a) training vectors

b) training and testing vectors

c) support vectors

d) support vectors and Lagrange multipliers

Q4. Bagging is only applicable to classification problems and cannot be applied to regression problems.

a)   True

b)  False

Q5. Boosting is guaranteed to improve the performance of the single classifier it uses.

a)   True

b)  False

Explanation:


Q1: The goal is to find the decision boundary (hyperplane) with maximum margin => the optimization problem is to maximize the margin.

Q2: This is the definition of a support vector.

Q3: During training, we compute dot products between pairs of training vectors; during testing, between the testing example and the support vectors. Note that d) can be immediately ruled out because it is a product of a vector and a coefficient; a dot product is a product of 2 vectors.

Q4: Bagging can be applied to both classification and regression problems. In regression problems, the individual predictions will be averaged (see slide 14 of Lecture 10b).

Q5: There is no guarantee that an ensemble of classifiers (including Boosting) will produce better accuracy than the single classifier it uses. In practice, ensembles often perform better but there is no guarantee.
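To make Q2 and Q3 more concrete, below is a minimal Python sketch (not part of the homework; all numbers are made up for illustration) of the SVM decision function f(x) = sign(sum_i lambda_i * y_i * <x_i, x> + b). Only the training examples with non-zero lambda_i (the support vectors) contribute to the sum, and both training and prediction use nothing but dot products between vectors.

import numpy as np

# Toy training set: x_i are the training vectors, y_i in {-1, +1} are the class labels
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [4.0, 4.0]])
y = np.array([+1, +1, -1, -1])

# Hypothetical Lagrange multipliers after training: only the examples with
# lambda_i != 0 (here the 2nd and 3rd) are support vectors; the rest can be discarded
lam = np.array([0.0, 0.5, 0.5, 0.0])
b = 0.1  # bias term, also found during training

def svm_predict(x_new):
    # f(x_new) = sign( sum_i lambda_i * y_i * <x_i, x_new> + b )
    s = sum(l * yi * np.dot(xi, x_new) for l, yi, xi in zip(lam, y, X) if l != 0)
    return np.sign(s + b)

print(svm_predict(np.array([2.5, 2.0])))  # prints 1.0, i.e. class +1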

Exercise 2. Bagging, Boosting and Random Forest

a) What are the similarities and differences between Bagging and Boosting?

Answer: See the lecture slides (slide 31)

b) What are the 2 main ideas that are combined in Random Forest?

Answer: Bagging and random selection of features
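A minimal Python sketch of how the two ideas fit together is given below. It is only an illustration: scikit-learn's DecisionTreeClassifier is used as the base learner, and, for simplicity, one random feature subset is chosen per tree, whereas the actual Random Forest algorithm re-samples the feature subset at every node of every tree.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_forest(X, y, n_trees=10, seed=0):
    # X: (n, d) numpy array of attributes, y: length-n numpy array of class labels
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = max(1, int(np.sqrt(d)))  # size of the random feature subset
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)             # idea 1: bootstrap sample (bagging)
        feats = rng.choice(d, size=k, replace=False)  # idea 2: random selection of features
        tree = DecisionTreeClassifier().fit(X[np.ix_(rows, feats)], y[rows])
        forest.append((tree, feats))
    return forest

def forest_predict(forest, x):
    # Combine the predictions of the individual trees by majority vote
    votes = [tree.predict(x[feats].reshape(1, -1))[0] for tree, feats in forest]
    return max(set(votes), key=votes.count)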

Exercise 3. Boosting

Continue the example from slides 24-25. The current probabilities of the training examples to be used by AdaBoost are given below:

p(x1)   p(x2)   p(x3)   p(x4)   p(x5)   p(x6)   p(x7)   p(x8)   p(x9)   p(x10)
0.07    0.07    0.07    0.07    0.07    0.07    0.07    0.17    0.17    0.17

From these, a training set T2 has been created, and using T2 a classifier C2 has been generated. Suppose that C2 misclassifies examples x2 and x9. Show how the probabilities of all training examples are recalculated and normalized.

Solution:

1.   Initial probabilities:

p(x1)   p(x2)   p(x3)   p(x4)   p(x5)   p(x6)   p(x7)   p(x8)   p(x9)   p(x10)
0.07    0.07    0.07    0.07    0.07    0.07    0.07    0.17    0.17    0.17

2. Examples x2 and x9 are misclassified => their errors e are 1, while the errors of the other 8 examples are 0.

The weighted error of classifier C2 will be: ε2 = 0.07*1 + 0.17*1 = 0.24

3. The term for the probability modification will be: β2 = ε2/(1 - ε2) = 0.24/0.76 = 0.32

4. Calculating the new probabilities by modifying (reducing) only the probabilities of the correctly classified examples by: p(xj) = p(xj)*β2

The new probabilities are:

p(x1)       p(x2)   p(x3)       p(x4)       p(x5)       p(x6)       p(x7)       p(x8)       p(x9)   p(x10)
0.07*0.32   0.07    0.07*0.32   0.07*0.32   0.07*0.32   0.07*0.32   0.07*0.32   0.17*0.32   0.17    0.17*0.32

i.e.:

p(x1)   p(x2)   p(x3)   p(x4)   p(x5)   p(x6)   p(x7)   p(x8)   p(x9)   p(x10)
0.022   0.07    0.022   0.022   0.022   0.022   0.022   0.054   0.17    0.054

5.  Normalization:

Sum of all probabilities: 0.022*6 + 0.054*2 + 0.07 + 0.17 = 0.132 + 0.108 + 0.24 = 0.48

Normalized probabilities:

p(x1)        p(x2)       p(x3)        p(x4)        p(x5)        p(x6)        p(x7)        p(x8)        p(x9)       p(x10)
0.022/0.48   0.07/0.48   0.022/0.48   0.022/0.48   0.022/0.48   0.022/0.48   0.022/0.48   0.054/0.48   0.17/0.48   0.054/0.48

i.e.:

p(x1)   p(x2)   p(x3)   p(x4)   p(x5)   p(x6)   p(x7)   p(x8)   p(x9)   p(x10)
0.045   0.146   0.045   0.045   0.045   0.045   0.045   0.113   0.354   0.113

We can see that the weights of the misclassified examples (x2 and x9) have increased, while the weights of the correctly classified examples (the remaining 8 examples) have decreased. In the next iteration, x2 and x9 will have a higher chance of being selected for the new training set, i.e. the next classifier will focus on learning to classify these more difficult examples.
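The probability update above can be reproduced with a few lines of Python (a sketch using the numbers from this exercise; the printed values differ slightly from the hand calculation because β2 is not rounded to 0.32 here):

p = [0.07, 0.07, 0.07, 0.07, 0.07, 0.07, 0.07, 0.17, 0.17, 0.17]
misclassified = {1, 8}                                  # 0-based indices of x2 and x9

eps = sum(p[i] for i in misclassified)                  # weighted error of C2: 0.24
beta = eps / (1 - eps)                                  # 0.24/0.76, approx. 0.32

# Reduce only the probabilities of the correctly classified examples
p_new = [pi if i in misclassified else pi * beta for i, pi in enumerate(p)]

# Normalize so that the probabilities sum to 1
total = sum(p_new)                                      # approx. 0.48
p_norm = [round(pi / total, 3) for pi in p_new]
print(p_norm)  # [0.046, 0.146, 0.046, 0.046, 0.046, 0.046, 0.046, 0.112, 0.354, 0.112]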

Exercises using Weka

Exercise 4. SVM in Weka: Binary classification

1. Load the diabetes data (diabetes.arff) or another binary classification dataset. The two classes in the diabetes data are "tested_positive" and "tested_negative".

Choose 10-fold cross-validation. Run the SVM (SMO, located under "functions"). This implementation of SVM uses John Platt's sequential minimal optimization algorithm to solve the optimization problem (i.e. to train the SVM classifier) and shows the computed decision boundary.

Examine the output.
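If you would like to double-check the Weka result programmatically, a rough equivalent in Python with scikit-learn is sketched below. This is not part of the tutorial; loading the ARFF file via scipy and treating the last column as the class attribute are assumptions about the file.

from scipy.io import arff
import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

data, meta = arff.loadarff("diabetes.arff")
df = pd.DataFrame(data)
X = df.iloc[:, :-1].to_numpy(dtype=float)   # all attributes except the class
y = df.iloc[:, -1].str.decode("utf-8")      # "tested_positive" / "tested_negative"

clf = SVC(kernel="linear")                  # a linear SVM, like the default SMO setup
print(cross_val_score(clf, X, y, cv=10).mean())   # 10-fold cross-validation accuracy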

a) What type of SVM is used: linear or non-linear? What type of kernel function is used? Where is the equation of the decision boundary?

Answer:

Linear SVM. This is evident from the output line "Linear Kernel: K(x,y) = <x,y>". This is the default option in Weka.

The equation of the decision boundary is given in the classifier output.