
SDSC5001 Statistical Machine Learning I

Assignment #3

Deadline: November 28, Tuesday @ 10:00 PM

1. Consider the Gini index, classification error, and entropy in a simple classification setting with two classes (0 and 1). Create a single plot that displays each of these quantities as a function of p̂_{t1}, the proportion of training observations in node t that are from class 1. The x-axis should display p̂_{t1}, ranging from 0 to 1, and the y-axis should display the value of the Gini index, classification error, and entropy. You can make the plot by hand or with software.
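In the two-class case the three measures reduce to Gini = 2p(1 − p), classification error = 1 − max(p, 1 − p), and entropy = −p log₂ p − (1 − p) log₂(1 − p), where p = p̂_{t1}. A minimal matplotlib sketch of the plot (the output filename is arbitrary):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# Grid of class-1 proportions, avoiding the endpoints where log(0) is undefined
p = np.linspace(0.001, 0.999, 999)

gini = 2 * p * (1 - p)                                   # Gini index, two classes
error = 1 - np.maximum(p, 1 - p)                         # classification error
entropy = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))   # entropy (base 2)

plt.plot(p, gini, label="Gini index")
plt.plot(p, error, label="Classification error")
plt.plot(p, entropy, label="Entropy")
plt.xlabel(r"$\hat{p}_{t1}$")
plt.ylabel("Impurity measure")
plt.legend()
plt.savefig("impurity_measures.png")
```

All three curves peak at p̂_{t1} = 0.5 and vanish at 0 and 1; with base-2 logs the entropy reaches 1 there, while the Gini index and error reach 0.5.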

2. Suppose we produce 10 bootstrapped samples from a data set containing red and green classes. We then apply a classification tree to each bootstrapped sample and, for a specific value of X, produce 10 estimates of P(Class is Red | X):

0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, 0.75

There are two common ways to combine these results into a single class prediction. One is the majority vote approach; the other is to classify based on the average probability.

(a) What is the final classification under the majority vote approach?

(b) What is the final classification under the average probability approach?
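The two rules can be checked numerically with the ten estimates given above; a quick sketch:

```python
# The 10 bootstrap estimates of P(Class is Red | X) from the problem
probs = [0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, 0.75]

# (a) Majority vote: each tree predicts Red when its probability exceeds 0.5,
# and the most common prediction across the 10 trees wins.
votes_red = sum(p > 0.5 for p in probs)
majority = "Red" if votes_red > len(probs) / 2 else "Green"

# (b) Average probability: predict Red when the mean probability exceeds 0.5.
avg = sum(probs) / len(probs)
average_rule = "Red" if avg > 0.5 else "Green"

print(votes_red, majority)   # 6 Red
print(avg, average_rule)     # 0.45 Green
```

Note that the two rules disagree here: six of ten trees vote Red, but the averaged probability is 0.45.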

3. Answer the following questions using the Carseats data set.

(a) Split the data set into a training set and a test set.

(b) Fit a regression tree to the training set. Plot the tree, and interpret the results. What test error rate do you obtain?

(c) Perform tree pruning and use cross validation to determine the optimal level of tree complexity. Does pruning the tree improve the test error rate?

(d) Use the bagging approach to analyze this dataset. What test error rate do you obtain? Find which variables are most important.

(e) Use random forests to analyze this dataset. What test error rate do you obtain? Find which variables are most important. Describe the effect of m, the number of variables considered at each split, on the error rate obtained.
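Carseats is an R dataset (also distributed for Python through the ISLP package). The sketch below shows one possible scikit-learn workflow for parts (a)–(e); it uses a synthetic stand-in with the same shape as Carseats (400 rows, Sales as the response) so that it runs self-contained — swap in the real predictors and response:

```python
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for Carseats; replace X, y with the real data,
# e.g. ISLP's load_data("Carseats") with Sales as the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=400)

# (a) Train/test split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# (b) Unpruned regression tree; test MSE plays the role of the test error
tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
mse_tree = mean_squared_error(y_te, tree.predict(X_te))

# (c) Cost-complexity pruning: choose ccp_alpha by cross-validation
alphas = tree.cost_complexity_pruning_path(X_tr, y_tr).ccp_alphas
cv = GridSearchCV(DecisionTreeRegressor(random_state=0),
                  {"ccp_alpha": alphas}, cv=5)
cv.fit(X_tr, y_tr)
mse_pruned = mean_squared_error(y_te, cv.best_estimator_.predict(X_te))

# (d) Bagging = a random forest with m = p (all predictors tried at each split)
bag = RandomForestRegressor(max_features=None, n_estimators=500,
                            random_state=0).fit(X_tr, y_tr)
mse_bag = mean_squared_error(y_te, bag.predict(X_te))

# (e) Random forest with m < p; vary max_features to study the effect of m
rf = RandomForestRegressor(max_features="sqrt", n_estimators=500,
                           random_state=0).fit(X_tr, y_tr)
mse_rf = mean_squared_error(y_te, rf.predict(X_te))

print(mse_tree, mse_pruned, mse_bag, mse_rf)
print("importances:", rf.feature_importances_.round(3))
```

Variable importance comes from `feature_importances_` on the fitted ensembles; looping over several `max_features` values and recording the test MSE of each answers the question about m in part (e).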

4. The toy data set below contains n = 7 observations in p = 2 dimensions. For each observation, there is an associated class label.

(a) Sketch the observations.

(b) Sketch the optimal separating hyperplane.

(c) Describe the classification rule. It should be something like “Classify to Red if the point falls to … and classify to Blue otherwise”.

(d) On your sketch, indicate the margin for the optimal hyperplane.

(e) Indicate the support vectors.

(f) Argue that a slight movement of the seventh observation would not affect the optimal hyperplane.
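A linear SVM with a very large cost parameter approximates the hard-margin (maximal margin) classifier, so it can serve as a sanity check on the hand sketches in (b)–(e). The coordinates below are placeholders, not the assignment's table:

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder 2-D toy data (NOT the assignment's table): 4 Red, 3 Blue,
# chosen to be linearly separable.
X = np.array([[0, 3], [1, 4], [0, 5], [1, 2],
              [3, 0], [4, 1], [4, -1]], dtype=float)
y = np.array(["Red", "Red", "Red", "Red", "Blue", "Blue", "Blue"])

# A huge C leaves essentially no slack, i.e. a hard-margin fit.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("hyperplane: %.3f*x1 + %.3f*x2 + %.3f = 0" % (w[0], w[1], b))
print("margin width:", 2 / np.linalg.norm(w))
print("support vectors:", clf.support_vectors_)
```

The fitted `coef_` and `intercept_` give the separating hyperplane, `2 / ||w||` is the margin width, and `support_vectors_` lists exactly the points that pin down the hyperplane — moving any non-support point slightly (part (f)) leaves the fit unchanged.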

5. Suppose that for a particular data set, we perform agglomerative (hierarchical) clustering using single linkage and complete linkage. We obtain two dendrograms.

(a) At a certain point on the single linkage dendrogram, the clusters {1, 2, 3} and {4, 5} fuse. On the complete linkage dendrogram, the clusters {1, 2, 3} and {4, 5} also fuse at a certain point. Which fusion will occur higher on the tree, or will they fuse at the same height, or is there not enough information to tell?

(b) At a certain point on the single linkage dendrogram, the clusters {5} and {6} fuse. On the complete linkage dendrogram, the clusters {5} and {6} also fuse at a certain point. Which fusion will occur higher on the tree, or will they fuse at the same height, or is there not enough information to tell?
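The key fact behind both parts: single linkage fuses two clusters at the minimum pairwise dissimilarity between them, complete linkage at the maximum, and for a pair of singletons the two criteria coincide. A small scipy sketch that compares the fusion heights (the dissimilarity matrix is made up for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Made-up symmetric dissimilarity matrix for 6 observations (labels 1..6).
D = np.array([
    [0.0, 0.3, 0.4, 3.0, 3.2, 3.5],
    [0.3, 0.0, 0.5, 3.1, 3.3, 3.6],
    [0.4, 0.5, 0.0, 3.2, 3.4, 3.7],
    [3.0, 3.1, 3.2, 0.0, 0.2, 2.0],
    [3.2, 3.3, 3.4, 0.2, 0.0, 1.8],
    [3.5, 3.6, 3.7, 2.0, 1.8, 0.0],
])
cond = squareform(D)  # condensed form required by linkage()

single = linkage(cond, method="single")
complete = linkage(cond, method="complete")

# Column 2 of a linkage matrix holds the height of each fusion.
print("single-linkage fusion heights:  ", single[:, 2])
print("complete-linkage fusion heights:", complete[:, 2])
```

Comparing the two height columns for the same pair of clusters illustrates the answers: a complete-linkage fusion of multi-point clusters can never sit lower than the single-linkage one, while two singletons fuse at the same height under both linkages.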