
Mathematics and Statistics

EXAMINATION

End-of-year Examinations, 2020

STAT318-20S2 (C) / STAT462-20S2 (C) Data Mining

1.   (a) Suppose that we have the following 100 market basket transactions.

Transaction                          Frequency
{apple}                                      9
{apple, carrot}                             10
{apple, banana, carrot}                     27
{apple, banana, carrot, orange}             21
{banana, orange}                             3
{apple, banana, orange}                     11
{carrot, orange}                             5
{banana, carrot, orange}                    14

For example, there are 10 transactions of the form {apple, carrot}.

i.  Compute the support of {orange}, {banana, carrot}, and {banana, carrot, orange}.

ii.  Compute the confidence of the association rules {banana, carrot} → {orange} and {orange} → {banana, carrot}.

Is confidence a symmetric measure? Justify your answer.

iii.  Find the 3-itemset(s) with the largest support.

iv.  If minsup = 0.2, is {banana, carrot, orange} a maximal frequent itemset? Justify your answer.

v.  Lift is defined as

Lift(X → Y) = s(X ∪ Y) / (s(X) s(Y)),

where s(·) denotes support. What does it mean if Lift(X → Y) < 1?
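As a check, the quantities in part (a) can be computed directly from the transaction table (a Python sketch; the counts come from the question, while the helper names are our own):

```python
# Itemset counts from the transaction table in part (a); 100 transactions total.
counts = {
    frozenset({"apple"}): 9,
    frozenset({"apple", "carrot"}): 10,
    frozenset({"apple", "banana", "carrot"}): 27,
    frozenset({"apple", "banana", "carrot", "orange"}): 21,
    frozenset({"banana", "orange"}): 3,
    frozenset({"apple", "banana", "orange"}): 11,
    frozenset({"carrot", "orange"}): 5,
    frozenset({"banana", "carrot", "orange"}): 14,
}
N = sum(counts.values())  # 100

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(c for t, c in counts.items() if itemset <= t) / N

def confidence(X, Y):
    """conf(X -> Y) = s(X u Y) / s(X): not symmetric in X and Y."""
    return support(set(X) | set(Y)) / support(X)

def lift(X, Y):
    """lift(X -> Y) = s(X u Y) / (s(X) s(Y)): symmetric in X and Y."""
    return support(set(X) | set(Y)) / (support(X) * support(Y))

print(support({"orange"}))            # 0.54
print(support({"banana", "carrot"}))  # 0.62
print(confidence({"banana", "carrot"}, {"orange"}))  # 35/62 ≈ 0.565
print(confidence({"orange"}, {"banana", "carrot"}))  # 35/54 ≈ 0.648
```

The two confidence values differ, whereas lift(X → Y) = lift(Y → X) by construction.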

(b) This question examines linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA) for a 2-class classification problem.

i.  Under what conditions are the Bayes classifier and LDA equivalent classifiers?

ii.  Explain the difference between LDA and QDA.

iii.  In general, as the sample size increases, do we expect the test prediction accuracy of QDA relative to LDA to improve, decline, or be unchanged? Why?
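The LDA/QDA distinction in parts ii–iii can be made concrete with a small sketch on simulated data (not from the exam): both classifiers apply Gaussian discriminant functions, but LDA pools a single covariance matrix across classes while QDA estimates one per class, so QDA has more parameters and a quadratic boundary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-class training data (illustrative only).
X1 = rng.normal([0, 0], [1.0, 1.0], size=(200, 2))
X2 = rng.normal([2, 2], [1.0, 2.0], size=(200, 2))

def gaussian_discriminant(x, mu, Sigma, prior):
    """log prior + log N(x; mu, Sigma), dropping the shared 2*pi constant."""
    diff = x - mu
    return (np.log(prior) - 0.5 * np.log(np.linalg.det(Sigma))
            - 0.5 * diff @ np.linalg.inv(Sigma) @ diff)

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
S1 = np.cov(X1, rowvar=False)
S2 = np.cov(X2, rowvar=False)
S_pooled = 0.5 * (S1 + S2)   # equal class sizes, so a simple average

x0 = np.array([0.2, 0.2])

# LDA: both classes share the pooled covariance -> linear boundary.
lda_class = 1 if (gaussian_discriminant(x0, mu1, S_pooled, 0.5)
                  > gaussian_discriminant(x0, mu2, S_pooled, 0.5)) else 2
# QDA: each class keeps its own covariance -> quadratic boundary.
qda_class = 1 if (gaussian_discriminant(x0, mu1, S1, 0.5)
                  > gaussian_discriminant(x0, mu2, S2, 0.5)) else 2
print(lda_class, qda_class)
```

The extra per-class covariance parameters are why QDA tends to need a larger sample than LDA before its lower bias pays off.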

2.   (a)  Describe two potential advantages of regression trees over other statistical learning methods.

(b) This question uses the CART partition given below. The points in each box denote

the training data and the numbers next to them are their response values.

[Figure: CART partition of the predictor space, with X1 on the horizontal axis (roughly 2 to 10) and X2 on the vertical axis (roughly −1 to 3); the training points are plotted in the rectangular regions with response values including 1, 2, 3, 4, 10, 12, 14, and 16.]

i. Sketch the decision tree corresponding to the CART partition.

ii.  Predict the response values for the following observations using your tree, where x = (x1, x2):

a = (5, 1);   b = (3.5, 2);   and   c = (11, −2).

iii. What is the training MSE for your tree?
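A fitted CART partition reads as nested if/else statements on the split variables, with each leaf returning the mean response of its rectangle. A sketch with a hypothetical two-variable partition (the splits and leaf values below are illustrative, not those of the exam's figure):

```python
def predict(x1, x2):
    """Hypothetical regression tree; splits and leaf means are made up
    for illustration and do not come from the exam's CART partition."""
    if x1 < 5:
        if x2 < 1:
            return 3.5   # mean of the hypothetical responses in this leaf
        return 1.5
    if x2 < 0:
        return 12.0
    return 14.0

print(predict(5, 1))      # 14.0  (x1 >= 5 and x2 >= 0)
print(predict(3.5, 2))    # 1.5
print(predict(11, -2))    # 12.0
```

The training MSE of such a tree is the average squared difference between each training response and its leaf mean.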

(c) The predictive performance of a single regression tree can be substantially improved by aggregating many decision trees.

i.  Briefly explain the method of bagging regression trees.

ii.  Explain the difference between bagging and random forest.

iii.  Briefly explain two differences between boosted regression trees and random forest.
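The bagging idea in part (c) can be sketched in a few lines: fit a tree to each bootstrap resample and average the predictions. The sketch below uses one-split "stump" trees on hypothetical 1-D data (our own construction, not from the exam); a random forest would additionally restrict each split to a random subset of predictors (vacuous in 1-D), while boosting would instead fit trees sequentially to the residuals of the current ensemble.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D regression data.
x = rng.uniform(0, 10, size=200)
y = np.sin(x) + rng.normal(0, 0.3, size=200)

def fit_stump(x, y):
    """One-split regression tree: choose the split point minimising SSE."""
    best = None
    for s in np.unique(x):
        left, right = y[x < s], y[x >= s]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, s, left.mean(), right.mean())
    _, s, lmean, rmean = best
    return lambda t: np.where(t < s, lmean, rmean)

def bagged_predict(x, y, t, B=50):
    """Bagging: fit a tree to each bootstrap sample, then average them."""
    preds = []
    for _ in range(B):
        idx = rng.integers(0, len(x), size=len(x))  # bootstrap resample
        preds.append(fit_stump(x[idx], y[idx])(t))
    return np.mean(preds, axis=0)

t = np.linspace(0, 10, 50)
single = fit_stump(x, y)(t)       # one high-variance step function
bagged = bagged_predict(x, y, t)  # averaging smooths it out
```

Averaging over bootstrap trees reduces variance without changing the bias much, which is the core of the answers to i and ii.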

3.   (a)  Using one or two sentences, explain the main difference between regression and classification problems.

(b) The expected test MSE, for a given x0, can be decomposed into the sum of three fundamental quantities:

E[(y0 − f̂(x0))²] = V(f̂(x0)) + [Bias(f̂(x0))]² + V(ε).

Briefly explain each of these three quantities.
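The decomposition can be verified by simulation. In the sketch below all numerical values (true f(x0), bias, and the two standard deviations) are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

f_x0 = 2.0      # true f(x0)
bias = 0.5      # systematic error of a hypothetical estimator
sd_fhat = 0.8   # sd of fhat(x0) across training sets
sd_eps = 1.0    # sd of the irreducible noise

n = 200_000
fhat = f_x0 + bias + rng.normal(0, sd_fhat, n)  # fhat(x0) over many training sets
y0 = f_x0 + rng.normal(0, sd_eps, n)            # fresh test responses at x0

mse = np.mean((y0 - fhat) ** 2)
theory = sd_fhat**2 + bias**2 + sd_eps**2       # V(fhat) + Bias^2 + V(eps) = 1.89
print(mse, theory)
```

The simulated MSE matches the sum of the variance, squared bias, and irreducible error terms.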

(c)  Briefly explain what is meant by over-fitting and under-fitting the training data.

(d)  Provide a sketch of typical training error, testing error, and irreducible error curves, on a single plot, against the flexibility of a statistical learning method. The x-axis should represent the flexibility and the y-axis should represent the error. Make sure the plot is clearly labelled. Explain why each of the three curves has the shape displayed in your plot.

(e)  Indicate on your plot in part (d) where we would generally expect to find a regression tree with five splits, a random forest, and k-nearest neighbour regression with k = 3. Explain why you have placed these methods where you have.

(f) Would we generally expect the training MSE of an un-pruned regression tree to be better or worse than a pruned regression tree? Why?

4.   (a) Suppose that we have five points, x1, . . . , x5, with the following distance matrix:

[Table: 5 × 5 distance matrix with rows and columns x1, . . . , x5; the entries are not recoverable here.]

For example, the distance between x1  and x2  is 6 and the distance between x3  and x5  is 8.

i.  Briefly explain the single linkage and complete linkage hierarchical clustering algorithms.

ii.  Using the distance matrix above, sketch the dendrogram that results from hierarchically clustering these  points  using complete  linkage.   Clearly  label your dendrogram and include all merging distances.

iii. Suppose we want a clustering with two clusters.  Which points are in each cluster for complete linkage?

iv.  Briefly explain one strength and one weakness of hierarchical clustering.
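The linkage algorithms in part (a) can be sketched directly on a distance matrix. Since the exam's matrix entries are not reproduced above, the matrix below is hypothetical, chosen only to be consistent with the two stated distances d(x1, x2) = 6 and d(x3, x5) = 8 (0-based indices in code):

```python
import numpy as np

# Hypothetical symmetric distance matrix for x1..x5 (indices 0..4).
D = np.array([
    [0, 6, 9, 7, 9],
    [6, 0, 5, 8, 4],
    [9, 5, 0, 3, 8],
    [7, 8, 3, 0, 8],
    [9, 4, 8, 8, 0],
], dtype=float)

def agglomerate(D, linkage=max):
    """Repeatedly merge the two closest clusters, recording each merge.
    `linkage=max` gives complete linkage; `linkage=min` gives single linkage."""
    clusters = [frozenset([i]) for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] \
            + [clusters[a] | clusters[b]]
    return merges

for left, right, d in agglomerate(D):
    print(left, right, d)   # each merge and its merging distance
```

The recorded merges and merging distances are exactly what a dendrogram displays; cutting it just below the final merge gives the two-cluster solution.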

(b)  Consider applying the k-means clustering algorithm to two-dimensional points using

Euclidean distance with k = 2.

i.  Briefly explain the k-means clustering algorithm.

ii. After two passes of the main loop of the k-means algorithm, the following cluster assignment was obtained:

Cluster 1 = {(1, 2), (2, 3), (5, 2), (2, 1)}

Cluster 2 = {(6, 2), (5, 3), (7, 4)}.

Has k-means converged to a stable clustering? Justify your answer.

iii. Would we generally expect to get the same clustering if we run the k-means algorithm several times? Explain.

iv.  Briefly explain one strength and one weakness of the k-means clustering algo- rithm.
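The stability question in part (b) ii can be checked numerically from the given points: compute each cluster's centroid, then run one more assignment pass and see whether any point would switch clusters (a Python sketch; the function name is our own).

```python
import numpy as np

# Cluster assignment from part (b) ii.
c1 = np.array([(1, 2), (2, 3), (5, 2), (2, 1)], dtype=float)
c2 = np.array([(6, 2), (5, 3), (7, 4)], dtype=float)

m1, m2 = c1.mean(axis=0), c2.mean(axis=0)   # centroids: (2.5, 2) and (6, 3)

def nearest(p):
    """Index of the closer centroid under Euclidean distance."""
    return 1 if np.linalg.norm(p - m1) < np.linalg.norm(p - m2) else 2

# Points that would be reassigned on the next pass.
moved = [(float(p[0]), float(p[1])) for p in c1 if nearest(p) != 1] \
      + [(float(p[0]), float(p[1])) for p in c2 if nearest(p) != 2]
print(moved)   # [(5.0, 2.0)]
```

The point (5, 2) is closer to the second centroid than to its own, so at least one reassignment remains and the algorithm has not converged. Rerunning k-means with different random initial centroids can land in different local optima, which bears on part iii.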