
Mathematics and Statistics

EXAMINATION

End-of-year Examinations, 2019

STAT318 / STAT462 -19S2 (C) Data Mining

1.   (a) Suppose that we have the following 100 market basket transactions.

Transaction                          Frequency
{apple}                                  10
{apple, carrot}                          10
{apple, banana, carrot}                  21
{apple, banana, grape}                   27
{apple, banana, carrot, orange}          11
{banana, grape}                           3
{carrot, orange}                         11
{apple, grape, orange}                    7

For example, there are 10 transactions of the form {apple, carrot}.

i.  Compute the support of {orange}, {apple, banana}, and {apple, banana, orange}.

ii.  Compute the confidence of the association rules:

{apple, banana} → {orange};  and

{orange} → {apple, banana}.

Is confidence a symmetric measure? Justify your answer.

iii.  Find the 3-itemset(s) with the largest support.

iv.  If minsup = 0.1, is {carrot, orange} a maximal frequent itemset? Justify your answer.

v.  Lift is defined as

Lift(X → Y) = s(X ∪ Y) / (s(X) s(Y)),

where s(·) denotes support. What does it mean if Lift(X → Y) = 1?
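
Parts i, ii and v can be checked numerically. Below is a minimal Python sketch (the helper names are my own) that computes support, confidence and lift directly from the frequency table above:

```python
# Transaction frequencies from the table above (100 transactions in total).
baskets = {
    frozenset({"apple"}): 10,
    frozenset({"apple", "carrot"}): 10,
    frozenset({"apple", "banana", "carrot"}): 21,
    frozenset({"apple", "banana", "grape"}): 27,
    frozenset({"apple", "banana", "carrot", "orange"}): 11,
    frozenset({"banana", "grape"}): 3,
    frozenset({"carrot", "orange"}): 11,
    frozenset({"apple", "grape", "orange"}): 7,
}
N = sum(baskets.values())

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    items = frozenset(itemset)
    return sum(freq for t, freq in baskets.items() if items <= t) / N

def confidence(lhs, rhs):
    """Confidence of lhs -> rhs: s(lhs ∪ rhs) / s(lhs)."""
    return support(set(lhs) | set(rhs)) / support(lhs)

def lift(lhs, rhs):
    """Lift of lhs -> rhs: s(lhs ∪ rhs) / (s(lhs) s(rhs))."""
    return support(set(lhs) | set(rhs)) / (support(lhs) * support(rhs))

print(support({"orange"}))                            # 0.29
print(support({"apple", "banana"}))                   # 0.59
print(support({"apple", "banana", "orange"}))         # 0.11
print(confidence({"apple", "banana"}, {"orange"}))    # ≈ 0.186
print(confidence({"orange"}, {"apple", "banana"}))    # ≈ 0.379
print(lift({"apple", "banana"}, {"orange"}))          # ≈ 0.643
```

The two confidences differ, so confidence is not a symmetric measure; a lift below 1 means {apple, banana} and {orange} occur together less often than expected if they were independent.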

(b) This question examines linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA) for a 3-class classification problem.

i.  Explain the difference between LDA and QDA.

ii.  Briefly describe the Bayes classifier and the Bayes error rate.

iii.  Under what conditions does the testing error rate for QDA equal the Bayes error rate?

2.   (a)  Describe two potential advantages of regression trees over other statistical learning methods.

(b) When growing a regression tree using CART, two types of splits are considered. Describe these splits and provide an example for each.

(c) A regression tree has three types of nodes:  the root node,  internal nodes and terminal nodes. Describe each node and explain how predictions are made using a regression tree.

(d)  Large bushy regression trees tend to over-fit the training data. Briefly explain what is meant by over-fitting and under-fitting the training data using regression trees.

(e) The predictive performance of a single regression tree can be substantially improved by aggregating many decision trees.

i.  Briefly explain the method of bagging regression trees.

ii.  Explain the difference between bagging and random forests.

iii.  Briefly explain two differences between boosted regression trees and random forests.
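
For part i, the bootstrap-and-average idea can be sketched in a few lines. The example below is an invented toy illustration (the data, the depth-1 "stump" base learner, and all names are my own), not full CART:

```python
import random

random.seed(0)

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: choose the split point s minimising
    the residual sum of squares when each side is predicted by its mean."""
    best = None
    for s in sorted(set(xs))[1:]:                     # candidate split points
        left = [y for x, y in zip(xs, ys) if x < s]
        right = [y for x, y in zip(xs, ys) if x >= s]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        rss = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or rss < best[0]:
            best = (rss, s, ml, mr)
    _, s, ml, mr = best
    return lambda x: ml if x < s else mr

def bagged_predict(stumps, x):
    """Bagging: average the predictions of all bootstrapped base learners."""
    return sum(f(x) for f in stumps) / len(stumps)

# Toy training data: y = x plus noise (invented for this illustration).
xs = [i / 10 for i in range(50)]
ys = [x + random.gauss(0, 0.2) for x in xs]

# Fit one stump to each of 100 bootstrap samples of the training data.
stumps = []
for _ in range(100):
    idx = [random.randrange(len(xs)) for _ in range(len(xs))]
    stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

print(bagged_predict(stumps, 1.0))   # low-x prediction
print(bagged_predict(stumps, 4.0))   # high-x prediction
```

A random forest would additionally restrict each split to a random subset of the predictors (decorrelating the trees), while boosting would instead grow the trees sequentially, each one fit to the residuals of the current ensemble.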

3.   (a)  Using one or two sentences, explain the main difference between regression and classification problems.

(b) The expected test MSE, for a given x0, can be decomposed into the sum of three fundamental quantities:

E[(y0 − f̂(x0))²] = V(f̂(x0)) + [Bias(f̂(x0))]² + V(ε).

Briefly explain each of these three quantities.
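
The decomposition can be checked by simulation. The sketch below is an invented toy example: a deliberately biased constant-fit estimator of f(x0), with the Monte Carlo test MSE compared against variance + squared bias + noise variance:

```python
import random
import statistics

random.seed(1)

# Invented example: true function, fixed design points and noise level.
f = lambda x: x ** 2
xs = [i / 10 for i in range(21)]         # fixed training inputs on [0, 2]
x0, sigma = 1.0, 0.5                     # test point and noise sd
R = 20000                                # Monte Carlo repetitions

fhats, sq_errors = [], []
for _ in range(R):
    ys = [f(x) + random.gauss(0, sigma) for x in xs]
    fhat = statistics.mean(ys)           # crude constant fit: biased at x0
    y0 = f(x0) + random.gauss(0, sigma)  # fresh test response at x0
    fhats.append(fhat)
    sq_errors.append((y0 - fhat) ** 2)

mse = statistics.mean(sq_errors)                  # estimates E[(y0 - fhat(x0))^2]
var = statistics.pvariance(fhats)                 # estimates V(fhat(x0))
bias2 = (statistics.mean(fhats) - f(x0)) ** 2     # estimates [Bias(fhat(x0))]^2
print(mse, var + bias2 + sigma ** 2)              # the two sides agree closely
```

Here V(ε) = sigma² is irreducible: even a perfect estimator (zero bias, zero variance) cannot predict the fresh noise in y0.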

(c)  Provide a sketch typical of training error, testing error, and the irreducible error, on a single plot, against the flexibility of a statistical learning method. The x-axis should represent the flexibility and the y-axis should represent the error. Make sure the plot is clearly labelled.  Explain why each of the three curves has the shape displayed in your plot.

(d)  Describe two situations where we would generally expect the testing MSE of an inflexible statistical learning method to be better than a flexible method.

(e) Would we generally expect the training MSE of a flexible statistical learning method to be better or worse than an inflexible method? Why?

4.   (a)  Using one or two sentences, explain the difference between supervised learning and unsupervised learning.

(b) Suppose that we have five points, x1 , . . . , x5 , with the following dissimilarity matrix:

[The full dissimilarity matrix did not survive extraction: only the entries 0.45, 0.53, 0.56, 0 and 0.24 remain, without their row and column positions.]

For example, the dissimilarity between x1 and x2  is 0.9 and the dissimilarity between x3  and x5  is 0.15.

i.  Briefly explain the agglomerative hierarchical clustering algorithm.

ii.  Using the dissimilarity matrix above, sketch the dendrogram that results from hierarchically clustering these points using single linkage.  Clearly label your dendrogram and include all merging dissimilarities.

iii. Suppose we want a clustering with two clusters.  Which points are in each cluster for single linkage?

iv.  Repeat parts ii. and iii. using complete linkage.

v.  Describe one disadvantage of agglomerative hierarchical clustering.
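
For parts ii–iv, the merge heights can be computed mechanically. Since the matrix above did not survive intact, the sketch below invents the missing entries (keeping the two values quoted in the question, d(x1, x2) = 0.9 and d(x3, x5) = 0.15) purely to illustrate single versus complete linkage:

```python
# Invented dissimilarity matrix over points 1..5 (only d(1,2) and d(3,5)
# come from the question text; the rest are made up for this sketch).
d = {
    frozenset({1, 2}): 0.9,  frozenset({1, 3}): 0.45,
    frozenset({1, 4}): 0.53, frozenset({1, 5}): 0.56,
    frozenset({2, 3}): 0.7,  frozenset({2, 4}): 0.8,
    frozenset({2, 5}): 0.75, frozenset({3, 4}): 0.24,
    frozenset({3, 5}): 0.15, frozenset({4, 5}): 0.3,
}

def cluster(points, dist, linkage):
    """Agglomerative clustering: repeatedly merge the two closest clusters.
    `linkage` is min (single) or max (complete) over between-cluster pairs."""
    clusters = [frozenset({p}) for p in points]
    merges = []                                  # (height, merged cluster)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                h = linkage(dist[frozenset({a, b})]
                            for a in clusters[i] for b in clusters[j])
                if best is None or h < best[0]:
                    best = (h, i, j)
        h, i, j = best
        merged = clusters[i] | clusters[j]
        merges.append((h, merged))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

single = cluster([1, 2, 3, 4, 5], d, min)
complete = cluster([1, 2, 3, 4, 5], d, max)
print([h for h, _ in single])      # [0.15, 0.24, 0.45, 0.7]
print([h for h, _ in complete])    # [0.15, 0.3, 0.56, 0.9]
```

The recorded heights are exactly the merging dissimilarities to label on the dendrogram; cutting either dendrogram below its last merge gives the two-cluster solution.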

(c)  Describe one advantage and one disadvantage of the k-means clustering algorithm.
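
A minimal Lloyd's-algorithm sketch (with invented 1-D data) illustrates both sides of part (c): each iteration is simple and cheap (an advantage), but the result depends on the random initial centroids and k must be chosen in advance (disadvantages):

```python
import random

random.seed(2)

def kmeans(points, k, iters=20):
    """Lloyd's algorithm on 1-D data: alternate assignment and mean update."""
    centroids = random.sample(points, k)         # random initialisation
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                         # assignment step
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        for i, c in enumerate(clusters):         # update step
            if c:
                centroids[i] = sum(c) / len(c)
    return sorted(centroids)

# Two well-separated invented groups, around 0 and around 10.
pts = ([random.gauss(0, 0.5) for _ in range(30)]
       + [random.gauss(10, 0.5) for _ in range(30)])
print(kmeans(pts, 2))    # roughly the two group means
```

With a different seed the initial centroids change, and on less well-separated data the algorithm can converge to a different (locally optimal) clustering.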