
MA6529/19

STATISTICAL LEARNING

2019

SECTION A

These questions will each be marked out of 10. Candidates may attempt all SIX questions but are advised that they cannot obtain a total of more than FIFTY MARKS on this section.

1.    (a)  Briefly describe the purpose of Canonical Correlation Analysis.                        [ 4 marks ]

(b)  A study was conducted to understand the effect of various factors (Gender, Age, Education, Income) on depression and physical health. The variables were

◦  Depression score – a higher score represents a higher likelihood of depression.

◦  Health score – a lower score represents a better level of health.

◦  Sex – male coded as 0 and female coded as 1.

◦  Age – in years.

◦  Education – a higher level of final qualification had a higher score.

◦  Income – in thousands of dollars.

The variables were grouped as

X1 = (Depression score, Health score)

and

X2 = (Sex, Age, Education, Income).

The first canonical correlation vectors were

a1 = (-0.49, 0.982)

and

b1 = (0.025, 0.871, -0.383, 0.082).

Interpret these results.                                                                                                  [ 6 marks ]

 

2.   The correlation matrix below was calculated using the results of competitors in a decathlon event. The sample correlation matrix for: Shot Putt (X1), Javelin (X2) and Long Jump (X3)

was

Rα =

          X1      X2      X3
X1        1       0.87    0.49
X2        0.87    1       0.81
X3        0.49    0.81    1

(a)  Quoting without proof the relevant formulae, find the partial correlation coefficient between X1 and X2 given X3, and comment on the result.                                       [ 5 marks ]

(b)  The multiple correlation coefficient between X3  and the other two variables (X1, X2 ) is 0.92. Explain how this can be calculated from Rα  and comment on the result.       [ 5 marks ]
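As a rough numerical check on the quantities in this question, the sketch below evaluates the standard formulae for the first-order partial correlation and the multiple correlation in plain Python. It assumes the entries of Rα are r12 = 0.87, r13 = 0.49 and r23 = 0.81; the function and variable names are illustrative only.

```python
import math

def partial_corr(r12, r13, r23):
    """Partial correlation between X1 and X2 given X3."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))

def multiple_corr(r12, r13, r23):
    """Multiple correlation between X3 and the pair (X1, X2)."""
    r_squared = (r13**2 + r23**2 - 2 * r12 * r13 * r23) / (1 - r12**2)
    return math.sqrt(r_squared)

# Assumed reading of the correlation matrix entries.
r12, r13, r23 = 0.87, 0.49, 0.81

r12_given_3 = partial_corr(r12, r13, r23)
r3_12 = multiple_corr(r12, r13, r23)

print(f"partial correlation r12.3 = {r12_given_3:.3f}")
print(f"multiple correlation R3.12 = {r3_12:.2f}")
```

Under these assumed entries the multiple correlation agrees with the 0.92 quoted in part (b) to two decimal places.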


3.   Let X1, X2, . . . , Xn denote a random sample of size n from a p-dimensional multivariate normal distribution with expectation vector µ and covariance matrix Σ. Derive the Likelihood Ratio Test statistic for testing the null hypothesis H0 : µ = µ0 against the alternative H1 : µ ≠ µ0, assuming that Σ is known.                                                                            [ 10 marks ]

4.   Consider the graph:

 

In the above graph each vertex represents a random variable. The edges describe the dependence structure among them. Answer the following questions:

(a)  Is the graph complete? Motivate your answer.                                                    [ 2 marks ]

(b)  Provide the definition of a path in a graph. Are (X, Y, R) and (R, S, T) paths?

[ 3 marks ]

(c)   List the set of maximal cliques in the graph. State the representation of the probability distribution of a graph in terms of maximal cliques and potentials.

[ 5 marks ]

5.   Answer the following questions about mixture models:

(a)  Provide the definition of Gaussian mixture models.

[ 2 marks ]

(b)  Provide an estimate of the probability that observation i belongs to component m.

[ 2 marks ]

(c)   Explain how to perform the maximum likelihood estimation in a mixture with two components.                                                                                                                  [ 6 marks ]

6.    (a)  Compare and contrast metric and non-metric methods of multidimensional scaling.

[ 6 marks ]

(b)  Explain what the horseshoe effect is.

[ 4 marks ]


SECTION B

These questions will each be marked out of 25. Candidates may not attempt more than TWO of the THREE questions.

7.   Feedback was taken from 1428 students who were taught by a single lecturer. The University is interested in identifying patterns in the responses. The students provided a score for each statement on a scale of 1 to 5, with 1 being the lowest score and 5 the highest. The statements were:

A  “Lecturer is well prepared”

B  “Lecturer has scholarly grasp”

C  “Lecturer is confident”

D  “Lecturer focuses on examples”

E  “Lecturer uses clear examples”

F  “Lecturer is sensitive to students”

G  “Lecturer allows time for questions”

H  “Lecturer is accessible to students outside class”

I  “Lecturer is aware of students' understanding”

J  “I am satisfied with my performance”

K  “Compared to other lecturers this one is ...” (The student needed to give a score)

L  “Compared to other modules this module is ...” (The student needed to give a score)

Output associated with a principal component analysis of the data is shown in Tables 1, 2 and 3 and should be used to answer the following questions.

(a)  Why is principal component analysis appropriate for these data?                     [ 3 marks ]

(b)  What does the correlation matrix reveal about the twelve variables of interest?  Which variables appear to be highly correlated?                                                                    [ 3 marks ]

(c)   Draw a scree plot for the principal component analysis of these data. Discuss how many components you would consider retaining for your analysis.                                      [ 8 marks ]

(d)  Table 3 provides the loadings of the first two components. Interpret these results.

[ 8 marks ]

(e)   The principal component analysis was calculated using the sample covariance matrix. Why is principal component analysis sometimes performed on the sample correlation matrix? In this example, would you consider using the sample correlation matrix? Justify your answer. [ 3 marks ]

Table 1: Correlation matrix

      A       B       C       D       E       F       G       H       I       J       K       L
A     1       0.67    0.61    0.56    0.58    0.41    0.29    0.31    0.48    0.34    0.57    0.46
B     0.67    1       0.65    0.50    0.56    0.44    0.32    0.32    0.45    0.34    0.57    0.46
C     0.61    0.65    1       0.51    0.59    0.46    0.36    0.36    0.51    0.38    0.59    0.45
D     0.56    0.50    0.51    1       0.58    0.40    0.33    0.31    0.44    0.36    0.46    0.43
E     0.58    0.56    0.59    0.58    1       0.55    0.44    0.42    0.59    0.46    0.61    0.53
F     0.41    0.44    0.46    0.40    0.55    1       0.63    0.52    0.55    0.54    0.57    0.47
G     0.29    0.32    0.36    0.33    0.44    0.63    1       0.45    0.50    0.49    0.44    0.37
H     0.31    0.32    0.36    0.31    0.42    0.52    0.45    1       0.43    0.39    0.41    0.37
I     0.48    0.45    0.51    0.44    0.59    0.55    0.50    0.43    1       0.50    0.60    0.50
J     0.34    0.34    0.38    0.36    0.46    0.54    0.49    0.39    0.50    1       0.50    0.45
K     0.57    0.57    0.59    0.46    0.61    0.57    0.44    0.41    0.60    0.50    1       0.71
L     0.46    0.46    0.45    0.43    0.53    0.47    0.37    0.37    0.50    0.45    0.71    1

 

Table 2: Eigenvalues

Component

1         2         3        4         5         6         7         8

9        10       11        12

Total              % of variance Cumulative %

1.23

10.2

62.3

0.37

3.1

92.5

 

Table 3: Loadings for the first two principal components - PC1 and PC2

 

A        B

C        D        E        F

G       H        I         J

K        L

PC1 PC2

0.84

0.15

0.76

0.27

0.24 0.71

0.64

0.52

 

8.   In a study aimed at predicting success in postgraduate statistics students, two variables were considered: x1, the GRE score, a measure of quantitative ability (in hundreds), and x2, the number of hours of undergraduate statistics modules taken (in tens). Random samples of 25 successful students and 25 unsuccessful students were taken. The results were summarized as follows. The sample mean vectors for the successful group and the unsuccessful group are x̄1 = (5.49, 5.74)^T and x̄2 = (4.99, 2.88)^T, respectively. The sample covariance matrices are, respectively,

S1 =  \ ,    S2 =  \

(a)  Use the available information to provide a linear discriminant function to distinguish between these two groups.  You may assume that the two types of error are considered to be equally important.  What can you conclude about the effect of x1  and x2  on a student’s success?                                                                                                                        [ 12 marks ]

(b)  Stating any assumptions you make, estimate the probability of misclassification.

[ 5 marks ]

(c)   Two individuals, A and B, are considered, where A has x1 = 5.25 and x2 = 4.4 and B has x1  = 5.20 and x2  = 4.6.  Use your results to predict whether individual A will be successful or unsuccessful.  If you were also asked to predict the outcome for B, would you be more confident about your prediction in comparison to A, or less confident?                    [ 8 marks ]

9.    (a)  The table below contains the Euclidean distances among a group of 7 people based on their standardized Wechsler Adult Scales scores.

Subject       1         2         3        4         5         6         7

1         0.0    2.35    2.21    1.74    3.39    3.16    1.15

2                    0.0    2.61    3.50    2.01    1.93    2.84

3                              0.0    2.88    3.12    2.82    2.24

4                                         0.0    4.93    4.64    1.60

5                                                    0.0    0.55    4.00

6                                                               0.0    3.84

7                                                                         0.0

Using it as the dissimilarity matrix, demonstrate the single-link cluster analysis procedure by explaining how to obtain the matrix of dissimilarities between clusters in a solution with 5 clusters.                                                                                                                        [ 10 marks ]

[Note:  There is no need to write down the full matrix, but you should write down rows or columns of the revised matrices corresponding to the new clusters.]
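The merging steps asked for in part (a) can be traced with a short plain-Python sketch. It is only an illustration under the usual definition of single-link dissimilarity (the smallest pairwise distance between two clusters); the names `D`, `dist`, `single_link` and `agglomerate` are hypothetical helpers, not part of the question.

```python
from itertools import combinations

# Upper-triangular distances copied from the table (subjects 1..7).
D = {
    (1, 2): 2.35, (1, 3): 2.21, (1, 4): 1.74, (1, 5): 3.39, (1, 6): 3.16, (1, 7): 1.15,
    (2, 3): 2.61, (2, 4): 3.50, (2, 5): 2.01, (2, 6): 1.93, (2, 7): 2.84,
    (3, 4): 2.88, (3, 5): 3.12, (3, 6): 2.82, (3, 7): 2.24,
    (4, 5): 4.93, (4, 6): 4.64, (4, 7): 1.60,
    (5, 6): 0.55, (5, 7): 4.00,
    (6, 7): 3.84,
}

def dist(a, b):
    """Distance between two subjects, given in either order."""
    return D[(a, b)] if a < b else D[(b, a)]

def single_link(c1, c2):
    """Single-link dissimilarity: smallest distance between members."""
    return min(dist(a, b) for a in c1 for b in c2)

def agglomerate(n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [frozenset([i]) for i in range(1, 8)]
    merges = []
    while len(clusters) > n_clusters:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: single_link(*p))
        merges.append((sorted(c1), sorted(c2), single_link(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters, merges

clusters, merges = agglomerate(5)
for left, right, height in merges:
    print("merge", left, right, "at", height)
for c1, c2 in combinations(clusters, 2):
    print(sorted(c1), sorted(c2), single_link(c1, c2))
```

The final loop prints the dissimilarities between the five remaining clusters, which correspond to the revised rows and columns the note asks for.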

(b)  Five subjects were each given three psychological tests.  The scores for each subject on each test were recorded and the Euclidean distances between each pair of the subjects were calculated as follows:

Subject

A     B      C       D       E

A    0    4.2    5.9     1.2     6.1

B            0     7.6     7.0     2.6

C                     0     10.3    5.4

D                               0      7.8

E                                         0

(i)  Cluster the five subjects, using complete-link clustering. Sketch the dendrogram and discuss the number of clusters in the data.                                                  [ 10 marks ]

(ii)  Briefly describe a model-based clustering approach to clustering A, B, C, D, and E. [ 5 marks ]
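For part (b)(i), the sequence of complete-link merges and their heights can be checked with the sketch below. It assumes the usual complete-link rule (the dissimilarity between two clusters is the largest pairwise distance between their members); the helper names are hypothetical, and the dendrogram itself still has to be sketched by hand from the merge heights.

```python
from itertools import combinations

# Pairwise Euclidean distances copied from the table for subjects A..E.
D = {
    ("A", "B"): 4.2, ("A", "C"): 5.9, ("A", "D"): 1.2, ("A", "E"): 6.1,
    ("B", "C"): 7.6, ("B", "D"): 7.0, ("B", "E"): 2.6,
    ("C", "D"): 10.3, ("C", "E"): 5.4,
    ("D", "E"): 7.8,
}

def dist(a, b):
    """Distance between two subjects, given in either order."""
    return D[(a, b)] if (a, b) in D else D[(b, a)]

def complete_link(c1, c2):
    """Complete-link dissimilarity: largest distance between members."""
    return max(dist(a, b) for a in c1 for b in c2)

clusters = [frozenset(s) for s in "ABCDE"]
history = []  # (label of merged cluster, merge height)
while len(clusters) > 1:
    c1, c2 = min(combinations(clusters, 2), key=lambda p: complete_link(*p))
    history.append(("".join(sorted(c1 | c2)), complete_link(c1, c2)))
    clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]

for label, height in history:
    print(f"form {label} at height {height}")
```

One informal guide to the number of clusters is the size of the gaps between successive merge heights; this is a heuristic rather than a formal rule.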