
MATH3772/5772 Multivariate Analysis Practical 2021/22

Published: 2021-12-01








The data file athlrecs.txt, available on the module webpage, contains national record times for men's track events for 55 countries immediately prior to the 1984 Olympic Games. It contains the following variables:


Country:  Name of country
m100:     Record time for 100m race in seconds
m200:     Record time for 200m race in seconds
m400:     Record time for 400m race in seconds
m800:     Record time for 800m race in minutes
m1500:    Record time for 1500m race in minutes
km5:      Record time for 5000m race in minutes
km10:     Record time for 10000m race in minutes
mara:     Record time for the Marathon (approx. 26 miles) in minutes
status:   1 for developed countries; 3 for third world countries

For the purposes of this practical, just concentrate on the 4 races m100, m200, m400, m800. Simultaneous confidence intervals may be helpful for parts 2 and 3.
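As one way of making the hint about simultaneous confidence intervals concrete, the sketch below computes T^2-based simultaneous intervals for the component means of a data matrix. The function name simultaneous_ci and its arguments are hypothetical (they are not taken from the module notes). The formula assumes multivariate normal data and uses R's cov(), which divides by n - 1; if the module notes define S with divisor n, the constant changes accordingly.

```r
# Hypothetical helper (not from the module notes): T^2-based simultaneous
# confidence intervals for the component means of a data matrix.
simultaneous_ci <- function(dat, alpha = 0.05) {
  dat <- as.matrix(dat)
  n <- nrow(dat)                      # number of observations
  p <- ncol(dat)                      # number of variables
  xbar <- colMeans(dat)               # sample mean vector
  S <- cov(dat)                       # sample covariance (divisor n - 1)
  # Critical constant based on the F distribution with (p, n - p) d.f.
  crit <- sqrt(p * (n - 1) / (n - p) * qf(1 - alpha, p, n - p))
  half <- crit * sqrt(diag(S) / n)    # half-width for each variable
  cbind(lower = xbar - half, estimate = xbar, upper = xbar + half)
}
```

For example, simultaneous_ci(x) would give intervals for the four race means once the data matrix x has been created with the R commands at the end of this sheet. The same function could be applied to a linearly transformed data matrix.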

1.  Examine whether it is reasonable to assume that the data can be described as multivariate normal.

2.  For the whole set of 55 countries, investigate the hypothesis $\mu_{800} = 2\mu_{400} = 4\mu_{200} = 8\mu_{100}$. This hypothesis says that the speed of the record runs over that range of distances is constant (after first ensuring the units of time are the same for all races). To carry out this test you may find it convenient to make a linear transformation of the data. Let $X$ denote a $55 \times 4$ data matrix for the races of interest. Find a $3 \times 4$ matrix $A$ such that, if the above hypothesis holds, the mean of the data matrix $Y = XA^T$ is 0.

3.  The countries have been split (somewhat arbitrarily) into developed countries (status = 1) and third world countries (status = 3). Next investigate the hypothesis that the 4-dimensional mean vector for race times is the same for the two groups of countries.

4.  [Level 5 only.]  Carry out a k-means clustering of the data into k = 2 clusters.  Compare the resulting clusters to the partitioning of the data by the status variable.
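For part 2, the following is a minimal sketch of one possible transformation matrix together with a one-sample Hotelling T^2 test. The matrix A shown here is only one of many valid choices. It assumes the columns of the data matrix are ordered m100, m200, m400, m800 and that m800 is recorded in minutes, hence the factor 60. The helper name hotelling_one_sample is hypothetical, and the F approximation assumes multivariate normality and the covariance estimate with divisor n - 1.

```r
# One possible contrast matrix (an assumption, not the only valid choice).
# Columns are assumed to be ordered m100, m200, m400, m800, with m800 in
# minutes, so the factor 60 converts it to seconds.
A <- rbind(c(2, -1,  0,   0),   # 2*mu100 - mu200      = 0
           c(0,  2, -1,   0),   # 2*mu200 - mu400      = 0
           c(0,  0,  2, -60))   # 2*mu400 - 60*mu800   = 0

# Hypothetical helper: one-sample Hotelling T^2 test of H0: E(y) = 0.
hotelling_one_sample <- function(y) {
  y <- as.matrix(y)
  n <- nrow(y)                                  # number of observations
  q <- ncol(y)                                  # dimension of y
  ybar <- colMeans(y)                           # sample mean vector
  S <- cov(y)                                   # covariance, divisor n - 1
  T2 <- n * drop(t(ybar) %*% solve(S, ybar))    # Hotelling's T^2
  Fstat <- (n - q) / (q * (n - 1)) * T2         # F(q, n - q) under H0
  c(T2 = T2, F = Fstat,
    p.value = pf(Fstat, q, n - q, lower.tail = FALSE))
}
```

With the data matrix x from the R commands at the end of this sheet, the test could then be run as hotelling_one_sample(x %*% t(A)).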


Some useful commands in R


ath=read.table("http://www1.maths.leeds.ac.uk/~john/3772/athlrecs.txt",header=TRUE)
attach(ath)

x=cbind(m100, m200, m400, m800)  # create a data matrix for the 4 races
                                 # using all 55 countries
x1=x[status==1,]  # define a 23 x 4 submatrix of developed countries
x2=x[status==3,]  # define a 32 x 4 submatrix of third world countries
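For part 3, one standard approach is a two-sample Hotelling T^2 test applied to the two submatrices. The sketch below is illustrative only: the helper name hotelling_two_sample is hypothetical, and the test assumes multivariate normality and a common covariance matrix in the two groups (estimated by the pooled covariance with divisor n1 + n2 - 2).

```r
# Hypothetical helper: two-sample Hotelling T^2 test of equal mean vectors,
# assuming both groups share a common covariance matrix.
hotelling_two_sample <- function(a, b) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  n1 <- nrow(a)                        # size of first group
  n2 <- nrow(b)                        # size of second group
  p <- ncol(a)                         # number of variables
  d <- colMeans(a) - colMeans(b)       # difference of sample mean vectors
  # Pooled covariance matrix (divisor n1 + n2 - 2)
  Sp <- ((n1 - 1) * cov(a) + (n2 - 1) * cov(b)) / (n1 + n2 - 2)
  T2 <- n1 * n2 / (n1 + n2) * drop(t(d) %*% solve(Sp, d))
  # Under H0 this statistic follows F(p, n1 + n2 - p - 1)
  Fstat <- (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * T2
  c(T2 = T2, F = Fstat,
    p.value = pf(Fstat, p, n1 + n2 - p - 1, lower.tail = FALSE))
}
```

Given x1 and x2 as defined above, the call would be hotelling_two_sample(x1, x2).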