
MATH3772/5772 Multivariate Analysis Practical 2021/22

Published: 2021-12-01


A data file athlrecs.txt containing country record times for men's track events for 55 countries immediately prior to the 1984 Olympic Games can be found on the module webpage. It contains the following variables:

Country:	Name of country
m100:	Record time for 100m race in seconds
m200:	Record time for 200m race in seconds
m400:	Record time for 400m race in seconds
m800:	Record time for 800m race in minutes
m1500:	Record time for 1500m race in minutes
km5:	Record time for 5000m race in minutes
km10:	Record time for 10000m race in minutes
mara:	Record time for the Marathon (approx. 26 miles) in minutes
status:	1 for developed countries; 3 for third world countries

For the purposes of this practical, concentrate on just the four races m100, m200, m400 and m800. Simultaneous confidence intervals may be helpful for parts 2 and 3.

1. Examine whether it is reasonable to assume that the data can be described as multivariate normal.

2. For the whole set of 55 countries, investigate the hypothesis µ_800 = 2µ_400 = 4µ_200 = 8µ_100. This hypothesis says that the speed of the record runs over that range of distances is constant (after first ensuring the units of time are the same for all races). To carry out this test you may find it convenient to make a linear transformation of the data. Let X denote the 55 × 4 data matrix for the races of interest. Find a 3 × 4 matrix A such that, if the above hypothesis holds, the mean of the transformed data matrix Y = XA^T is 0.

3. The countries have been split (somewhat arbitrarily) into developed countries (status = 1) and third world countries (status = 3). Next investigate the hypothesis that the 4-dimensional mean vector for race times is the same for the two groups of countries.

4. [Level 5 only.] Carry out a k-means clustering of the data into k = 2 clusters. Compare the resulting clusters to the partitioning of the data by the status variable.
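
As an illustration of the linear transformation described in part 2, here is a minimal R sketch. It runs on simulated data rather than athlrecs.txt, so the names x_sim and x_sec and all the simulated numbers are assumptions made purely for illustration. The matrix A shown is one candidate among many, and the test statistic uses the sample covariance with divisor n - 1; if the module's notes define S with divisor n, the scaling constant differs accordingly.

```r
# Simulated stand-in for the data matrix x (assumption, not the real records):
# m100, m200, m400 in seconds and m800 in minutes, as in the variable list.
set.seed(2)
n <- 55
x_sim <- cbind(m100 = rnorm(n, 10.5, 0.3),
               m200 = rnorm(n, 21.0, 0.6),
               m400 = rnorm(n, 47.0, 1.5),
               m800 = rnorm(n, 1.80, 0.06))

# Put every race in the same time unit (seconds) before comparing speeds.
x_sec <- x_sim
x_sec[, "m800"] <- 60 * x_sec[, "m800"]

# One candidate 3 x 4 matrix A: each row is a contrast whose expectation
# is zero when mu_800 = 2 mu_400 = 4 mu_200 = 8 mu_100.
A <- rbind(c(2, -1,  0,  0),
           c(0,  2, -1,  0),
           c(0,  0,  2, -1))

y <- x_sec %*% t(A)        # transformed data, Y = X A^T (55 x 3)
p <- ncol(y)
ybar <- colMeans(y)
S <- cov(y)                # sample covariance, divisor n - 1

# One-sample Hotelling T^2 statistic for H0: E(Y) = 0, and its F transform.
T2 <- n * drop(t(ybar) %*% solve(S, ybar))
Fstat <- (n - p) / (p * (n - 1)) * T2
pval <- pf(Fstat, p, n - p, lower.tail = FALSE)
print(c(T2 = T2, F = Fstat, p.value = pval))
```

Replacing x_sim with the real matrix x would give the corresponding statistic for the record times. The reference distribution relies on the multivariate normality examined in part 1.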

Some useful commands in R

ath=read.table("http://www1.maths.leeds.ac.uk/~john/3772/athlrecs.txt",header=TRUE)
attach(ath) # make the columns (m100, ..., status) available by name

# create a data matrix for the 4 races using all 55 countries
x=cbind(m100, m200, m400, m800)

x1=x[status==1,] # define a 23 x 4 submatrix of developed countries
x2=x[status==3,] # define a 32 x 4 submatrix of third world countries
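
For the comparison of the two groups in part 3, a two-sample Hotelling T^2 test is one possible approach. The sketch below is not necessarily the method intended by the module: it assumes both groups share a common covariance matrix, it uses simulated matrices x1_sim and x2_sim in place of x1 and x2, and the helper name hotelling_two_sample is hypothetical.

```r
# Two-sample Hotelling T^2 test with a pooled covariance matrix.
# Assumes equal covariance matrices in the two groups.
hotelling_two_sample <- function(a, b) {
  n1 <- nrow(a); n2 <- nrow(b); p <- ncol(a)
  d <- colMeans(a) - colMeans(b)                  # difference of mean vectors
  Sp <- ((n1 - 1) * cov(a) + (n2 - 1) * cov(b)) / (n1 + n2 - 2)
  T2 <- (n1 * n2 / (n1 + n2)) * drop(t(d) %*% solve(Sp, d))
  Fstat <- (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * T2
  pval <- pf(Fstat, p, n1 + n2 - p - 1, lower.tail = FALSE)
  list(T2 = T2, F = Fstat, df = c(p, n1 + n2 - p - 1), p.value = pval)
}

# Simulated stand-ins for x1 (23 x 4) and x2 (32 x 4); not the real data.
set.seed(3)
x1_sim <- matrix(rnorm(23 * 4), 23, 4)
x2_sim <- matrix(rnorm(32 * 4), 32, 4)

res <- hotelling_two_sample(x1_sim, x2_sim)
print(res)
```

With the real data loaded, the call would be hotelling_two_sample(x1, x2).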