STAT6108 Analysis of Hierarchical Data

Assignment, 2022–23

● This assignment is worth 100% of the overall mark for STAT6108.

● The deadline for submission is 16.00 on Thursday 11 May 2023.

● Standard University policies and procedures will be followed for late submission, extensions and academic integrity (see the Module Outline for details).

● Please submit your answers to the three tasks below using the Turnitin link on Blackboard (see Module Outline for details) in a single file called report-ID.pdf, where ID is your student ID number, for example report-12345678.pdf. In the Assignments folder, click on Assignment submission to submit your report. Please enter this file name as the Submission Title.

● Remember that the University places the highest importance on maintaining academic integrity and expects all students to do the same. Please make sure you are familiar with the Regulations Governing Academic Integrity, which are available at http://www.calendar.soton.ac.uk/sectionIV/academic-integrity-regs.html.

● The page limits for each task are strict and are easily sufficient to receive full credit. Any pages beyond the limits will not be marked.

Task 1 [60 marks]

Maximum 9 pages plus short appendices

For this task, you can use MLwiN, R or STATA or any combination of the three to perform your analyses.

The dataset for this task contains the points score on A-level Chemistry, a qualification usually taken at age 18, of 31 022 students from 2410 schools. Each school is contained in one of 131 Local Education Authorities (LEAs). The dataset also contains five other variables which may explain differences in the students’ A-level Chemistry scores, including the average GCSE score of the student, which can be considered a summary of their academic ability prior to studying for the A-level.

The dataset is contained in the file chemistry.csv (available on Blackboard) and contains the following variables:

lea	Identifier for the Local Education Authority (LEA) a school belongs to
school	Identifier for the school a student attends
student	Identifier for the student
score	Point score of the student on A-level Chemistry
gender	Gender of the student: 0 = female, 1 = male
age	Centred age of the student in months
gcsescore	Average GCSE score of the student
gcsecent	A centred version of the variable gcsescore

Use exploratory data analysis and multilevel modelling to investigate the variability in A-level Chemistry scores across students, schools and LEAs, and how this varies by gender, age and the GCSE score of the students. Things to consider in your analysis include:

● Which of the potential covariates are required as fixed effects?

● How many levels are required?

● Which, if any, of the potential covariates require a random slope or coefficient?

● Whether school mean GCSE score should be included in the model as a contextual variable, and if so, whether there should also be a cross-level interaction between it and student GCSE score.

● Whether, and if so how, gender might be included as a contextual variable.

● To what extent the assumptions for the selected model hold.
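The contextual-variable bullets above rest on aggregating a student-level covariate to the school level. As a minimal sketch of that step only, the Python below computes a school mean and the matching within-school deviation. The rows are hypothetical values invented for illustration (they are not taken from chemistry.csv), and the names `rows`, `school_mean` and `augmented` are assumptions of this sketch, not part of the assignment.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (school, gcsecent) pairs, invented for illustration only.
rows = [("s1", -0.5), ("s1", 0.3), ("s2", 1.0), ("s2", 0.4), ("s2", -0.2)]

# Collect the student-level values within each school.
by_school = defaultdict(list)
for school, gcse in rows:
    by_school[school].append(gcse)

# School mean: the candidate contextual (school-level) variable.
school_mean = {school: mean(values) for school, values in by_school.items()}

# Attach the school mean and the within-school deviation to every student row.
augmented = [
    (school, gcse, school_mean[school], gcse - school_mean[school])
    for school, gcse in rows
]

for record in augmented:
    print(record)
```

The same aggregation applied to the 0/1 gender indicator would give the proportion of male students in each school, which is one possible way to treat gender as a contextual variable.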

The results of your work should be presented in a report of at most 9 pages. The report should contain a few important tables and figures that are discussed in the text. Short appendices containing additional figures and tables, but few words, may be included provided the need for each appendix is justified in the text.

Below is an outline of the marking scheme so that you can assess the important elements for your report:

● Introduction — 5 marks

● Outline of the methods used — 5 marks

● Data description and exploratory data analysis — 10 marks

● Model selection and assessment — 20 marks

● Presentation and interpretation of results — 10 marks

● Conclusions — 5 marks

● General presentation of the report — 5 marks

Task 2 [25 marks]

Maximum 4 pages plus short appendices

For this task, you are not required to carry out a data analysis using statistical software. You are free to use whatever tools you like for any calculations that are needed.

Brief information about the dataset for this task is given below.

Discovery Day is a day set aside by the United States Naval Postgraduate School in Monterey, California, to invite the general public into its laboratories. On Discovery Day, 21 October 1995, data on reaction time and hand-eye coordination were collected on 108 members of the public who visited the Human Systems Integration Laboratory. The age and sex of each subject were also recorded. One experiment that demonstrates motor learning and hand-eye coordination is rotary pursuit tracking. The equipment used has a rotating disk with a 3/4” target spot. The subject’s task is to maintain contact with the target spot with a metal wand. Trials were conducted for 15 seconds at a time, and the total contact time during the 15 seconds was recorded. Four trials were recorded for each of the 108 subjects. The target spot on the Circle tracker keeps a constant speed along a circular path. The target spot on the Box tracker has varying speeds as it traverses the box, making the task potentially more difficult.

The variables in this dataset that are relevant for this task are listed below:

time: Measurement occasion taking values in {0, 1, 2, 3}

gender: 0 if Male, and 1 if Female

cage: Age of subject centred on the overall average age

shape: 0 if Circle, and 1 if Box

score_sqrt: Square root of score (outcome variable)

time_sq: Square of time

cage_sq: Square of cage

Some outputs from an exploratory data analysis (EDA) of this dataset are provided below.

Table 1: Sample means and standard deviations (within parentheses) of the square root of score

	time=0	time=1	time=2	time=3
Overall	1.61 (0.67)	1.77 (0.68)	1.84 (0.70)	1.90 (0.73)
Male	1.68 (0.69)	1.88 (0.67)	1.97 (0.71)	2.03 (0.75)
Female	1.48 (0.63)	1.58 (0.66)	1.62 (0.65)	1.68 (0.63)
Circle	1.78 (0.81)	1.81 (0.80)	1.90 (0.85)	1.96 (0.85)
Box	1.51 (0.56)	1.74 (0.61)	1.81 (0.61)	1.86 (0.66)

Figure 1: Score sqrt versus centred age (on the left); individual profiles (on the right)

Answer the following questions.

Q.1. Using the information about the dataset as well as the EDA results provided above (Table 1 and Figure 1), explain what kind of data you have. Specify the hierarchical levels, comment on the potential effects of the covariates on the outcome variable, and justify the method of analysis that you think may be suitable for this data. [4 marks]

Q.2. The outputs from two empty models are presented in Table 2. Specify the covariance structures assumed under these models. Comment on which of the two models might be more reasonable for this particular data, and explain why by discussing the merits of the model you have chosen over the other one. [4 marks]

Table 2: Outputs from two empty models. Standard errors are given in parentheses

Model	Parameter	Estimate
Linear regression model	intercept	1.777 (0.034)
	residual variance: $\sigma_e^2$	0.492
Marginal model with exchangeable structure	intercept	1.777 (0.064)
	variance parameter: $\sigma_e^2$	0.490
	variance parameter: $\rho$	0.882

N.B. The variance function for the marginal model is re-parametrised as $(\sigma_e^2, \rho)$, with $\sigma_e^2 \equiv \sigma^2 + \sigma_1^2$, where $\sigma^2$ and $\sigma_1^2$ are the notations used on slide 24 of the Section 6 lecture.
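As an illustration of the re-parametrisation in the note above, the sketch below builds the 4 × 4 covariance matrix implied by an exchangeable structure from the two Table 2 estimates. It assumes that ρ denotes the common within-subject correlation, as is usual for an exchangeable structure, so that every off-diagonal entry equals ρ times the total variance. The function name `exchangeable_cov` is hypothetical and is not part of any course software.

```python
# Estimates reported in Table 2 for the marginal model.
sigma_e2 = 0.490   # total variance
rho = 0.882        # assumed here to be the common within-subject correlation
n_occasions = 4    # time takes the values 0, 1, 2, 3


def exchangeable_cov(total_var, corr, size):
    """Return a size x size matrix with total_var on the diagonal
    and corr * total_var in every off-diagonal position."""
    return [
        [total_var if i == j else corr * total_var for j in range(size)]
        for i in range(size)
    ]


cov = exchangeable_cov(sigma_e2, rho, n_occasions)
for row in cov:
    print("  ".join(f"{value:.3f}" for value in row))
```

Under the same assumption, the linear regression model in Table 2 corresponds to setting the off-diagonal entries to zero, i.e. calling `exchangeable_cov(0.492, 0.0, 4)`.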

Q.3. Two multilevel models and marginal models with several correlation structures are fitted to the data. All the outputs from model fitting can be found in Tables 3–5. Choose a model among them, and justify your choice. [5 marks]

Q.4. Write the regression equation and the model assumptions for the model you have chosen in Question 3 (Q.3.). [3 marks]

Q.5. Calculate the variance-covariance and correlation matrices under the model you have chosen in Q.3. Comment on the covariance structure. [4 marks]

Q.6. Using the results in Table 4, interpret all the fitted regression coefficients of the model you have chosen in Q.3. [5 marks]

Table 3: Goodness-of-fit statistics and the number of parameters for multilevel models and marginal models with different covariance structures

Model/Structure	-2*LogLik	AIC	BIC	Nb. of parameters

Random intercept (RI)	191.93	215.93	264.47	12
Random slope (RS)	188.18	216.18	272.81	14
Compound symmetry (CS)	191.93	215.93	264.47	12
Heterogeneous CS	190.61	220.61	281.28	15
First-order autoregressive (AR1)	213.70	237.70	286.24	12
Heterogeneous AR1	212.87	242.87	303.54	15
Unstructured (Unstr.)	182.90	222.90	303.80	20
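The columns of Table 3 are linked by simple arithmetic, which the sketch below checks. It assumes the usual definitions AIC = -2*LogLik + 2k and BIC = -2*LogLik + k ln(n), where k is the number of parameters and n is the sample size used by the software; the second definition in particular is an assumption, since the brief does not state how BIC was computed. The variable names are hypothetical.

```python
import math

# Rows copied from Table 3: (-2*LogLik, parameters, reported AIC, reported BIC)
table3 = {
    "Random intercept (RI)": (191.93, 12, 215.93, 264.47),
    "Random slope (RS)": (188.18, 14, 216.18, 272.81),
    "Compound symmetry (CS)": (191.93, 12, 215.93, 264.47),
    "Heterogeneous CS": (190.61, 15, 220.61, 281.28),
    "First-order autoregressive (AR1)": (213.70, 12, 237.70, 286.24),
    "Heterogeneous AR1": (212.87, 15, 242.87, 303.54),
    "Unstructured (Unstr.)": (182.90, 20, 222.90, 303.80),
}

checks = {}
for name, (deviance, k, aic, bic) in table3.items():
    # AIC recomputed from the deviance and the parameter count.
    aic_recomputed = deviance + 2 * k
    # Sample size implied by the BIC penalty, assuming BIC = deviance + k * ln(n).
    implied_n = math.exp((bic - deviance) / k)
    checks[name] = (aic_recomputed, implied_n)
    print(f"{name:34s} AIC={aic_recomputed:.2f} (reported {aic:.2f})  "
          f"implied n={implied_n:.0f}")
```

The implied n can be compared with the 4 × 108 = 432 trial records suggested by the data description; a smaller value would be consistent with some missing observations, although the brief does not say so.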