STAT0021: Assessment 4 Instructions

Term 1, 2023-24

1    Introduction

Please read and understand these instructions before you begin the assessment.

Assessment 4 will begin with the release of these instructions on the STAT0021 course Moodle page within the “Assessment 4 – Individual Coursework – Term 1” section at 1pm on Wednesday 13th December 2023.

The intention of the assessment is for you to apply the techniques you have learned during the course to a real-world dataset made up of a number of variables (percentage of the population double vaccinated for COVID-19, median household income, median house price, etc.) measured for subregions of London.

A copy of the data to be analysed is available as an Excel spreadsheet on the course Moodle page within the “Assessment 4 – Individual Coursework – Term 1” section.

Assessment 4 makes up 50% of your module mark for STAT0021.

2    Data

The data are real measurements for subregions (Middle Layer Super Output Areas, or MSOAs) of London. In total, there are 11 variables recorded for 982 observations. Vaccination data was recorded in December 2021. Demographic data is accurate as of the 2011 census, but can be treated as being contemporaneous with the vaccination data.

Variable name    Description

ID               A unique identifying number assigned to each observation.
VaxPercent       The percentage of the population who have received at least two COVID-19 vaccination doses.
Political        An indicator of the political group which controls the borough in which the subregion is located.
                 0: Conservative
                 1: Labour
                 2: Other (Liberal Democrat or no majority party)
PopDensity       The population density (people/km²)
Over65           The percentage of the population who are aged 65 or over
Obesity          The percentage of the population who are classified as obese (BMI ≥ 30)
PostALevel       The percentage of the population who have a qualification above A-Level (e.g. a university degree or similar vocational qualification)
Unemployment     The percentage of the population who are unemployed
HHBenefit        The percentage of the population living in households reliant upon means-tested benefits
MedHHInc         The median household income
MedianHP         The median house price


3    Submission structure

You should structure your analysis and subsequent write-up according to the headings below.

3.1    Exploratory data analysis

The first step in any data analysis is to explore the data to get a sense of what the variables represent and the potential for relationships between them.

Your submission should include three separate, distinct exploratory analyses, each of which contains all of:

A.   The results of a single numerical calculation (e.g. a summary statistic or the results of a hypothesis test).

B.   A single figure (generally containing a single plot, but potentially containing up to three related plots).

C.   A discussion of what your numerical result and figure tell us about London, and a justification of why this information is interesting.

Note that:

1.    Each of your exploratory analyses will be marked out of 6 marks (for a total of 3x6=18 marks overall for this section of the assessment).

2.    Marks will be awarded for the degree of insight shown in each part of the analysis. A numerical result and/or plot which is not discussed will receive a poor mark: a large proportion of the marks will be awarded based upon the degree to which your discussion correctly interprets your results and justifies why they are insightful.

3.   Variety across the three analyses will be rewarded. For example, submissions which repeat the same analysis and discussion for three sets of variables fail to show a breadth of understanding and will receive a poor mark.

4.    None of your exploratory analyses should include VaxPercent, as this is the focus of the later parts of the assessment.

5.   You are free to transform and/or combine variables, and to identify and potentially remove any outliers from the data. Any such decisions should be justified in your discussion.

3.2    Simple linear regression

VaxPercent is the focus of this part of the task. How can the other variables be used to explain the variability in VaxPercent via simple linear regression?

Your submission should include:

A.   Justification of which variable can be used as a covariate to produce the best simple linear regression model for the outcome VaxPercent.

B.   An interpretation of the estimated model coefficients for your best simple linear regression model.

C.    Comments on the fit of your best simple linear regression model.

D.   A plot of VaxPercent against the covariate in your best simple linear regression model with the accompanying regression line.

Note that:

1.   This component of your submission will be marked out of 9 marks.

2.    There is not a specific definition of the “best” model. It is likely to be based both upon how well the model fits the data and how well the assumptions underlying simple linear regression are satisfied (quantitative and qualitative evidence). Include in your justification why you would categorise your model as being the best and the steps you took to arrive at this best model.

3.    Your model can include a variable which is not present in the original dataset, but which has been obtained via a transformation or combination of variables in the original dataset. You should not bring in external data. You should provide a justification of why any new variable is useful/interesting if you haven’t already given an explanation earlier in your submission.

4.   You should support your justification, interpretation and comments with suitable Stata output.

5.    If there are any particularly unusual observations identifiable as a result of your analysis, you should mention them using their ID and justify why you do or do not believe them to be outliers. If you believe them to be outliers, then you can exclude them when fitting your model.

Your submission should also include:

E.    The lower quartile, median, and upper quartile value for the covariate in your best simple linear regression model.

F.    A mathematical equation to indicate how your best simple linear regression model can be used to make predictions of VaxPercent.

G.   Predictions of the value of VaxPercent when the covariate in your best simple linear regression model takes its lower quartile, median, and upper quartile values.

Note that:

6.   This component of your submission will be marked out of 3 marks.

7.    If your best model includes variable x as the covariate, you should use Stata to calculate the lower quartile, median, and upper quartile values of x. Then, calculate the corresponding predicted values of VaxPercent according to your best model.

3.3    Multiple linear regression

VaxPercent is again the focus of this part of the task. How can the other variables be used to explain the variability in VaxPercent via multiple linear regression?

Your submission should include:

A.   Justification of which variables can be used as covariates to produce the best multiple linear regression model for the outcome VaxPercent.

B.   An interpretation of the estimated model coefficients for your best multiple linear regression model.

C.    Comments on the fit of your best multiple linear regression model.

Note that:

1.   This component of the assessment will be marked out of 9 marks.

2.    There is not a specific definition of the “best” model. It is likely to be based both upon how well the model fits the data and how well the assumptions underlying multiple linear regression are satisfied (quantitative and qualitative evidence). Include in your justification why you would categorise your model as being the best and the steps you took to arrive at this best model.

3.    Your model can include variables which are not present in the original dataset, but which are obtained via a transformation or combination of variables in the original dataset. You should not bring in external data. You should provide a justification of why your new variables are useful/interesting if you haven’t already given an explanation earlier in your submission.

4.   You should support your justification, interpretation and comments with suitable Stata output.

5.    If there are any particularly unusual observations identifiable as a result of your analysis, you should mention them using their ID and justify why you do or do not believe them to be outliers. If you believe them to be outliers, then you can exclude them when fitting your model.

Your submission should also include:

D.   The lower quartile, median, and upper quartile values for each covariate in your best multiple linear regression model.

E.    A mathematical equation to indicate how your best multiple linear regression model can be used to make predictions of VaxPercent.

F.    Predictions of the value of VaxPercent when the covariates in your best multiple linear regression model jointly take their lower quartile, median, and upper quartile values.

Note that:

6.   This component of the assessment will be marked out of 3 marks.

7.    If your best model includes variables x1, x2, … as the covariates, you should use Stata to calculate the lower quartile, median, and upper quartile values of x1, x2, …. Then, calculate the corresponding predicted values of VaxPercent according to your best model. That is, you should submit three predicted values: one for when all of your covariates take their lower quartile values, one for when they all take their median values, and one for when they all take their upper quartile values.
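To make the word "jointly" in point 7 concrete, the generic form of the three predictions can be written as follows. The notation is an assumption introduced here for illustration and does not appear elsewhere in these instructions: p is the number of covariates, the hatted betas are the estimated coefficients, and Q1(x_j), Q2(x_j), Q3(x_j) denote the lower quartile, median, and upper quartile of covariate x_j.

```latex
% General prediction equation for a fitted multiple linear regression
\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \dots + \hat{\beta}_p x_p

% The three joint predictions, for k = 1, 2, 3
% (every covariate set to its own k-th quartile at the same time)
\hat{y}_{k} = \hat{\beta}_0 + \sum_{j=1}^{p} \hat{\beta}_j \, Q_k(x_j), \qquad k = 1, 2, 3
```

Each prediction substitutes the same quartile of every covariate at once, so exactly three values result regardless of how many covariates the model contains.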

3.4    Linear regression with a factor variable as a covariate

Linear regression is useful for understanding how continuous variables influence other continuous variables. There may be occasions when we would like to understand how categorical variables, also referred to as factor variables, influence a continuous variable. Careful consideration of a factor variable can allow for its inclusion as a covariate within a linear regression. While linear regression with a factor variable as a covariate isn’t taught as part of STAT0021, you should be able to extend your knowledge of linear regression from STAT0021 to understand the basics of linear regression with a factor variable as a covariate through a small amount of research.

VaxPercent is the outcome of interest, with the link to Political being the aim of the investigation.

Your submission should include:

A.   Stata output including the results of an appropriate test taught as part of STAT0021 to determine whether the mean value of VaxPercent differs according to the levels of Political.

B.   An interpretation of those test results.

C.   A suitable plot to compare VaxPercent and Political.

D.   The plot (or plots) necessary to verify whether the assumptions of your test are satisfied.

Note that:

1.   This component of the assessment will be marked out of 3 marks.

Your submission should also include:

E.    Stata output including the results of a linear regression model for VaxPercent using Political treated as a factor variable as the covariate.

F.    An interpretation of the estimated model coefficients from that linear regression model.

Note that:

2.   This component of the assessment will be marked out of 3 marks.

Your submission should also include:

G.   A mathematical equation to indicate how this regression model can be used to make predictions of VaxPercent.

H.   Predicted values of VaxPercent when Political takes each of its three different levels.

Note that:

3.   This component of the assessment will be marked out of 3 marks.

Your submission should also include:

I.     Discussion of the benefits of building a linear regression model using Political as a factor variable covariate in contrast to the drawbacks of a linear regression model using Political as a continuous variable covariate when wishing to determine how VaxPercent varies with Political.

Note that:

4.   This component of the assessment will be marked out of 3 marks.
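As a starting point for the research mentioned at the top of this section, the Python sketch below illustrates one common way of including a three-level factor in a linear regression: indicator (dummy) coding with level 0 as the baseline. It is an illustration only, not a model answer. The data are made up and are not the assessment data, and the helper function solve is a hypothetical name written for this example.

```python
# Synthetic, made-up data purely for illustration; NOT the assessment data.
groups = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # hypothetical factor levels
y = [70.0, 72.0, 74.0, 60.0, 62.0, 64.0, 65.0, 67.0, 69.0]  # hypothetical outcome

# Dummy (indicator) coding with level 0 as the baseline:
# each row is [1, I(group == 1), I(group == 2)].
X = [[1.0, float(g == 1), float(g == 2)] for g in groups]


def solve(A, b):
    """Solve A * beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


# The normal equations (X'X) beta = X'y give the least-squares coefficients.
p = len(X[0])
XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
b0, b1, b2 = solve(XtX, Xty)

# Predicted value for each level of the factor.
predicted = {0: b0, 1: b0 + b1, 2: b0 + b2}
print(predicted)
```

With this coding the intercept is the fitted value for the baseline level and each remaining coefficient is a difference from that baseline, so the three predictions coincide with the three group means of the made-up outcome (72, 62 and 67, up to floating-point rounding).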

3.5    General marks

6 marks are available to submissions which:

A.   Are clear, well-written and formatted; with plots and Stata output adequately sized and labelled; and which correctly follow the submission format instructions.

4    Submission details

4.1    Submission format

You should submit a single file, saved as a PDF and named as “Assessment 4 [your student number]”. For example, if your student number is 22000000 then your submission should be a single PDF file named “Assessment 4 22000000”.

Your submission should also include within it your student number, but should not contain your name.

4.2    Submission length

Your submission should be made up of no more than:

•    Five A4 pages of discussions which cover all of the requirements outlined in the previous section, with a font size no smaller than 10 points.

•    Ten pages of Stata output (as screenshots) and other relevant figures. Each figure should have a number by which it is referred to in your discussions. Figures should be of a suitable size and quality to be easily interpretable.

•    One page, if necessary, of references to journal articles, books, websites, AI tools, etc.

Requesting that the discussions and figures be separated in this way may seem unusual, but it is done to stress both that enormous amounts of writing are not expected for this assessment and that carefully chosen figures can be just as useful as (or even more useful than) a greater volume of text. The permitted length is an upper limit, not a guide for how much you are expected to submit. If you can clearly explain your thoughts more concisely, then a shorter submission will not automatically be marked lower.

Any submission which is over the permitted length will suffer a penalty of 10 percentage points, although any such penalty will not reduce a mark below the pass mark of 40%.

4.3    Submission procedure and deadline

You must complete your submission via the “Assessment 4 – Individual Coursework – Term 1” section of the STAT0021 course Moodle page before the deadline of 1pm on Wednesday 17th January 2024.

There are standard non-negotiable penalties for late submissions which you can read about in the UCL Academic Manual. Any extension to the deadline can only be granted where a student has a Summary of Reasonable Adjustments (SoRA) or has successfully claimed extenuating circumstances.

Extenuating circumstances are handled by your parent department and not by the teaching department.

4.4    Stata

Throughout the information above on the expected submission structure it is mentioned that you should include supporting evidence from Stata. This is referred to because use of Stata has been taught as part of STAT0021. If you would prefer to make use of other software to perform the analyses, and you believe that you can obtain results just as good as those produced by Stata, then you are free to do so. If you are considering this, you are strongly encouraged to contact the course lecturer at t.honnor@ucl.ac.uk to discuss your decision.

5    Plagiarism, collusion and referencing

Every student completing the submission agrees to having read and understood the “Academic integrity, plagiarism and collusion” document within the “Assessment 4 – Individual Coursework – Term 1” section of the STAT0021 course Moodle page.

References to any sources should be included using your choice of a standard referencing system.

Submissions will be run through the Turnitin system.

5.1    Use of AI tools

UCL assigns assessments to one of three tiers, depending upon the extent to which AI tools can be used on the assessment. This assessment falls under the second tier: AI tools can be used in an assistive role.

You may use AI tools to support you in completing this assessment. AI tools must not be used to write the assessment for you: any text appearing within your discussion must be written in your own words and not simply copied and pasted from the output of an AI tool.

UCL’s guidance on AI tools notes that:

•     Before using generative AI, you should ensure that:

o  You understand the limitations and risks of using generative AI.

o  Your assignment/research remains your own work.

•     Generative AI can be a useful starting point to gather background information on a topic, but be aware that:

o  Generative AI produces information that may be inaccurate, biased, or outdated.

o  Generative AI is not an original source of information: it reproduces information from unidentified sources.

o  Generative AI may fabricate quotations and citations.

o  It is always best to refer to original and credible sources of information.

•     If you do choose to use generative AI tools, you must always:

o  Critically evaluate any output it produces.

o  Carefully check any quotations or citations it creates.

o  Correctly document your use of the tools so that it can be appropriately acknowledged.

6    Queries

Any queries about Assessment 4 should be emailed to me at t.honnor@ucl.ac.uk. Any queries which require and receive a substantial informative response will be posted to a forum within the “Assessment 4 – Individual Coursework – Term 1” section of the STAT0021 course Moodle page to ensure that no student receives an undue advantage via this process.