IEHC0080 2022/23 Essay


This component accounts for 90% of your final module mark. Read the guidelines below to avoid losing marks unnecessarily.

§ The assessment is due by 12 January 2023 at 12pm GMT. Please follow the submission guidelines, which are available on the Moodle page for this module.

§ The word limit is 2000 words in total, excluding questions.

§ Please put your candidate number (NOT your name or your Student ID) on the front cover of the essay and use it as the file name, to enable anonymous marking.

§ This is an assessed piece of coursework for the IEHC0080 module; collaboration and/or discussion of any parts of this exam with anyone is strictly prohibited.

§ Academic integrity (including plagiarism) is taken extremely seriously and can disqualify you from the course (for details of what constitutes plagiarism see http://www.ucl.ac.uk/current-students/guidelines/plagiarism).  If you are in doubt about any of this, ask the tutor.

§ Requests for clarification of the questions should be posted on the Moodle Forum. As this is an assessed piece of work, you should not email or ask the module or personal tutors how to answer the questions.

Guidelines for completing and submitting the essay exam

§ The essay comprises four written sections (Parts A-D) and one practical section (Part E). Complete all parts of each section and answer every part of each question. An attempt will almost always be worth more marks than no attempt at all.

§ You need to submit: (1) a written report (Parts A-D) including tables and figures, and (2) an appended (e.g. copied and pasted) do-file (Part E) showing all the syntax used to answer the questions. The do-file needs to show (1) a command that starts a log file and (2) the exam questions as comments placed next to the relevant syntax. Your do-file accounts for 15% of the mark. Ensure your do-file is error free.

§ All results and Stata outputs presented in this assessment must be based on the original dataset from which the datasets used in the practicum sessions were derived.

§ Where appropriate, answers should be written in complete sentences, describing the methodology, approach, or justification.

§ You do not need to cite or discuss relevant literature, but you do need to name the principal statistical concepts/analyses you use, for example Pearson’s r.

§ You should discuss the interpretation of your results and how they relate to the questions you are asked. You do not need to give an exact definition of each statistical analysis, e.g. ANOVA and logistic regression; however, you need to explain why you used a particular test.

§ In your essay, you should include tabular (up to 3) and graphical (up to 3) outputs alongside your written answers. You should NOT present Stata code in the main body of your report.

§ A codebook for the variables used in this assessment is provided below:

Codebook on selected variables from the HSE2011 teaching dataset.

Variable name

Variable label and description


Systolic blood pressure in mmHg


Body mass index in kg/m²


Age in categories


Sex of the participant


Total cholesterol in mmol/l


Cigarette smoking status, grouped


Physical activity, grouped


Social class


Self-reported general health

DOs and DON’Ts

· DON’T include raw variable names in the text or tables

· DON’T include unedited log file/screenshots of the outputs in this report.  You will lose marks by doing so.

· DON’T report p = 0.000 even if your Stata output shows it. Report the exact p-values where appropriate.

· DO structure your essay in the same order as the exam questions.

· DO apply a critical alpha value of <0.05 (1-tail) unless instructed otherwise.

· DO use Stata to answer the questions.

· DO use 2 decimal places throughout and 3 decimal places for reporting p-values

· DO make sure tables and figures have titles and are referred to in the text

· DO make sure your tables and figures are self-explanatory. Include a clear title and add notes if needed.

· DO make sure the figures in your tables are reproducible; for example, the sub-group numbers should sum to the study sample size.
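One way to follow the p-value rules above is to format stored results rather than copy them from the output window. The sketch below is illustrative only: it uses Stata's built-in auto dataset as a stand-in, and it assumes the common convention of writing "p < 0.001" when a value would otherwise round to 0.000.

```stata
* Sketch only: auto.dta is a stand-in dataset, not the assessment data.
sysuse auto, clear
ttest price, by(foreign)

* -ttest- leaves the two-sided p-value in r(p)
local p = r(p)

* Assumed convention: report "p < 0.001" instead of "p = 0.000"
if `p' < 0.001 {
    display "p < 0.001"
}
else {
    display "p = " %5.3f `p'
}
```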

Report brief:

You have been tasked with exploring the association between body mass, measured by BMI, and the level of systolic blood pressure among the study participants (N=3,129).

To examine the association, you are expected to use the BMI measure (bmi, independent variable) and systolic blood pressure (sbp, dependent variable), along with agegrp and sex of the participants, in regression analyses.

To demonstrate your knowledge of basic statistical application and model building, we have also provided five other variables (social class, total cholesterol, physical activity, smoking status and self-reported general health) for you to consider adding to your model.

Start your report with an introduction describing the aim of the work, as part of your answer to Part A. Your answer to Part A will structure your response to the remaining questions (Parts B to D). End your report with a concluding remark of no more than 100 words summarising the main findings; this forms part of your answer to Part D.

To answer the questions, you need to recode the social class variable (sclass) and others according to the instructions below:

§ Social class: Group those who are professional, managerial technical or skilled non-manual as ‘non-manual’ (=0) and those with a manual description as ‘manual’ (=1), and drop those in categories 7 and 8.

§ All variables: You must check how missing values are coded. Recode any missing values as ‘.’ and use this for complete case analysis, i.e. an analytical sample restricted to people with no missing data on any variable.
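The recoding steps above could be sketched as follows. This is an illustration under stated assumptions, not a model answer: the numeric codes (1-3 for the non-manual groups, 4-6 for the manual groups), the use of negative values to flag missing data, and the new names sclass2, nmiss and complete are all assumptions. Check them against the dataset with codebook or tabulate before relying on them.

```stata
* Sketch only: inspect the raw coding first.
tabulate sclass, missing nolabel

* Assumed coding: 1-3 = non-manual groups, 4-6 = manual groups.
* Categories 7 and 8 are set to missing so they drop out of the
* complete case sample.
recode sclass (1 2 3 = 0 "Non-manual") (4 5 6 = 1 "Manual") (7 8 = .), generate(sclass2)
label variable sclass2 "Social class (non-manual vs manual)"

* Assumed: negative values flag missing data in the source file.
foreach v of varlist sbp bmi agegrp sex sclass2 {
    replace `v' = . if `v' < 0
}

* Flag complete cases; extend the list with the remaining covariates.
egen nmiss = rowmiss(sbp bmi agegrp sex sclass2)
generate byte complete = (nmiss == 0)
tabulate complete
```

Keeping a flag such as complete, rather than deleting rows, lets you report missing cases for the full study sample and still restrict later commands to the analytical sample.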

· Part A (10 points): Description of the dataset

Describe data types and report distribution patterns, e.g. missing cases and central tendency of the variables in the study sample. Present the characteristics of the variables of your analytical sample (i.e. complete cases) with a tabular output.

You should include a description of the data types in the dataset along with the broad aim. You need to explain how you recategorized any variables that you recoded.

You should include a table with information on missing cases in the study sample, and the sample size for the analytical sample. Frequency and central tendency (if appropriate) of the variables should also be presented. You need to describe the distribution patterns of continuous variables by assessing empirical measures such as skewness, kurtosis, median and mean.
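The descriptive measures listed for Part A can be obtained with a few standard commands. The sketch below uses only the variable names given in this brief; the remaining codebook variables would need to be added to each list.

```stata
* Missing-data counts per variable. This only counts Stata missing
* values, so any special missing codes must be recoded to "." first.
misstable summarize sbp bmi agegrp sex sclass

* Mean, median (50th percentile), skewness and kurtosis
summarize sbp bmi, detail

* Visual check of a continuous distribution against a normal curve
histogram sbp, normal

* Frequencies for categorical variables
tabulate agegrp, missing
tabulate sex, missing
```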

· Part B (10 points): Presenting a statistical hypothesis

Taking the level of body mass (bmi) as the exposure and the level of systolic blood pressure (sbp) as the outcome, formulate a statistical hypothesis which you will later test in Part C. State whether the test is one- or two-tailed, with justification. Answers should include both the null and the alternative hypothesis, with the appropriate tailed test.

Note: Your answer should not exceed 100 words.

· Part C (30 points in total): Testing associations

(a) Using appropriate tests, either t-test, ANOVA, Pearson’s r, or Chi-squared test according to the data type, explore associations

(1) between the covariates (agegrp, sex, recoded social class, total cholesterol, physical activity, self-rated general health and smoking) and the independent variable (level of body mass, bmi)


(2) between the covariates (agegrp, sex, recoded social class, total cholesterol, physical activity, self-rated general health and smoking) and the dependent variable (level of systolic blood pressure, sbp), to establish confounding or effect modification relationships (=associations with both the independent and the dependent variable).

Explain your choice of the statistical analyses.

Please describe your final analytical model in the form of a table and description of its contents.

You are expected to examine statistical significance: using the tests mentioned above, conclude whether or not each covariate shows a confounding association, and decide whether it will be included in the final model.

The table you include should show the appropriate test statistics (group mean values for ANOVA) with the level of significance.

(10 points)
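As a rough guide to matching the test to the data type in Part C(a), the sketch below shows one command per test. The variable pairings are illustrative only, and it assumes sex has two categories and agegrp has three or more; check this in the data.

```stata
* Binary covariate vs continuous variable: two-sample t-test
ttest bmi, by(sex)

* Covariate with three or more groups vs continuous variable:
* one-way ANOVA, with group means shown by the tabulate option
oneway bmi agegrp, tabulate

* Two continuous variables: Pearson's r with p-value and sample size
pwcorr sbp bmi, sig obs

* Two categorical variables: chi-squared test
tabulate agegrp sex, chi2
```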

(b) Test your hypothesis by examining an analytical model that includes the variables you selected in Part C(a), applying an appropriate statistical test.

Explain your choice of statistical analysis, including how you checked that its essential assumptions are met (for example, a linear association) and that the data types of the relevant variables are appropriate.

Based on the estimates in the Stata output, comment on the results and model fit. Use one appropriate post-hoc analysis, i.e. a specification error test, to assess your model further. Present the results with a tabular output. You are expected to present the results (estimates, 95% CI, p-value) from an appropriate regression model. (20 points)
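A minimal sketch of the workflow described in Part C(b) is given below, assuming linear regression is the chosen analysis. The covariate list is illustrative only and should reflect your own conclusions from Part C(a). linktest is one specification error test available in Stata; estat ovtest is another.

```stata
* Check the linearity assumption visually
twoway (scatter sbp bmi) (lowess sbp bmi)

* Crude association
regress sbp bmi

* Adjusted model: i. marks categorical covariates (illustrative list)
regress sbp bmi i.agegrp i.sex

* Specification error test on the most recent model
linktest
```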

· Part D: Reporting interpretation of the results and research implications (25 points)

Comment on the strength and direction of the associations and report whether they support or reject your hypothesis. Also comment on a broader limitation of your model. Summarize and conclude your findings.

· Part E (15 points): Do-File

Submit an appended do-file which can be run to generate the findings in your report. All syntax should run without error. The syntax to start a log file must be placed before any analyses, and the syntax to close the log file must be placed at the end.


Comment each section appropriately. The do-file should: explore the dataset; summarize and describe the dataset; show the recoding of sclass; show how missing cases are dealt with; show appropriate testing for confounders/effect modifiers; show the use of an appropriate regression for the crude association; present graphically the association between some of the characteristics; and include the syntax for an appropriate regression for the full model and a specification error test.
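The overall layout of such a do-file might look like the skeleton below. The log and dataset file names are hypothetical placeholders, and the section comments only indicate where each block of syntax would go.

```stata
* IEHC0080 essay do-file: skeleton only
capture log close                               // close any log left open by an earlier run
log using "iehc0080_essay.log", replace text    // hypothetical log file name

use "hse2011_teaching.dta", clear               // hypothetical dataset file name

* ---- Part A: explore, recode and describe the dataset ----
describe
codebook sbp bmi agegrp sex sclass, compact

* ---- Part C(a): tests for confounders/effect modifiers ----
* (syntax for t-tests, ANOVA, correlations, chi-squared tests)

* ---- Part C(b): crude and full regression models ----
* (syntax for regressions, graphs and the specification error test)

log close
```

Starting with capture log close means the file can be re-run after an interrupted session without failing on an already open log.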

· Presentation (10 points): This is for clear presentation and clarity of answers, especially with regard to the production of tabular and graphical outputs.

8-10 clear answers with outputs shown in concise format

5-7 correct answers with outputs that can be understood but cumbersome

0-4 confused answers with unclear outputs.