CHME0017 Public Health Data Science 2020

Published: 2024-06-24

MSc Module in Public Health Data Science ASSESSMENT

There is currently a pandemic caused by the novel coronavirus SARS-CoV-2. In April 2020 the intensity and duration of this first wave of the pandemic were uncertain, but Governments around the world needed to make decisions about how to act to protect the health of their populations. Many Governments were taking difficult decisions on the basis of infectious disease models that attempted to predict the future numbers of infections, the number of intensive care beds required to treat these cases and the number of deaths that might result from the disease COVID-19.

One prominent set of COVID-19 predictions that was widely used by policy and decision makers across the world was produced by the Institute for Health Metrics and Evaluation (IHME), University of Washington [1–3]. The projections produced by IHME have been updated over time and their stated aim was to “help leaders of medical systems figure out innovative ways to deliver high-quality care to those who will need their services in the coming weeks”.

On 9th April the Public Health Data Science team at UCL scraped https://covid19.healthdata.org/ to gather data on the IHME forecasts for the US and Europe.

You can find the data we scraped here: http://0si82yedx.phds.s3-website-us-east-1.amazonaws.com/IHME-Global/index.html

We want you to use the data contained in the file 2020-04-09.tar.gz which is the IHME predictions made on 9th April 2020.

For your assignment we would like you to analyse how well the IHME predictions performed in predicting the following outcomes during the period of 9th April to 25th May 2020:

1. Peak daily COVID-19 death rate

2. Date of the peak daily COVID-19 death rate

3. Total number of deaths

You may decide to focus on one specific country, or undertake your analysis across multiple countries. We suggest you compare the IHME projections with data published by the World Health Organisation, either by taking this directly from the WHO site [4] (https://covid19.who.int/) or from a secondary source such as Johns Hopkins [5] (https://github.com/CSSEGISandData/COVID-19).

On day two of the Hackathon we will introduce and review the assignment. You will then get the chance to work on the assignment in your groups on days two and three of the Hackathon. Whilst you will work on this problem in groups during the Hackathon, it is essential that the final code and report you submit are entirely your own work.

The deadline for the coursework submission is 29th June 2020 - 12:00pm.

Submissions will be made via Moodle. We would like you to submit three documents as part of this assignment:

1. An R Markdown file or R script containing your analysis, which we can run to reproduce your results

2. A PDF of the R Markdown file or R script

3. A Report that has a maximum of 1,500 words of text (not including analytical code, tables, or Figures) and contains the following sections:

● Background

a. A statement of the context and setting

● Aims

a. A description of the aim for the analysis you have developed

● Methods:

a. Description of data items included in the analysis

b. Description of the methods for creation of the analysis

● Results

a. Presentation of the overall results

b. Presentation of disaggregated results

● Discussion

a. Interpretation of results, including their strengths and weaknesses

b. How the results could be used to improve the health of the public by conducting one of the following:

i. A brief stakeholder analysis

ii. A brief media engagement and dissemination strategy

Public Health Data Science - CHME0017

COVID-19 Mortality Modelling Accuracy of IHME models

Background

In December 2019, a novel coronavirus, SARS-CoV-2, which causes the disease COVID-19, was first identified in Wuhan, China. Over time, the virus has spread across the globe, resulting in the current pandemic. As of 28 June 2020, there were over 10 million reported cases of COVID-19 and over 500,000 deaths (JHU, 2020).

The Institute for Health Metrics and Evaluation (IHME) has produced a set of models using data from the World Health Organisation (WHO) and from local and national governments. The projections for European countries were made using models of peak deaths in Wuhan and in seven European locations that had already reached their peak. The number of different social distancing measures enforced, such as school closures and stay-at-home recommendations, is also taken into account (IHME, 2020b).

Implementation of public health measures can reduce the number of cases and deaths as well as lessen the demand on healthcare systems (Zhang et al., 2020). Prediction modelling shows how different strategies can influence the spread and impact of COVID-19, and is therefore essential to inform policy decisions (IHME, 2020a).

Aim

The aim is to assess the ability of the IHME models to predict outcomes between 9th April and 25th May 2020. The outcomes are:

1) Peak number of COVID-19 daily deaths

2) Date of peak COVID-19 daily deaths

3) Total number of deaths

It is important to establish how accurate these models were, as this indicates how far off their subsequent predictions might be and can thus better inform policy decisions.

Data items

The IHME dataset included predictions of daily and cumulative mortality and of the number of patients requiring hospitalization, ICU beds and invasive ventilation for 30 countries. The WHO dataset consists of data on the number of daily and total COVID-19 cases and deaths in 194 member states. Some countries, such as France and Italy, had their observed or predicted peak of daily deaths before 9th April and were therefore excluded from the analysis. Czechia, Germany, Sweden and the United Kingdom (UK) were selected because each represented a different severity of the pandemic in terms of number of deaths. The exclusion of countries with an early peak of daily deaths may have introduced selection bias and resulted in the loss of potentially important information. However, a fixed time period was essential to avoid comparing predictions made with different modelling methods, since the IHME models are updated regularly.

Since our analysis evaluates how accurately the IHME models predict daily and cumulative mortality and the timing of peak mortality, the date and the variables relating to the number of deaths were selected. There are two datasets from IHME: deaths by day and cumulative deaths. Both contain crude observed and predicted numbers of deaths. However, only the projected values were used, as it is the prediction models that are being assessed. Three prediction variables were selected: the mean number of deaths and the upper and lower limits of the 95% confidence interval (C.I.), though the analysis relied primarily on the mean. From the WHO dataset, only the observed numbers of deaths by day and in total were selected. The reporting date was also kept from both datasets so that the dates of the observed and predicted peak daily deaths could be identified.

Methods

Data items were web-scraped from the IHME and WHO websites. The relevant variables described above were kept and the others removed. WHO data were filtered to the four chosen countries, and both the WHO and IHME datasets were filtered to exclude data outside the specified time period (2020/04/09 - 2020/05/25). Two data frames were created for each country: one containing the predicted and observed values of daily deaths and the other those of total deaths. The four data frames of daily mortality values were then combined: the observed number, the predicted mean, and the upper and lower limits of prediction were each summed across the four countries to create aggregated data. The same was done for the data frames of total deaths.
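The filtering and aggregation steps described above can be sketched as follows. This is a minimal Python illustration, not the code used for the analysis; the row layout, the field names (`date`, `observed`, `mean`, `lower`, `upper`) and the toy numbers are hypothetical placeholders rather than values from the IHME or WHO files.

```python
from collections import defaultdict
from datetime import date

# Study window stated in the Methods (inclusive at both ends).
START, END = date(2020, 4, 9), date(2020, 5, 25)

def in_window(row):
    """Keep only rows whose date falls inside the study period."""
    return START <= row["date"] <= END

def aggregate(per_country):
    """Sum each series across countries, day by day.

    `per_country` maps a country name to a list of daily rows; the
    row layout is an assumption made for this sketch.
    """
    totals = defaultdict(lambda: {"observed": 0, "mean": 0, "lower": 0, "upper": 0})
    for rows in per_country.values():
        for row in filter(in_window, rows):
            for key in ("observed", "mean", "lower", "upper"):
                totals[row["date"]][key] += row[key]
    return dict(sorted(totals.items()))

# Hypothetical toy rows (NOT real data) for two made-up countries.
toy = {
    "A": [
        {"date": date(2020, 4, 8), "observed": 9, "mean": 9, "lower": 5, "upper": 20},
        {"date": date(2020, 4, 9), "observed": 10, "mean": 12, "lower": 6, "upper": 25},
    ],
    "B": [
        {"date": date(2020, 4, 9), "observed": 3, "mean": 2, "lower": 1, "upper": 7},
    ],
}
aggregated = aggregate(toy)
```

Summing the lower and upper limits mirrors the approach described in the Methods. Such sums are not, in general, a 95% interval for the combined total, so the aggregated band is best read as a rough guide.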

To quantify the prediction error of each IHME model, the root mean square error (RMSE) was calculated from the crude observed and predicted values in the datasets. RMSE is the square root of the mean squared difference between the observed and predicted values, so it gives a higher weight to large errors (Hansen, 2013; Mubarik et al., 2020). Thus, a lower RMSE value indicates a better fit. In addition, graphs of daily and cumulative deaths were plotted for the aggregated data and for each of the four countries. Each graph included the observed data as well as the predicted mean value with upper and lower limits, allowing the accuracy of the model to be visualized.
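For n paired daily values, RMSE = sqrt((1/n) × Σ (observed − predicted)²). A minimal Python sketch of that calculation is given below; the function name `rmse` and the toy series are illustrative assumptions, not the code or data used in the analysis.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical toy series (NOT values from the IHME or WHO data).
observed = [10, 12, 15, 11]
predicted = [12, 12, 11, 11]
error = rmse(observed, predicted)  # squared errors are 4, 0, 16, 0
```

Because the errors are squared before averaging, the single miss of 4 deaths contributes four times as much as the miss of 2, which is the weighting towards large errors described above.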

Results

The IHME predictions of daily and total deaths were compared against WHO observed data and plotted (Figure 1). The predictions for the aggregated data were highly inaccurate, as shown by the high RMSE values, the large disparities between the observed and projected values, and the gap between the observed and predicted dates of peak daily mortality. The observed daily mortality remained close to or below the lower limit of the 95% C.I. until 29th April, followed the next day by a peak of 4699 deaths that exceeded the prediction range. The predicted mean number of total deaths was consistently higher than the reported value, with 30314 more deaths predicted than observed at the end of the study period (Table 3).

Figure 1 Observed and predicted A) daily and B) total deaths plots using aggregated data. The red shaded region indicates the 95% confidence interval of the predicted data.

The accuracy of the predictions for both daily and cumulative mortality varied between countries. The daily deaths model for Czechia had the lowest RMSE value, 6.70, while that for the UK had the highest among the four countries, 1244.28 (Table 1). For Czechia, Germany and Sweden, the observed deaths by day were generally within the predicted range. However, they rose above the predicted mean value during the period when a gradual decrease in deaths was projected (Figure 2). Moreover, the dates of their predicted peaks differed from the observed peak dates by 2-5 days (Table 2). In contrast, the observed daily mortality for the UK was close to or below the lower limit of the prediction confidence interval for roughly the first three weeks, which was then followed by a sharp increase the next day. This peak was observed on 30th April, 13 days later than the predicted date.

Figure 2 Observed and predicted daily deaths plots of A) Czechia, B) Germany, C) Sweden and D) U.K. The red shaded region indicates the 95% confidence interval of the predicted data. Note that the y-axis scales are different for each graph.

	Observed peak no. of daily deaths	Predicted peak no. of daily deaths (95% C.I.)	Difference between observed and predicted	RMSE (2 d.p.)
Aggregated	4699	3395 (944-9478)	1304	1327.09
Czechia	18	23 (4-106)	-5	6.70
Germany	315	377 (83-1237)	-62	88.76
Sweden	185	134 (53-296) 134 (60-273) *	51	56.91
U.K	4419	2932 (829-7922)	1487	1244.28

Table 1 Observed and predicted peak number of daily deaths, and RMSE of the IHME prediction models

*The same predicted peak number of deaths occurred on two days, with different 95% C.I.s

	Date of observed peak no. of daily deaths	Date of predicted peak no. of daily deaths	Difference (predicted minus observed)
Aggregated	30/04/2020	17/04/2020	-13 days
Czechia	15/04/2020	17/04/2020	2 days
Germany	16/04/2020	19/04/2020	3 days
Sweden	22/04/2020	24/04/2020; 27/04/2020 *	2 days; 5 days
U.K	30/04/2020	17/04/2020	-13 days

Table 2 Dates of observed and predicted peak number of daily deaths and the difference between the two. *The same predicted peak number of deaths occurred on two days
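The peak value, its date and the gap between predicted and observed peak dates can be derived as in the minimal Python sketch below. The function names and the toy series are illustrative assumptions. Here the difference is taken as predicted minus observed, so a negative value means the predicted peak came earlier than the observed one.

```python
from datetime import date

def peak(series):
    """Return (peak_value, dates_of_peak) for a {date: deaths} mapping.

    All dates sharing the maximum are returned, since a predicted
    series can reach the same peak value on more than one day.
    """
    top = max(series.values())
    return top, sorted(d for d, v in series.items() if v == top)

def days_between(predicted_day, observed_day):
    """Signed difference in days: predicted minus observed."""
    return (predicted_day - observed_day).days

# Peak dates for Germany as reported in Table 2.
germany_gap = days_between(date(2020, 4, 19), date(2020, 4, 16))

# Hypothetical toy series (NOT real data) with a tied peak.
toy = {date(2020, 4, 24): 5, date(2020, 4, 25): 4, date(2020, 4, 27): 5}
toy_peak, toy_days = peak(toy)
```

Returning every date that attains the maximum makes ties explicit, as with the two predicted peak dates reported for Sweden.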

As with the daily deaths model, the total deaths model for Czechia was the most accurate, with the lowest RMSE value of 134.78 (Table 3). A large difference between the observed and predicted cumulative deaths for the UK resulted in the highest RMSE value among the four countries. Reported cumulative deaths were lower than the predicted mean values in all four countries and lay within the 95% confidence interval everywhere except the UK, where reported values were below or close to the lower limit of the predicted 95% C.I. between 9th April and 29th April (Figure 3).
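One way to put a number on how often observed values lie within the predicted interval is the share of days on which they do. The report does not compute this; the Python sketch below is only an illustration, and the function name `coverage` and the toy values are assumptions.

```python
def coverage(observed, lower, upper):
    """Fraction of days on which lower <= observed <= upper."""
    inside = [lo <= o <= hi for o, lo, hi in zip(observed, lower, upper)]
    return sum(inside) / len(inside)

# Hypothetical toy values (NOT values from the IHME or WHO data).
observed = [100, 120, 400, 130]
lower = [90, 100, 110, 120]
upper = [200, 220, 240, 260]
share = coverage(observed, lower, upper)  # third day falls above the band
```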

Figure 3 Observed and predicted total deaths plots of A) Czechia, B) Germany, C) Sweden and D) U.K. The red shaded region indicates the 95% confidence interval of the predicted data. Note that the y-axis scales are different for each graph.

	Observed total no. of deaths	Predicted total no. of deaths (95% C.I.)	Difference between observed and predicted	RMSE (2 d.p.)
Aggregated	49363	79677 (33658-182367)	-30314	30433.94
Czechia	315	411 (128-1347)	-96	134.78
Germany	8257	8792 (4253-21288)	-535	1264.84
Sweden	3998	4176 (1723-9750)	-178	515.76
U.K	36793	66298 (27554-149982)	-29505	28624.37

Table 3 Observed and predicted total number of deaths at the end of the study period, and RMSE of the IHME prediction models

Discussion

By comparing the projected and observed data, the accuracy of the prediction models was quantified using RMSE. Across the four countries, which differed in epidemic severity, there was a positive association between the number of observed daily or cumulative deaths and the RMSE value, suggesting larger prediction errors and a poorer model fit for larger epidemics. Moreover, the IHME models were better at predicting the date of peak deaths in Czechia, Germany and Sweden, countries with relatively small epidemics compared to the UK. Thus, this evidence indicates that the IHME models perform better for smaller epidemics.

Both aggregated data prediction models had high RMSE values because they were strongly influenced by the low accuracy of the UK’s prediction models. Inconsistencies in the data may also have contributed to higher model errors. For example, the peak in UK deaths may have been caused by the inclusion of deaths with a positive confirmed test in settings outside hospitals on 29th April (UK Government, 2020). Furthermore, an observed daily death count of -1 was seen in the Czechia data as a result of retrospective data reconciliation (WHO, 2020). Differences in accuracy may also be attributable to under-reporting and to differences between countries in case detection, testing methods, reporting practices and lag times, leading to under- or overestimation of true deaths. A limitation of the analysis is that only WHO data were used for the observed values, whereas the IHME predictions were based on reported deaths from multiple sources, including WHO websites and local and national governments (IHME, 2020b).

Figure 4 Stakeholder analysis evaluating key stakeholders from national and international, public interest and local bodies. Current hypothetical power and interests of stakeholders are plotted with arrows indicating shifts required from the analysis.

A stakeholder analysis was carried out to evaluate the level of power and interest of key stakeholders (Figure 4). It is essential for those with position power, such as Ministers of Health and public health organisations, to recognise the accuracy of the IHME models, as this would better inform policy decisions. Our analysis will increase the expert power of global health organisations, allowing them to provide appropriate guidance to countries on their public health measures. Negative narratives may arise from the media, which should therefore be regulated by stakeholders with position power to reduce its negative power.