ECOM2000: Econometric Principles ‐ Data Analysis Project

2022

1.  Introduction

One of the hypotheses that have been widely discussed in the literature of development/environmental economics is the Environmental Kuznets Curve (EKC). It states that the relationship between a country’s national income and the extent of environmental degradation has an inverted U‐shape. That is, the extent of environmental degradation increases with national income at a diminishing rate and starts decreasing as national income increases beyond a certain level. In this project, we will test the EKC hypothesis empirically using data from the World Bank.

2.  Preliminary

Data collection

To do this project, you need to download the following data from the World Bank’s (WB’s) World Development Indicators (WDI) website (https://databank.worldbank.org/source/world-development-indicators):

Variable    WB indicator name     Measurement                       WB data code

CO2         CO2 emissions         Metric tons per capita            EN.ATM.CO2E.PC

GDP         GDP per capita        Constant 2015 US$                 NY.GDP.PCAP.KD

PopDen      Population density    People per sq. km of land area    EN.POP.DNST

UrbPop      Urban population      % of total population             SP.URB.TOTL.IN.ZS

Please follow the steps below to download these data from the WB’s website:

1.   Expand the “Country” tab on the left‐hand side of the website and choose all countries. To do this, select “Countries” out of the three options, then select all countries by ticking the box on the next line. You should see that you have selected 217 countries. (see Image 1 at the end of this document)

2.   Expand the “Series” tab and search for the required data series by the WB indicator name or data code listed above. Go through the search results and tick the box next to the intended variable (pay attention to the measurement as well). (see Image 2)

3.   Move to the “Time” tab and select “2018” by ticking the box next to it. (see Image 3)

4.   Click “Apply Changes” on the right‐hand side of the website. (see Image 3)

5.   Under “Download options,” choose “Advanced Options.” (see Image 3)

6.   In the popup window, select “Names only” within the “Variable format:” option. (see Image 4)

7.   Click “Download” and save the file to your local drive.

Data Cleaning/Formatting

Before analyzing the data, you have to follow several steps to clean and rearrange them. First, open the data file in Excel. You will notice that the data downloaded from the WB are arranged as:

Column A               Column B                Column C

Country Name       Series Name            2018                        

The data are stored in rows 2‐869 (one header row plus 4 series for each of 217 countries). Below them, rows 873 and 874 contain the following text:

Data from database: World Development Indicators

Last Updated: ##/##/2022

Please delete these two rows and save the Excel file under the same name. (see Image 5)

Next, we need to convert the data from long form (data on 4 variables from 217 countries are stacked vertically in one column) into wide form (a table in which the first column stores the country name and each subsequent column stores the data on one variable). There are many ways to perform this transformation, but one possible way is to execute the following in R:

dat = readxl::read_excel("[path]/Data_Extract_From_World_Development_Indicators.xlsx", sheet = "Data")
datw = tidyr::spread(dat, "Series Name", "2018")

We are familiar with the first line, which reads the Excel data into the workspace (you need to change the file path). The second line converts the data from long form into wide form and saves the new data as “datw” (spread() comes from the tidyr package). We also want to shorten the variable names so that they are easier to handle. We can try the following, using rename() from the dplyr package:

datw = dplyr::rename(datw,
                     CO2    = "CO2 emissions (metric tons per capita)",
                     GDPpc  = "GDP per capita (constant 2015 US$)",
                     PopDen = "Population density (people per sq. km of land area)",
                     UrbPop = "Urban population (% of total population)")

Now, the new data table “datw” contains the country name in the first column and the data on the four variables (CO2, GDPpc, PopDen, and UrbPop) in columns 2‐5.
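To see what the reshaping step does, the following sketch applies it to a small made‐up data set with two countries and two series. The country names and values are invented for illustration only and are not WB data. The sketch uses tidyr::pivot_wider(), which the tidyr documentation recommends over the older spread() for new code; it assumes the tidyr and dplyr packages are installed.

```r
# Toy long-form data: one row per country-series pair (all values are made up).
toy = data.frame(
  `Country Name` = c("Country A", "Country A", "Country B", "Country B"),
  `Series Name`  = c("CO2 emissions (metric tons per capita)",
                     "GDP per capita (constant 2015 US$)",
                     "CO2 emissions (metric tons per capita)",
                     "GDP per capita (constant 2015 US$)"),
  `2018`         = c("1.5", "2000", "..", "9000"),
  check.names = FALSE   # keep the column names exactly as written, spaces included
)

# Long -> wide: each distinct value of "Series Name" becomes its own column.
toyw = tidyr::pivot_wider(toy, names_from = "Series Name", values_from = "2018")

# Shorten the column names (new name = "old name").
toyw = dplyr::rename(toyw,
                     CO2   = "CO2 emissions (metric tons per capita)",
                     GDPpc = "GDP per capita (constant 2015 US$)")

print(toyw)  # one row per country; columns: Country Name, CO2, GDPpc
```

Note that the values are still stored as text at this point, and the “..” placeholder for Country B is still present.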

Two more steps we need to follow are: (1) convert missing values from “..” into “NA” and eliminate them from the dataset, and (2) change the data type from character to numerical. These can be done by:

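datw[datw == ".."] = NA                    # ".." marks a missing value in WB downloads
datw = na.omit(datw)                       # drop countries with any missing value
datw$CO2    = as.numeric(datw$CO2)         # character -> numeric
datw$GDPpc  = as.numeric(datw$GDPpc)
datw$PopDen = as.numeric(datw$PopDen)
datw$UrbPop = as.numeric(datw$UrbPop)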

The first line changes “..” into “NA” (which is the default value for missing observations in R), while the second line eliminates these missing observations from the dataset. The remaining four lines change the data type from character into numeric for the four variables. Now we are ready to analyze the data.
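It is worth checking that the cleaning steps behaved as intended before moving on. The sketch below runs the same kind of cleaning on a small made‐up table (the countries and values are invented, not WB data) and then verifies the result. It uses only base R and converts the columns with as.numeric().

```r
# Toy wide-form data as it might look after reshaping (all values are made up).
toyw = data.frame(
  Country = c("Country A", "Country B", "Country C"),
  CO2     = c("1.5", "..", "4.2"),
  GDPpc   = c("2000", "9000", "31000"),
  stringsAsFactors = FALSE
)

toyw[toyw == ".."] = NA                 # mark the ".." placeholder as missing
toyw = na.omit(toyw)                    # drop every row with at least one NA
toyw$CO2   = as.numeric(toyw$CO2)       # character -> numeric
toyw$GDPpc = as.numeric(toyw$GDPpc)

str(toyw)                               # shows the column types after cleaning

# Simple checks: no missing values remain and the data columns are numeric.
stopifnot(!anyNA(toyw), is.numeric(toyw$CO2), is.numeric(toyw$GDPpc))
```

Running str() and summary() on the real “datw” in the same way shows how many countries remain after the rows with missing values are dropped.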

3.  Data Analysis

Analyze the WDI data using R/RStudio and answer the following 11 questions.

1.    (6 points) Create a new variable CO2k by converting the data on CO2 emissions from metric tons per capita into kilograms per capita (by multiplying the original data by 1,000). Then, create a scatter plot of CO2 emissions per capita (vertical axis) against per capita GDP (horizontal axis). Please label each axis clearly.

2.    (10 points) Under the assumption that CO2 emissions (in kg) are distributed independently and identically in the population, construct a 90% confidence interval of the population mean of CO2 emissions per capita (in kg) manually (that is, using the sample mean, sample variance, and the appropriate critical values obtained from R and/or statistical tables). Interpret the calculated confidence interval.

3.    (10 points) Estimate a multiple regression model with CO2 emissions per capita (in kg) as the dependent variable, and GDP per capita, GDP per capita squared, population density, and the share of population living in urban areas as explanatory variables. Write down the estimated sample regression equation.

4.    (8 points) For the regression model estimated in Question 3, interpret the reported R‐square value as well as the standard error of the regression. Briefly comment on the model’s goodness of fit to the observed data.

5.    (8 points) For the regression model estimated in Question 3, provide interpretations of the estimated coefficients for PopDen and UrbPop.

6.    (10 points) For the regression model estimated in Question 3, test if the true population coefficient for PopDen is negative at a 10% test size, using a critical value approach. State clearly the null and alternative hypotheses.

7.    (10 points) For the regression model estimated in Question 3, construct a 99% confidence interval of the true population coefficient for UrbPop. Interpret the obtained confidence interval.

8.    (14 points) Using the regression model estimated in Question 3, calculate the predicted values of CO2 for a range of GDP observed in the sample (with 1,000 increments) whilst keeping the values of PopDen and UrbPop at their respective sample means. Create a two‐dimensional diagram in which the predicted values of CO2 (vertical axis) are plotted against GDP (horizontal axis). Briefly describe the relationship between CO2 emissions per capita and GDP per capita as implied by the estimated regression model. Does this have the shape you expected? Explain why or why not.

9.    (6 points) Based on the model estimated in Question 3, find the level of GDP per capita where the effect of GDP per capita on CO2 emissions changes its sign. Briefly comment on how this relates to your answer to Question 8 above.

10.  (10 points) Describe how you could test a joint hypothesis that the true population coefficients for PopDen and UrbPop are both equal to zero at a 5% significance level. State the null and alternative hypotheses, and clearly present the test statistic and how you would calculate it.

11.  (8 points) Implement the joint hypothesis test as described in Question 10 at a 5% significance level.
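The mechanics behind Questions 3 and 8 to 11 can be sketched on simulated data. The example below is only an illustration under stated assumptions: the variables x, z, w, and y, their distributions, and the coefficients used to generate them are all hypothetical and have nothing to do with the WDI data. It fits a model that is quadratic in x, locates the turning point from the estimated coefficients, traces the predicted curve with the other regressors held at their sample means, and compares a restricted and an unrestricted model with an F test.

```r
set.seed(1)                                   # reproducible simulated data
n = 200
x = runif(n, 0, 50)                           # hypothetical income-like regressor
z = rnorm(n, 10, 2)                           # hypothetical control 1
w = rnorm(n, 50, 5)                           # hypothetical control 2
# Inverted U in x by construction; the true peak is at x = 2 / (2 * 0.03)
y = 5 + 2 * x - 0.03 * x^2 + 0.5 * z + 0.1 * w + rnorm(n)

# I() makes R treat x^2 as a literal squared term inside the formula
fit = lm(y ~ x + I(x^2) + z + w)
b = coef(fit)

# Turning point: dy/dx = b_x + 2 * b_x2 * x = 0  =>  x* = -b_x / (2 * b_x2)
x_star = unname(-b["x"] / (2 * b["I(x^2)"]))
print(x_star)

# Predicted values over a grid of x, holding z and w at their sample means
grid = data.frame(x = seq(min(x), max(x), by = 1), z = mean(z), w = mean(w))
grid$yhat = predict(fit, newdata = grid)

plot(grid$x, grid$yhat, type = "l", xlab = "x", ylab = "Predicted y")
abline(v = x_star, lty = 2)                   # mark the estimated turning point

# Joint test of H0: coefficients on z and w are both zero.
# Compare the restricted model (z and w dropped) with the unrestricted one.
fit_r = lm(y ~ x + I(x^2))
ftest = anova(fit_r, fit)
print(ftest)                                  # F statistic with 2 restrictions
```

The same F statistic can be computed by hand from the two residual sums of squares reported in the anova() table, which is a useful check on the manual calculation.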

4.  Further Instructions

     This is an individual project, not a group project. You are required to work on and write your report individually.

     You need to submit either:

i.     A PDF report generated by RMarkdown that contains your R commands, outputs, and text‐based answers addressing each question, or

ii.     A PDF document containing your text‐based answers to the questions, AND an RMarkdown file (or an R‐script file plus its outputs) providing your R commands and their outputs. Please attach your RMarkdown or R script (and output) file at the end of your text‐based report.

     If you choose the second of the above two options, please include the key results of your data analysis from R (for example, descriptive statistics, figures, regression outputs, etc., but not the data) in your text‐based report, so that your report can be read without referring to your R script/output file.

     Your report is marked out of 100 and counts for 35% of your overall final grade.

     You need to show all of your workings. Full marks will not be awarded if any essential step is not presented or described.

     Please submit your report through Turnitin by the due date specified on the first page of these instructions.

Appendix: Captured images for Data Download

Image 1:

 

Image 2:

 

Image 3:

Image 4: