Project 1: GDP Convergence
Executive Summary
Linear regression models are used to test the theory that poorer countries tend to have higher per capita growth rates when they share similar structural parameters for preferences and technology. Exploratory data analysis (EDA) and linear regression are applied to economic growth data from 2009 to 2011 to study how a country's wealth affects its economic growth. The primary goal is to identify the factors that significantly affect economic growth and to check the convergence theory from neoclassical economic growth models. The whole analysis is completed in R Markdown.
Keywords: Linear regression, Economic growth, R markdown
Data Description
The dataset in this project is taken from the United Nations and records national statistics related to each country's economy. It contains 166 observations (rows) and 8 variables (columns) covering the years 2009 to 2011. The variables are listed below:
• Country: country names, which can be used as an index column.
• region: Region of the world: Africa, Asia, Caribbean, Europe, America, Oceania.
• fertility: Total fertility rate, number of children per woman.
• ppgdp: Per capita gross domestic product (GDP) in US dollars.
• lifeExpF: Female life expectancy, years.
• pctUrban: Percent of population in urban areas.
• infantMortality: Infant deaths by age 1 year per 1000 live births.
• gr: GDP growth rate, in percent.
The variable of primary interest is gr, the GDP growth rate, so it is treated as the dependent variable. Country takes a different value for every observation, so it serves as an index and has little analytic value in this project. All other variables are treated as independent variables.
EDA
We start by exploring gr through summary statistics and a graph.
Variable     Min   Q1    Median  Q3    Max   Mean  SD   Range  No.
Growth Rate  -3    1.1   2.05    3.18  18.2  2.25  2.3  21.2   166
The minimum growth rate is -3% and the maximum is 18.2%. This large contrast implies that countries differ substantially in economic growth. The average growth rate is about 2.25%, which is larger than the median of 2.05%. This implies that some extremely high growth rates pull the mean upward, so the distribution of gr is right-skewed.
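The summary statistics above can be reproduced with a few lines of code. The following is a minimal Python sketch (not the report's R code); the list `growth` holds hypothetical values chosen for illustration, not the project data, and `summarize` is an illustrative helper name.

```python
import statistics

# Hypothetical growth rates in percent (illustrative values, NOT the report's data).
growth = [-1.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 12.0]

def summarize(values):
    """Return the same statistics as the summary table: quartiles, mean, SD, range, count."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),      # sample standard deviation
        "range": max(values) - min(values),
        "n": len(values),
    }

summary = summarize(growth)

# A mean above the median hints at a long right tail (a few very high values).
right_skew_hint = summary["mean"] > summary["median"]
```

In this toy sample the single large value (12.0) pulls the mean above the median, which is the same pattern the report describes for gr.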
It is clear that some countries have abnormally high growth rates: “Bosnia and Herzegovina”, “Cambodia”, “China”, “Equatorial Guinea” and “Myanmar”. They are all developing countries with relatively low GDP per capita.
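The report does not state which rule was used to flag the abnormally high values. One common convention, assumed here, is the boxplot rule that treats values above Q3 + 1.5 × IQR as high outliers. A minimal Python sketch using the quartiles from the summary table (`upper_fence` and `is_high_outlier` are illustrative names):

```python
# Quartiles of gr as reported in the summary table (percent).
Q1, Q3 = 1.1, 3.18

def upper_fence(q1, q3, k=1.5):
    """Upper boxplot fence: Q3 + k * IQR (k = 1.5 is an assumed convention)."""
    return q3 + k * (q3 - q1)

fence = upper_fence(Q1, Q3)  # about 6.3 percent

def is_high_outlier(gr):
    """True if a growth rate lies above the upper fence."""
    return gr > fence
```

Under this assumed rule, the reported maximum of 18.2% lies far above the fence, while the median of 2.05% does not.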
Next, we explore the relationship between the numeric factors and the growth rate using pairwise scatterplots.
The mean GDP growth rate appears almost constant as the values of infantMortality, pctUrban and lifeExpF change, and there is no obvious difference in mean growth rate across values of ppgdp either. However, for countries with lower per capita GDP, the spread of the GDP growth rate is much larger, and some of these countries have noticeably higher growth rates.
High correlations can be observed among the independent variables. For example, the total fertility rate is almost perfectly negatively correlated with female life expectancy. This is not unexpected, because these factors are closely associated with each other. However, it is undesirable in linear regression because it causes multicollinearity, which inflates the variance of the estimated coefficients and makes them unstable.
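To make the multicollinearity concern concrete, the sketch below computes a Pearson correlation and the corresponding variance inflation factor (VIF). It is a minimal Python illustration under stated assumptions: the two lists are hypothetical values, not the project data, and the formula VIF = 1 / (1 − r²) is the special case of a model with exactly two predictors.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(r):
    """VIF of either predictor when the model contains exactly two predictors."""
    return 1.0 / (1.0 - r ** 2)

# Hypothetical values for five countries (illustrative only).
fertility = [1.5, 2.0, 3.0, 4.5, 6.0]
life_exp_f = [82.0, 78.0, 72.0, 63.0, 55.0]

r = pearson(fertility, life_exp_f)   # strongly negative in this toy sample
vif = vif_two_predictors(r)          # grows without bound as |r| approaches 1
```

A VIF of 1 means no inflation; the closer the correlation is to ±1, the more the coefficient variances are inflated.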
Finally, we compare the average GDP growth rate by region. Asia and Europe have the highest average growth rates, while countries in Oceania show the slowest economic growth on average.
region     gr
Africa     1.795652
America    1.995238
Asia       2.960976
Caribbean  1.844444
Europe     2.656757
Oceania    1.083333
Modeling (Linear Regression)
As observed in the EDA, the original data contain several outliers that cause right skewness in the distribution of gr, which may violate the normality assumption of the linear model. Hence, we remove these outliers; the histogram of gr without outliers is shown below.
We start with the full model that contains all factors. Note that R treats region as 5 dummy variables based on its 6 levels. The model summary is shown below:
Although the overall model is statistically significant with F = 3.811 (p < 0.01), many predictors are individually insignificant. Hence, we perform variable selection to improve the model, using the stepwise method based on AIC values.
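The report does not list the formula behind the AIC comparison. For a linear model with Gaussian errors, one common form (up to an additive constant that does not change the ranking of models fitted to the same data) is given below. The symbols are assumptions introduced here: n is the number of observations, RSS is the residual sum of squares, and p is the number of estimated coefficients.

```latex
\mathrm{AIC} = n \,\ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2p
```

Dropping a predictor lowers the penalty term 2p but raises RSS. A backward step removes a predictor only when the net effect is a lower AIC, and the search stops when no single change lowers it further.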
The final model is
\[ \widehat{\mathrm{gr}} = 4.532 - 0.598\,\mathrm{fertility} - 0.00002\,\mathrm{ppgdp} - 0.010\,\mathrm{pctUrban} \]
Finally, we draw the diagnostic plots to help us validate the model assumptions.
Results
We keep ppgdp, pctUrban and fertility in our final regression model for gr. That is, per capita GDP in US dollars, the total fertility rate and the percent of population in urban areas are considered the economic factors most relevant to a country's GDP growth rate. The overall model is statistically significant and explains about 15.8% of the total variation in the GDP growth rate.
Holding the total fertility rate and the percent of population in urban areas constant, a one US dollar increase in per capita GDP is expected to lower the GDP growth rate by about 0.00002 percentage points on average, or roughly 0.02 percentage points per 1,000 US dollars. This negative effect of GDP per capita on the GDP growth rate is statistically significant at the 5% level (p < 0.05).
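The size of this effect is easier to judge for realistic income gaps. The following minimal Python sketch evaluates the fitted equation for two hypothetical countries that differ only in per capita GDP. It rests on one assumption: the text confirms only that 0.00002 is the ppgdp coefficient, so assigning 0.598 to fertility and 0.010 to pctUrban is inferred from the order and scale of the variables. `predicted_growth` is an illustrative name.

```python
# Coefficients from the fitted equation in the Modeling section.
# ASSUMPTION: 0.598 belongs to fertility and 0.010 to pctUrban; the text only
# confirms that 0.00002 is the coefficient on ppgdp.
INTERCEPT = 4.532
B_FERTILITY = -0.598
B_PPGDP = -0.00002
B_PCT_URBAN = -0.010

def predicted_growth(fertility, ppgdp, pct_urban):
    """Predicted GDP growth rate (percent) from the fitted linear model."""
    return (INTERCEPT
            + B_FERTILITY * fertility
            + B_PPGDP * ppgdp
            + B_PCT_URBAN * pct_urban)

# Two hypothetical countries with identical fertility and urbanization.
poorer = predicted_growth(fertility=2.0, ppgdp=10_000, pct_urban=50.0)
richer = predicted_growth(fertility=2.0, ppgdp=20_000, pct_urban=50.0)

# Difference in predicted growth attributable to the 10,000 USD income gap.
gap = richer - poorer  # about -0.2 percentage points
```

Because the model is linear, the gap depends only on the ppgdp coefficient and the income difference, not on the assumed fertility or urbanization values.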
Regarding the model assumptions, the normality assumption appears to be satisfied, as the points follow the reference line closely in the QQ plot. The residual plot suggests that the zero conditional mean assumption for the residuals may not hold exactly, but the departure seems minor. The constant variance assumption is likely to hold, as the spread of the residuals is similar across the fitted values. No influential points or bad leverage points are detected. Hence, we consider the linear regression model appropriate in general.
Conclusion
Based on our analysis, economic growth differs considerably across countries. From 2009 to 2011, the countries that achieved the highest GDP growth were all relatively poor developing countries.
After excluding the extreme values, the linear regression identifies the total fertility rate, GDP per capita and the urban population share as the three factors most related to the GDP growth rate. Holding the two structural parameters of fertility and urbanization constant, we find a significant negative effect of GDP per capita on further economic growth. Specifically, among countries with the same fertility and urbanization rates, GDP growth is expected to be about 0.00002 percentage points lower for each additional US dollar of GDP per capita. These results imply that poorer countries are expected to grow faster than richer countries when they are similar with respect to structural parameters. That is, our results are consistent with the theory in neoclassical economic growth models.
In conclusion, economic factors such as the total fertility rate and the urbanization of a nation are significantly related to the GDP growth rate. When these factors are held constant, higher wealth per person is expected to result in slower GDP growth. In other words, we conclude that a country's per capita growth rate tends to be inversely related to its starting level of income per person.
Appendix of all code used
2022-04-01