Module 5: Problem Set 5

Econometrics Summer 2022



General Guidelines

· Submit your answers in the associated Canvas assignment. Your submission may combine Word documents (with screenshots of Stata output), do-files, and/or log files.

· You will need to use Stata and/or Microsoft Excel for your problem sets. Stata can be purchased or accessed remotely via Citrix; instructions are in the syllabus and posted to Canvas.

· When working on problem sets, be sure to show your work and carefully explain your answers. For questions that require some computation, answers that consist solely of a single number (even if correct) will be given less credit than those that show how the answer was deduced. For interpretation questions, answers that solely restate an estimate (for example, ) will be given less credit than those that explain in words what the estimate means.

Exercise 1

The document “Problem Set 5 Mata Output” reports the internet use, per capita GDP, and population density of 21 countries. It sets out the analysis for estimating a function that could be used to predict the internet use of these countries as a linear function of per capita GDP and population density.

i. Write down the estimated function, with standard errors and t-values in parentheses under each slope coefficient.

ii. From the Analysis of Variance table, write down the formula for each of the nine numbers, showing the computations for any of the values presented.

iii. From the Analysis of Variance table, write down the formula and compute the R squared. Explain in words what the R squared tells us.

iv. From the Analysis of Variance table, show the formula and the computations of the standard error of estimate. What is the standard error of estimate an estimate of?

v. From the Analysis of Variance table, show the formula and the steps to compute the F statistic. State the null and alternative hypotheses being tested with this F statistic. Draw a picture of the F distribution and show the conclusion of the test on the picture, at the 0.05 level of significance.

vi. Develop the two t tests, one for each independent variable. State clearly the null and alternative hypotheses. Use a picture of the probability distribution in your analysis. What is your conclusion (set alpha = 0.01)? What does it mean to be statistically significant (or not significant) in the context of multiple regression?
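
As a reminder of how the Analysis of Variance quantities in parts ii through v fit together, here is a minimal Python sketch. The two sums of squares are made-up illustrative values, not numbers from the problem-set output; only n = 21 and the two regressors come from the exercise.

```python
# Hypothetical ANOVA inputs: the sums of squares below are invented
# for illustration and are NOT taken from the problem-set output.
n = 21            # observations (21 countries, as stated in the exercise)
k = 2             # slope coefficients (per capita GDP, population density)
ss_model = 600.0  # explained (model) sum of squares, made-up value
ss_resid = 400.0  # residual sum of squares, made-up value

# Sums of squares add up.
ss_total = ss_model + ss_resid

# Degrees of freedom.
df_model = k
df_resid = n - k - 1
df_total = n - 1

# Mean squares: each sum of squares divided by its degrees of freedom.
ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid
ms_total = ss_total / df_total

# Summary statistics built from the table.
r_squared = ss_model / ss_total   # share of total variation explained
see = ms_resid ** 0.5             # standard error of estimate (root MSE)
f_stat = ms_model / ms_resid      # F statistic for overall significance

print(f"R squared = {r_squared:.3f}")
print(f"Standard error of estimate = {see:.3f}")
print(f"F({df_model}, {df_resid}) = {f_stat:.3f}")
```

Replacing the two made-up sums of squares with the values from the output should reproduce the remaining entries of the table.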

Exercise 2

i. Which assumptions are involved in arriving at the conclusion that the variance-covariance matrix for our estimated coefficients is equal to σ²(X′X)⁻¹? Explain the assumptions in words and describe how each assumption relates back to this equation. Why is it important to remember these assumptions (i.e., what happens if they are not true)?
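
For orientation, the algebra behind the usual textbook form of this result can be sketched as follows. The notation is an assumption, since the handout does not define it: y = Xβ + u is the model in matrix form, β̂ is the OLS estimator, and σ² is the error variance. Your answer should still identify in words which assumption is used at each step.

```latex
\begin{aligned}
\hat{\beta} &= (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'u, \\
\operatorname{Var}(\hat{\beta}\mid X) &= (X'X)^{-1}X'\,\operatorname{E}[uu'\mid X]\,X(X'X)^{-1}, \\
\operatorname{E}[uu'\mid X] = \sigma^{2}I_n \;&\Rightarrow\; \operatorname{Var}(\hat{\beta}\mid X) = \sigma^{2}(X'X)^{-1}.
\end{aligned}
```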


Exercise 3 


Using the Excel data file PS5_Data, estimate the following model:

internet_use_i = β0 + β1·access_i + β2·GDP_capita_i + β3·pop_density_i + β4·rural_i + u_i

where internet_use is the percentage of the population using the Internet; access is access to electricity (% of population); GDP_capita is GDP per capita (constant 2010 thousand USD); pop_density is population density (people per sq. km of land area); and rural is rural population (% of total population).

The World Bank is conducting an analysis of electricity markets and development and wants to test whether a one percentage point increase in electricity access is associated with the assumed 0.5 percentage point increase in internet use, holding all other variables constant. Test H0: β1 = 0.5 against the two-sided alternative. Carry out the test with α = .01 and α = .05. What do you conclude?
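
The mechanics of testing a coefficient against a nonzero hypothesized value can be sketched in Python as follows. The estimate and standard error are hypothetical placeholders, not results from PS5_Data, and the critical values use a large-sample normal approximation; with the actual residual degrees of freedom, use the corresponding t critical values instead.

```python
from statistics import NormalDist


def t_statistic(estimate, hypothesized, std_error):
    """t statistic for H0: coefficient == hypothesized."""
    return (estimate - hypothesized) / std_error


# Hypothetical regression output, NOT taken from PS5_Data.
b1 = 0.62      # made-up estimate of the coefficient on access
se_b1 = 0.08   # made-up standard error

t = t_statistic(b1, 0.5, se_b1)

# Two-sided critical values from the standard normal distribution,
# used here only as a large-sample stand-in for the t distribution.
crit_05 = NormalDist().inv_cdf(1 - 0.05 / 2)
crit_01 = NormalDist().inv_cdf(1 - 0.01 / 2)

reject_05 = abs(t) > crit_05
reject_01 = abs(t) > crit_01

print(f"t = {t:.3f}")
print(f"Reject at 5%? {reject_05}; reject at 1%? {reject_01}")
```

Note that the t statistic Stata prints next to each coefficient tests the hypothesis that the coefficient equals zero, so it cannot be used directly for this question.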

Compute the F statistic for overall significance of the model. State the null and alternative hypotheses being tested with this F statistic.

Now, test H0: β3 = 0 and β4 = 0 in the model. What is the alternative hypothesis? Carry out the test with α = .01 and α = .05. What do you conclude? Do population density and rural share as a group add extra explanation to the variability of internet use?
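
One common way to carry out a joint test of this kind compares the residual sum of squares of the full model with that of the model re-estimated without the variables in question. The Python sketch below shows the arithmetic; every number in it is a hypothetical placeholder, not a result from PS5_Data.

```python
def joint_f_statistic(ssr_restricted, ssr_unrestricted, q, df_resid):
    """F statistic for q exclusion restrictions.

    ssr_restricted:   residual sum of squares with the q variables dropped
    ssr_unrestricted: residual sum of squares of the full model
    df_resid:         residual degrees of freedom of the full model (n - k - 1)
    """
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / df_resid
    return numerator / denominator


# Hypothetical values, NOT taken from PS5_Data.
f_joint = joint_f_statistic(
    ssr_restricted=520.0,
    ssr_unrestricted=400.0,
    q=2,            # two restrictions: pop_density and rural
    df_resid=100,   # made-up residual degrees of freedom
)

print(f"F = {f_joint:.2f}")
```

The statistic is then compared with the critical value of the F distribution with q and n - k - 1 degrees of freedom at the chosen significance level.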

Generate a 90% confidence interval for  and a 99% confidence interval for . Explain in words what these confidence intervals mean.
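
Whichever coefficients are intended, a confidence interval takes the form estimate ± critical value × standard error. The Python sketch below uses a hypothetical estimate and standard error and a large-sample normal critical value; with the actual sample, the critical value should come from the t distribution with the model's residual degrees of freedom.

```python
from statistics import NormalDist


def confidence_interval(estimate, std_error, level):
    """Two-sided interval using a normal (large-sample) critical value."""
    critical = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = critical * std_error
    return estimate - half_width, estimate + half_width


# Hypothetical coefficient estimate and standard error, NOT from PS5_Data.
b = 1.20
se_b = 0.30

ci_90 = confidence_interval(b, se_b, 0.90)
ci_99 = confidence_interval(b, se_b, 0.99)

print(f"90% CI: ({ci_90[0]:.3f}, {ci_90[1]:.3f})")
print(f"99% CI: ({ci_99[0]:.3f}, {ci_99[1]:.3f})")
```

The 99% interval is always wider than the 90% interval for the same coefficient, because a higher confidence level requires a larger critical value.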

Find the variance inflation factor (VIF) for the coefficient on rural share of the population.
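
The VIF for a regressor is computed from the R squared of an auxiliary regression of that regressor on all the other regressors in the model. A minimal Python sketch, using a made-up auxiliary R squared rather than a value from PS5_Data:

```python
def variance_inflation_factor(r_squared_aux):
    """VIF = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    regressor j on the remaining regressors."""
    return 1.0 / (1.0 - r_squared_aux)


# Hypothetical R squared from regressing rural on access, GDP_capita,
# and pop_density. This value is made up for illustration.
r2_rural = 0.80

vif_rural = variance_inflation_factor(r2_rural)
print(f"VIF = {vif_rural:.2f}")
```

A VIF of 1 means the regressor is uncorrelated with the others; larger values indicate that collinearity is inflating the variance of that coefficient's estimate.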