
ECON 326

Problem Set #1

Topic 1 – Bayesian Thinking (8 pts)

1. What are your current beliefs about the extent to which breastfeeding, instead of using infant formula, affects an average child’s later cognitive skills as measured by IQ points? Assume that the standard deviation of the IQ distribution is 10 points. Graph your prior distribution.

[Blank axis for graphing your prior: effect on IQ points, from -20 to 20]

2. Now read this CNN article (http://www.cnn.com/2015/03/18/health/breastfeeding-iq-income/) about a recent study on breastfeeding. Graph your posterior distribution.

[Blank axis for graphing your posterior: effect on IQ points, from -20 to 20]

3. Explain how, if at all, this study changed your beliefs and why.
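To make the prior-to-posterior step concrete, here is a minimal Python sketch of one standard textbook model, the normal-normal conjugate update. It assumes that both your prior and the study's estimate of the effect (in IQ points) are normally distributed and that the study's standard error is known. The prior and study numbers are hypothetical placeholders, not values from the CNN article.

```python
import math


def normal_update(prior_mean, prior_sd, estimate, std_error):
    """Normal-normal conjugate update for an effect measured in IQ points.

    Assumes a normal prior N(prior_mean, prior_sd**2) and a study estimate
    that is normally distributed around the true effect with known
    standard error std_error. Returns (posterior_mean, posterior_sd).
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / std_error ** 2
    post_precision = prior_precision + data_precision
    # Posterior mean: precision-weighted average of prior mean and study estimate.
    post_mean = (prior_precision * prior_mean + data_precision * estimate) / post_precision
    post_sd = math.sqrt(1.0 / post_precision)
    return post_mean, post_sd


# Hypothetical inputs only: prior centred at 0 with SD 5 IQ points,
# study estimate of 4 IQ points with standard error 2.
mean, sd = normal_update(prior_mean=0.0, prior_sd=5.0, estimate=4.0, std_error=2.0)
print(f"posterior mean = {mean:.2f} IQ points, posterior SD = {sd:.2f}")
```

In this model a precise study (small standard error) pulls your beliefs strongly toward its estimate, while a noisy study or a very confident prior moves them less; the posterior is also always narrower than either input.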

Topic 2 – Counterfactuals (8 pts)

In 2006, Massachusetts was the first state in the U.S. to implement universal health insurance coverage for its residents. We would like to estimate the causal impact of this reform on the health of Massachusetts residents.

1. Explain what the ideal counterfactual would be here. Why can’t this counterfactual be observed?

2. Describe three potential real-world counterfactuals that might help us estimate the impact of this reform. For each:

a. Describe the counterfactual clearly.

b. Explain one reason why you think that counterfactual is helpful.

c. Explain one potential problem with using that counterfactual.

Topic 3 – Internal and External Validity (10 pts)

Read the abstract and the discussion of the experiment in this paper, “Ban the Box, Criminal Records, and Racial Discrimination: A Field Experiment”:

https://academic.oup.com/qje/article/133/1/191/4060073

1. In a short paragraph, describe the study. Make sure you include in your description:

a. What is the key question the authors are trying to answer?

b. What is the treatment?

c. What is the outcome (dependent variable) of interest?

d. What would the ideal counterfactual have been?

e. What do the authors use to statistically approximate that ideal counterfactual?

2. As far as you can tell, does this study have high internal validity?

3. In an ideal setting, what would the study have randomized?

4. Explain a possible threat to internal validity in this study.

5. For what populations or contexts would this study have high external validity? Are there populations for which you think this study has low external validity? Explain clearly.

Topic 4 – Stata Exercise and t-test Review (10 pts)

Using the materials from our Stata labs, the gender.dta dataset, and any additional commands you need, create a do-file that produces the information needed to answer the following questions. You need to show your work for questions (1) and (2): write down the formulas you are using, and plug the numbers from the Stata output into them to compute the relevant confidence interval or test statistic.

Important: For this and all future assignments, you will need to submit your Stata output (your do-file and your log file) with your answers. Directions for creating do-files and log files are included in the Stata labs. You can also type “help” followed by a command name in Stata for more information on that command; for example, type “help ttest”.

Note: The answer you submit for this part should contain:

* Your “ps1.do” do-file

* A log file where the results (i.e. output) produced by “ps1.do” are displayed.

* Your complete write-up of answers to questions 1-3

1. Test the null hypothesis that the average hourly wage in the population is equal to $12/hour using a 5% significance level. In doing so, indicate:

a. The null hypothesis

b. The test statistic (t-statistic)

c. The p-value of this test and your interpretation of the p-value

d. The 95% confidence interval for the population mean hourly wage.
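For question 1, the formulas are $t = (\bar{Y} - 12)/(s/\sqrt{n})$ and $\bar{Y} \pm 1.96\, s/\sqrt{n}$, where $\bar{Y}$, $s$ and $n$ are the sample mean, standard deviation and size of `wage`. The Python sketch below applies them to hypothetical summary statistics (not the gender.dta values). It assumes a large sample, so it uses the normal critical value 1.96 and a normal p-value; Stata's `ttest` uses the t distribution with $n-1$ degrees of freedom, so its output can differ slightly in small samples.

```python
import math


def one_sample_t(sample_mean, sample_sd, n, mu0, z_crit=1.96):
    """Large-sample one-sample test of H0: mu = mu0.

    Returns (t, two-sided p-value, 95% CI) using the normal approximation.
    """
    se = sample_sd / math.sqrt(n)          # standard error of the sample mean
    t = (sample_mean - mu0) / se           # test statistic
    p = math.erfc(abs(t) / math.sqrt(2))   # two-sided p-value, P(|Z| > |t|)
    ci = (sample_mean - z_crit * se, sample_mean + z_crit * se)
    return t, p, ci


# Hypothetical summary statistics, not taken from gender.dta.
t, p, ci = one_sample_t(sample_mean=12.5, sample_sd=6.0, n=900, mu0=12.0)
print(f"t = {t:.3f}, p = {p:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```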

2. What is the difference in the average hourly wage between men and women? Is this difference statistically significant at the 5% significance level? To answer, conduct a hypothesis test and indicate:

a. The null hypothesis

b. The test statistic (t-statistic)

c. The p-value of this test and your interpretation of the p-value
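A sketch of the two-sample comparison, again in Python with hypothetical group summary statistics. By default, Stata's `ttest wage, by(groupvar)` (where `groupvar` stands in for the dataset's gender indicator) assumes equal variances and uses the pooled standard error; its `unequal` option switches to the unequal-variance standard error. Both versions are shown, with a normal approximation for the p-value.

```python
import math


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2, pooled=True):
    """Large-sample test of H0: mu1 = mu2 from group summary statistics.

    pooled=True mirrors an equal-variance test; pooled=False uses the
    unequal-variance standard error. Returns (difference, t, two-sided p).
    """
    diff = mean1 - mean2
    if pooled:
        # Pooled variance estimate, weighting each group by its degrees of freedom.
        sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
        se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    else:
        se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = diff / se
    p = math.erfc(abs(t) / math.sqrt(2))   # normal approximation to the p-value
    return diff, t, p


# Hypothetical group statistics (group 1 vs group 2), not from gender.dta.
print(two_sample_t(14.0, 7.0, 500, 11.5, 6.0, 500))
print(two_sample_t(14.0, 7.0, 500, 11.5, 6.0, 500, pooled=False))
```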

3. Does the result of the last question provide evidence that there is gender discrimination in wages?  Explain briefly.

Topic 5 – Bivariate Regression (15 pts)

Use the gender.dta data set again for this exercise.

1. Type “scatter wage educ” to get a visual representation of the relationship between hourly wages and years of education. Show this figure.

2. Based on this figure, what can you conclude about the relationship between wages and education?

3. Type “regress wage educ” and record the output from that command.

a) Write down the sample regression function (SRF) being estimated here.

b) What is the value of $\hat{\beta}_1$, the estimated slope coefficient? Interpret this value in words. Does it match your finding from question 2?

c) Is $\hat{\beta}_1$ statistically significant? Explain.

d) Does this imply that a policy designed to increase educational attainment will increase wages? Explain.

e) What is the value of $\hat{\beta}_0$, the estimated intercept? Interpret this value in words. Is it practically meaningful?

f) What is the predicted wage for a person with 12 years of education? Show your calculation.
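As a reminder of where the coefficients reported by `regress wage educ` come from, this Python sketch computes the OLS slope $\hat{\beta}_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and intercept $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ on a small made-up dataset (not gender.dta), then plugs in 12 years of education as in part f.

```python
def ols_simple(x, y):
    """OLS intercept and slope for a regression of y on x with a constant."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx                 # slope: sum of cross-deviations over sum of squared x-deviations
    b0 = y_bar - b1 * x_bar        # intercept: line passes through (x_bar, y_bar)
    return b0, b1


# Hypothetical toy data: years of education and hourly wage.
educ = [10, 12, 12, 14, 16, 18]
wage = [8.0, 10.5, 9.5, 13.0, 15.0, 18.5]
b0, b1 = ols_simple(educ, wage)
print(f"SRF: wage_hat = {b0:.3f} + {b1:.3f} * educ")
print(f"predicted wage at educ = 12: {b0 + b1 * 12:.3f}")
```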

4. Calculate the predicted y-values from the regression using “predict yhat”.

a) Type “sum yhat” to see the mean of the predicted wage values. Have you seen this previously in the problem set? Explain.

b) Type “sum yhat if educ==12”. Have you seen this mean value previously? Explain.

5. Calculate the sample residuals from the regression using “predict uhat, resid”.

a) Type “sum uhat”. What is the largest residual in the dataset? What can you say about the person who has this residual?

b) Type “regress uhat educ”. What do you find and why does it make sense?
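Parts 4 and 5 rely on algebraic properties of OLS with an intercept: the fitted values have the same mean as the dependent variable, everyone with the same `educ` gets the same fitted value, the residuals sum to zero, and the residuals are uncorrelated with the regressor, so regressing them on it returns zero coefficients. The Python sketch below checks these on hypothetical toy data; the numbers are illustrative only.

```python
from statistics import mean


def ols_simple(x, y):
    """OLS intercept and slope for a regression of y on x with a constant."""
    x_bar, y_bar = mean(x), mean(y)
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
    return y_bar - b1 * x_bar, b1


# Hypothetical toy data, not gender.dta.
educ = [10, 12, 12, 14, 16, 18]
wage = [8.0, 10.5, 9.5, 13.0, 15.0, 18.5]

b0, b1 = ols_simple(educ, wage)
yhat = [b0 + b1 * e for e in educ]            # analogous to "predict yhat"
uhat = [w - f for w, f in zip(wage, yhat)]    # analogous to "predict uhat, resid"

print("mean of yhat:", mean(yhat), "mean of wage:", mean(wage))
print("yhat where educ == 12:", [f for e, f in zip(educ, yhat) if e == 12])
print("sum of residuals:", sum(uhat))
print("regress uhat on educ:", ols_simple(educ, uhat))  # both coefficients ~0
```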

Topic 6 – Multivariate Regression and Dummy Variables (20 pts)

In this part, you will again explore the determinants of income, using data from the National Longitudinal Survey of Youth. Download the file nlsy.dta from Canvas; it contains a sub-sample of 1,000 income-earning respondents to the survey of American youths conducted in 2000. For simplicity, respondents are classified as white, black, or Hispanic.

1. Present a nicely formatted table of summary statistics for each variable, including the mean, standard deviation, minimum and maximum. In a brief paragraph, describe the characteristics of the people in the sample. The goal of this is to provide background for a reader before you present any additional analysis.

2. Regress income on “educ” and “married”. (For this and all other regressions, use the “robust” option.) Interpret each coefficient and discuss whether each is statistically significantly different from zero. Also interpret the constant and discuss whether it is meaningful.

3. Predict the monthly income for two people, an unmarried high school dropout with 10 years of schooling and a married college graduate with 16 years of schooling.
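The prediction is the estimated equation evaluated at each person's characteristics. A minimal Python sketch, with hypothetical placeholder coefficients that you would replace with the estimates from your question 2 regression:

```python
def predict_income(b0, b_educ, b_married, educ, married):
    """Fitted value from income_hat = b0 + b_educ * educ + b_married * married."""
    return b0 + b_educ * educ + b_married * married


# Hypothetical placeholder coefficients; substitute the estimates from your regression.
B0, B_EDUC, B_MARRIED = 100.0, 200.0, 500.0

dropout = predict_income(B0, B_EDUC, B_MARRIED, educ=10, married=0)
graduate = predict_income(B0, B_EDUC, B_MARRIED, educ=16, married=1)
print(f"unmarried, 10 years: {dropout:.2f}")
print(f"married, 16 years:   {graduate:.2f}")
print(f"difference:          {graduate - dropout:.2f}")
```

The gap between the two predictions, $6\hat{\beta}_{\text{educ}} + \hat{\beta}_{\text{married}}$, does not depend on the intercept.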

4. Regress income on three dummy variables for being white, black and Hispanic. Explain why Stata reacts the way it does.
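The underlying issue can be seen numerically. Because every respondent is in exactly one group, white + black + hispanic equals the constant column, so the matrix $X'X$ is singular and OLS has no unique solution. This Python sketch builds the design matrix for a few hypothetical respondents and computes $\det(X'X)$ by Gaussian elimination, first with all three dummies and then with one of them dropped.

```python
def gram(cols):
    """Return X'X for a design matrix given as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n = len(m)
    d = 1.0
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        if abs(m[p][k]) < 1e-12:
            return 0.0                      # no usable pivot: matrix is singular
        if p != k:
            m[k], m[p] = m[p], m[k]
            d = -d                          # a row swap flips the sign
        d *= m[k][k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n):
                m[r][c] -= f * m[k][c]
    return d


# Hypothetical respondents, not from nlsy.dta.
race = ["white", "black", "hispanic", "white", "black", "hispanic", "white"]
const = [1.0] * len(race)
white = [1.0 if r == "white" else 0.0 for r in race]
black = [1.0 if r == "black" else 0.0 for r in race]
hisp = [1.0 if r == "hispanic" else 0.0 for r in race]

print("constant + all three dummies:", det(gram([const, white, black, hisp])))
print("constant + black + hispanic: ", det(gram([const, black, hisp])))
```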

5. Regress income on dummy variables for being black and Hispanic. Interpret the two coefficients and discuss whether they are statistically significant.

6. What regression could you run to test whether black and Hispanic respondents’ incomes were statistically significantly different from each other? Run that regression and answer that question.
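With only a constant and group dummies on the right-hand side, the OLS coefficients take a simple form: the constant is the mean income of the omitted group, and each dummy's coefficient is that group's mean minus the omitted group's mean (the `robust` option changes the standard errors, not these point estimates). The Python sketch below, using hypothetical incomes, computes these from group means and shows that changing which group is omitted re-expresses the same black-Hispanic gap. Group means alone do not give the standard error of that gap; the re-based regression does.

```python
from statistics import mean

# Hypothetical (race/ethnicity, monthly income) records, not from nlsy.dta.
data = [("white", 3200.0), ("white", 2800.0), ("black", 2500.0),
        ("black", 2300.0), ("hispanic", 2600.0), ("hispanic", 2700.0)]


def group_mean(group):
    """Mean income within one race/ethnicity group."""
    return mean([inc for g, inc in data if g == group])


m_white, m_black, m_hisp = (group_mean(g) for g in ("white", "black", "hispanic"))

# Omitted group = white (income on black and hispanic):
const_w = m_white
b_black = m_black - m_white
b_hisp = m_hisp - m_white

# Omitted group = black (income on white and hispanic): the hispanic coefficient
# becomes the Hispanic-minus-black gap, equal to b_hisp - b_black above.
gap_hisp_vs_black = m_hisp - m_black
print(const_w, b_black, b_hisp, gap_hisp_vs_black, b_hisp - b_black)
```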

7. Make a well-formatted regression table with four columns. In column 1, regress “income” on “educ”. In column 2, add “married” as an explanatory variable. In column 3, add “black” and “hispanic” as explanatory variables. In column 4, add “age” as an explanatory variable. Label statistically significant coefficients with stars.

8. We are interested in the impact of education on income. According to the table you just made, does omitting marital status generate substantial omitted variable bias when estimating this relationship? Does omitting race/ethnicity? Does omitting age?
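One way to read the table is through the omitted-variable-bias identity for OLS. If the short regression (column 1) omits `married` and the long regression (column 2) includes it, the two education coefficients are linked exactly by the formula below, where $\tilde{\delta}_1$ is the slope from an auxiliary regression of `married` on `educ`. An analogous identity, with one such term per omitted variable, applies when the omitted variables are the race/ethnicity dummies or `age`.

```latex
\tilde{\beta}_{\text{educ}} = \hat{\beta}_{\text{educ}} + \hat{\beta}_{\text{married}}\,\tilde{\delta}_1,
\qquad \text{where } \widehat{\text{married}}_i = \tilde{\delta}_0 + \tilde{\delta}_1\,\text{educ}_i
```

Here $\tilde{\beta}_{\text{educ}}$ comes from regressing income on `educ` alone, and $\hat{\beta}_{\text{educ}}$, $\hat{\beta}_{\text{married}}$ come from regressing income on both. The bias term is small when either the omitted variable barely affects income or it is barely related to education.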

9. You are advising a senator who sees your results and concludes that, since married people have higher income than unmarried people, it would be good policy to promote marriage. Write a short paragraph critiquing the senator’s statistical logic.