Economics 368 Fall 2023 Empirical Exercise #2
Due Date: To be submitted through Course Site before Tuesday, October 24th, at 12:10pm. Assignments handed in late will be penalized at a rate of 25% per day, including weekend days.
Format: Please hand in a single document containing the relevant STATA output, annotated with your typed answers to the questions below. (There is no need to re-format the STATA tables or estimation output; simply cut and paste them into your answer document.) In addition, please hand in a copy of the STATA log file for your program. Both documents must be converted to .pdf format for submission (otherwise you will not be able to upload them to Course Site). While it is permissible to work with other students in the class on this assignment, you must write your own STATA code and answers.
Link to LUApps to access STATA: https://lts.lehigh.edu/luapps
General MEPS references
Documentation file: https://meps.ahrq.gov/data_stats/download_data/pufs/h181/h181doc.pdf
Codebook: https://meps.ahrq.gov/data_stats/download_data/pufs/h181/h181cb.pdf
1) Distribution of medical expenditures
a) Using the summarize command with the detail option, investigate the distribution of total medical expenditures (totexpy_r). Be sure to account for the weights using [aw = ]. What do you observe about the distribution of total medical expenditures?
b) Now generate a new variable that is the log of total medical expenditures. Because you can’t take the log of a 0 value, add $1 to every person’s medical expenditures before you create this new variable. Use summarize, detail in the same manner as part a) to analyze the log of total medical expenditures. How does the distribution of the log values compare to the unlogged values? Use the statistics reported in the summarize command to support your conjecture.
c) Repeat part b) for individuals with diabetes. How does the distribution of log expenditures in this case compare to the full population considered in part b)? Why? Use the statistics reported in the summarize command to support your conjecture.
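The three steps in Question 1 could be sketched as follows. This is only an illustration: perwt (the person weight), diabetes (a 0/1 indicator), and l_totexp (the new log variable) are hypothetical names, so substitute the names used in the course data set.

```stata
* Sketch only: perwt, diabetes, and l_totexp are placeholder names,
* not names confirmed by the assignment or the MEPS codebook.

* a) Weighted distribution of total medical expenditures
summarize totexpy_r [aw = perwt], detail

* b) Add one dollar before taking logs so that zero spenders are kept
generate l_totexp = ln(totexpy_r + 1)
summarize l_totexp [aw = perwt], detail

* c) Same summary, restricted to individuals with diabetes
summarize l_totexp if diabetes == 1 [aw = perwt], detail
```

The detail option reports the mean, median (50th percentile), skewness, and kurtosis, which are the statistics most useful for comparing the shapes of the distributions.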
2) Basic determinants of medical expenditures
a) In this section you will estimate a linear regression on medical expenditures. We will estimate these models and those of subsequent sections on adults, so remove the children from the sample by using the command “keep if age>=19”. Now use the svyset command to set the weight, strata, and psu variables so that you can incorporate these features into your regression by specifying the regression model using svy: reg. Each regression will include a basic set of sociodemographic control variables for gender, race/ethnicity, age, education, and family income. We will call this set of variables “control_set”, and can specify these variables using a global macro so that they can be referenced in regression commands by simply typing $control_set. Write this global macro as follows:
global control_set female black hispanic other_race age25_34 age35_44 age45_54 /*
*/ age55_64 age65_74 age75pl hsdipl somecoll ba baplus l_fincpe
[Note that if the above command is typed directly into the command line, the “/* */” must be removed, but it is necessary to include this code in the batch program because the variable list spills over onto two lines.]
b) Using the survey regression command, estimate a linear regression of total medical expenditures on the set of control variables. Next, change the dependent variable to the log of total medical expenditures and re-estimate the model. Compare the R² and t-statistics of the coefficients across the two models. What do you think explains the differences that you observe?
c) From now on, use the second model where the dependent variable is the log of total medical expenditures. What is the proper interpretation of the coefficient on the variable “somecoll”? [Be specific.] In general, what is the relationship between education and medical expenditures?
d) What is the proper interpretation of the coefficient on the variable “hispanic” (indicating Latinx)? What happens if you try to add the variable for “white” (indicating Caucasian) into the model? Why do you think this happens?
e) Go back to the model without the additional control variable “white” and add the following control variables for health insurance, leaving private health insurance as the reference group: medicare, medicaid, uninsured. Compare the coefficient for hispanic from this model to the one without the control variables for health insurance. What do you think explains the difference? To aid your explanation, use the corr command to estimate the simple correlation coefficient between hispanic and uninsured, and reference the coefficient on uninsured in the regression model.
f) The variable “l_fincpe” is the log of family income divided by the square root of household size (dividing by the square root of household size is intended to adjust for household economies of scale). What is the proper interpretation of the coefficient for this variable in the model specified in part e)? Again, be specific.
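The estimation steps in Question 2 could be sketched as follows. The names perwt, varstr, and varpsu (weight, strata, and psu variables) and l_totexp (log of total expenditures plus $1) are hypothetical, so replace them with the names in the course data set. The sketch assumes the control_set global macro has already been defined as described in part a).

```stata
* Sketch only: perwt, varstr, varpsu, and l_totexp are placeholder names.
* Assumes the control_set global macro is already defined (part a).

* a) Restrict to adults and declare the survey design
keep if age >= 19
svyset varpsu [pweight = perwt], strata(varstr)

* Create the log outcome only if it does not exist yet
capture confirm variable l_totexp
if _rc generate l_totexp = ln(totexpy_r + 1)

* b) Same controls, outcome in levels and then in logs
svy: regress totexpy_r $control_set
svy: regress l_totexp $control_set

* e) Add insurance indicators; private insurance is the omitted group
svy: regress l_totexp $control_set medicare medicaid uninsured

* Simple correlation between the two indicators
correlate hispanic uninsured
```

Leaving one category out of each set of indicators (private insurance here) is what makes the remaining coefficients comparisons against that reference group.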
3) Medical spending and BMI
a) The variable “bmindx53” contains adult body mass index (only included through year 2016). Use the summarize, detail command to analyze this variable (with weights, of course). You will see that there are some negative values, indicating missing data. Use the replace command to set all the negative values equal to missing and then re-run summarize, detail. [Note that observations with missing BMI data will now be automatically dropped from commands]. Consider the mean value of BMI. What does the magnitude of the mean imply about the weight status of the average American?
b) Run a linear regression of the log of total medical expenditures on control_set, the control variables for health insurance used above, and the BMI variable. What is the proper interpretation of the coefficient on BMI in this model?
c) Create a variable for the square of BMI and add this variable to the regression in part b). What do the estimates on the BMI variables imply about the relationship between BMI and medical expenditures?
d) Now create an “interaction term” between the BMI variable and the log of income by multiplying these two variables together. Include this interaction term in the regression model that does not include the square of BMI. What does the coefficient on the interaction term suggest about the relationship between BMI and medical expenditures? Why do you think this is the case? [Hint: Write out the regression model and take the derivative of medical expenditures with respect to the BMI variable].
e) Re-estimate the model of the log of total medical expenditures on BMI, without including the square of BMI or an interaction term. Now create a new variable measuring the log of out-of-pocket medical expenditures (totslfy_r). Don’t forget to add $1 to each observation to account for the 0 value problem. Now re-estimate the model with the log of out-of-pocket medical expenditures as the dependent variable. Compare the BMI coefficient across the two models. What do you think explains the difference that you find in the marginal effect of BMI? [Hint: Think about the design of most insurance benefits].
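The data steps and regressions in Question 3 could be sketched as follows. The names perwt (person weight), l_totexp, bmi_sq, bmi_x_linc, and l_totslf are hypothetical and chosen only for illustration. The sketch assumes the survey design has already been declared with svyset and that the control_set global macro is defined.

```stata
* Sketch only: perwt and all newly generated variable names are placeholders.
* Assumes svyset has been run and the control_set global macro is defined.

* a) Inspect BMI, recode negative codes to missing, and inspect again
summarize bmindx53 [aw = perwt], detail
replace bmindx53 = . if bmindx53 < 0
summarize bmindx53 [aw = perwt], detail

* Create the log outcome only if it does not exist yet
capture confirm variable l_totexp
if _rc generate l_totexp = ln(totexpy_r + 1)

* b) Linear BMI term
svy: regress l_totexp $control_set medicare medicaid uninsured bmindx53

* c) Add the square of BMI
generate bmi_sq = bmindx53^2
svy: regress l_totexp $control_set medicare medicaid uninsured bmindx53 bmi_sq

* d) Interaction of BMI with log income, without the squared term
generate bmi_x_linc = bmindx53 * l_fincpe
svy: regress l_totexp $control_set medicare medicaid uninsured bmindx53 bmi_x_linc

* e) Log of out-of-pocket spending (plus one dollar) as the outcome
generate l_totslf = ln(totslfy_r + 1)
svy: regress l_totslf $control_set medicare medicaid uninsured bmindx53
```

Because missing values propagate through generate, bmi_sq and bmi_x_linc are missing wherever BMI is missing, so those observations drop out of each regression automatically.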
2023-10-18