

ASSIGNMENT 1

(Each part carries equal marks. Sub-questions carry equal marks.)

The National Health Interview Survey (NHIS) is an annual survey conducted by the Department of Health and Human Services to measure the state of health in the country. The National Death Index is a master database of all deaths in the US, including each decedent's Social Security number. Starting in the 1980s, NHIS respondents were matched to the National Death Index using detailed identifying information, and the resulting file includes a variable indicating whether and when a person died within a follow-up period. The data also include the cause of death. This merged data set is called the NHIS Multiple Cause of Death data (MCOD). I have constructed a data set of respondents aged 25-64 from the NHIS/MCOD data. The table below describes the variables in the data set.

Variables	Description
diedin5	Dummy variable, =1 if the respondent died within 5 years of the survey, =0 otherwise
Male	Dummy variable, =1 if the respondent is male, =0 otherwise
Age	Age in years
Married	Dummy variable, =1 if the respondent is married, =0 otherwise
Race	Categorical variable for race and ethnicity. =1 if respondent is white, non-Hispanic, =2 if black, non-Hispanic, =3 if other race, non-Hispanic, =4 if Hispanic
Educ	Categorical variable for educational level. =1 if respondent has less than a high school degree, =2 if a high school degree, =3 if some college, =4 if a bachelor’s degree or more
Incomeg	Categorical variable for family income. =1 if family income is ≤$10K, =2 if >$10K and ≤$20K, =3 if >$20K and ≤$30K, =4 if >$30K and ≤$40K, =5 if >$40K and ≤$50K, =6 if >$50K.
Bmi	Body mass index, weight in kg divided by height in meters squared (kg/m²)
Srhealth	Categorical variable for self-reported health status. =1 if excellent health, =2 if very good, =3 if good, =4 if fair, =5 if poor.

1) Start with some data description

Create a unique dataset by dropping observations according to the last four digits of your student number (e.g., if your student ID is 40271234, drop the first 1234 observations; if your student ID is 40003999, drop the first 3999 observations, etc.).
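The number of observations to drop is simply the student ID modulo 10,000. In Stata the step itself is a single `drop in 1/1234` (for the first example ID). The Python sketch below only illustrates the arithmetic; the function names are hypothetical and the list stands in for the data set's observations.

```python
def rows_to_drop(student_id: int) -> int:
    """Number of leading observations to drop: the last four digits of the ID."""
    return student_id % 10_000


def make_unique_dataset(rows: list, student_id: int) -> list:
    """Return the observations with the first rows_to_drop(student_id) removed."""
    return rows[rows_to_drop(student_id):]


if __name__ == "__main__":
    print(rows_to_drop(40271234))  # 1234
    print(rows_to_drop(40003999))  # 3999
    toy = list(range(10_000))      # hypothetical stand-in for the observations
    print(len(make_unique_dataset(toy, 40271234)))  # 8766
```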

i. Browse the dataset. What type of data is it?

ii. “Summarize” the data and comment on the descriptive statistics.

iii. Check the distributions of the variables. Are they normally distributed? Why or why not? Are there signs of possible outliers?
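Histograms are the usual first check. As a rough numeric complement, the sketch below computes sample skewness and excess kurtosis (both 0 for a normal distribution) and flags values outside Tukey's 1.5 × IQR fences, a common rule of thumb for possible outliers rather than a formal test. It is plain Python for illustration with hypothetical function names and made-up values; in Stata, `summarize bmi, detail` reports skewness and kurtosis, with kurtosis on the scale where a normal distribution equals 3.

```python
import statistics


def skewness_kurtosis(x: list[float]) -> tuple[float, float]:
    """Sample skewness and excess kurtosis (both are 0 for a normal distribution)."""
    mean = statistics.fmean(x)
    m2 = statistics.fmean([(v - mean) ** 2 for v in x])
    m3 = statistics.fmean([(v - mean) ** 3 for v in x])
    m4 = statistics.fmean([(v - mean) ** 4 for v in x])
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def tukey_outliers(x: list[float], k: float = 1.5) -> list[float]:
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(x, n=4)
    iqr = q3 - q1
    return [v for v in x if v < q1 - k * iqr or v > q3 + k * iqr]


if __name__ == "__main__":
    toy_bmi = [21.0, 23.5, 24.1, 26.8, 27.3, 29.9, 31.2, 58.0]  # hypothetical values
    print(skewness_kurtosis(toy_bmi))
    print(tukey_outliers(toy_bmi))
```

These statistics are meaningful mainly for continuous variables such as age and bmi.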

iv. Some of the variables are categorical. Use the “tab” command to find what fraction of the sample died in the 5 years after the survey. Comment on the distribution of diedin5 (one/two lines).
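Because diedin5 is a 0/1 dummy, its sample mean is the share of respondents coded 1, so the percentage in the `tab` output and the mean in the `sum` output carry the same information. In the notation below (introduced here for illustration), $d_i$ is diedin5 for respondent $i$, $n$ is the sample size and $n_1$ is the number of respondents with diedin5 = 1; the standard deviation uses the $n-1$ divisor.

```latex
\[
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i = \frac{n_1}{n},
\qquad
s_d = \sqrt{\frac{n}{n-1}\,\bar{d}\,\bigl(1-\bar{d}\bigr)}
\]
```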

v. Use the “tab” command again to calculate the fraction of “diedin5” first by education (“educ”) and then by income (“incomeg”) group (this is a cross-tabulation). Provide some intuition (one/two lines) on the distribution of mortality rates by education and income (i.e., how does mortality change with income and education?).
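A row percentage in the cross-tabulation is just the mean of diedin5 within each group. The Python sketch below (an illustration outside Stata; the function name and toy data are hypothetical) computes those group means; in Stata, a table such as `tab educ diedin5, row` shows the same numbers as percentages. The same function works for incomeg or race.

```python
from collections import defaultdict


def rate_by_group(outcome: list[int], group: list[int]) -> dict[int, float]:
    """Mean of a 0/1 outcome (e.g. diedin5) within each category of a grouping variable."""
    totals: dict[int, int] = defaultdict(int)
    counts: dict[int, int] = defaultdict(int)
    for y, g in zip(outcome, group):
        totals[g] += y
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in sorted(counts)}


if __name__ == "__main__":
    # Hypothetical toy data, not drawn from the NHIS/MCOD file.
    diedin5 = [1, 0, 0, 1, 0, 0, 0, 0]
    educ = [1, 1, 2, 2, 3, 3, 4, 4]
    print(rate_by_group(diedin5, educ))  # {1: 0.5, 2: 0.5, 3: 0.0, 4: 0.0}
```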

vi. Compare mortality rates across racial and ethnic groups: compare the diedin5 rate for whites, blacks, and Hispanics. Comment on the result (one/two lines).

vii. Use the “tab” command together with the “sum” command to get the means of “diedin5” jointly by educ and incomeg. Use these results to fill in Table 1.

viii. Provide an example (one/two lines) of what the numbers in Table 1 mean (for example, what does the number in row 1, column 1 mean?).

Table 1. Means of diedin5 jointly by education and income

Education (educ) \ Income (incomeg)	1	2	3	4	5	6
1
2
3
4
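Table 1 can be filled with the cell means of diedin5 for each educ × incomeg combination, which is what Stata's `tab educ incomeg, sum(diedin5)` reports. The Python sketch below (hypothetical function name, toy data) computes the same 4 × 6 grid; None marks a cell with no observations.

```python
from collections import defaultdict


def joint_means(outcome, row_var, col_var, rows=range(1, 5), cols=range(1, 7)):
    """Mean of outcome in each (row, column) cell; None marks an empty cell.

    Defaults match Table 1: educ groups 1-4 as rows, incomeg groups 1-6 as columns.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for y, r, c in zip(outcome, row_var, col_var):
        totals[(r, c)] += y
        counts[(r, c)] += 1
    return [
        [totals[(r, c)] / counts[(r, c)] if counts.get((r, c)) else None for c in cols]
        for r in rows
    ]


if __name__ == "__main__":
    # Hypothetical toy data, not drawn from the NHIS/MCOD file.
    grid = joint_means([1, 0, 1, 0], [1, 1, 2, 4], [1, 1, 6, 3])
    for educ_level, row in zip(range(1, 5), grid):
        print(educ_level, ["" if m is None else f"{m:.3f}" for m in row])
```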

2) Regression I

Race, income, and education are categorical variables. Construct dummy variables for race groups 2-4 (i.e., race2, race3, and race4), income groups 2-6 (i.e., income2, income3, income4, income5, and income6), and education groups 2-4 (i.e., education2, education3, and education4). Remember that this can be done with a single command.
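Each set of dummies leaves out one base category (group 1); including every category together with the constant would make the regressors perfectly collinear (the dummy-variable trap). In Stata, `tab race, gen(race)` creates race1 to race4 in one command, and race1 is then left out of the regression. The Python sketch below shows the same construction; the function name is hypothetical.

```python
def make_dummies(values, prefix, categories, base):
    """One 0/1 column per category, omitting the base (reference) category."""
    return {
        f"{prefix}{k}": [1 if v == k else 0 for v in values]
        for k in categories
        if k != base
    }


if __name__ == "__main__":
    race = [1, 2, 4, 3]  # hypothetical toy values
    print(make_dummies(race, "race", range(1, 5), base=1))
```

For income and education the calls would be `make_dummies(incomeg, "income", range(1, 7), base=1)` and `make_dummies(educ, "education", range(1, 5), base=1)`.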

Next, run a regression of diedin5 on age, male, married, plus constructed dummies for race, income, and education.

i. Interpret the coefficient on age. As a person ages 10 years, what happens to the five-year mortality rate?
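Because diedin5 is a 0/1 outcome, this regression is a linear probability model, so each coefficient is a change in the probability of dying within five years. Holding the other regressors fixed, a 10-year increase in age changes the predicted probability by ten times the age coefficient. The Greek letters below are notation introduced here for illustration.

```latex
\[
\text{diedin5}_i = \beta_0 + \beta_1\,\text{age}_i + \beta_2\,\text{male}_i + \beta_3\,\text{married}_i
 + \sum_{r=2}^{4}\rho_r\,\text{race}_{r,i} + \sum_{g=2}^{6}\gamma_g\,\text{income}_{g,i}
 + \sum_{e=2}^{4}\theta_e\,\text{education}_{e,i} + u_i
\]
\[
\Delta\widehat{\text{diedin5}} = \hat{\beta}_1\,\Delta\text{age} = 10\,\hat{\beta}_1
\]
```

For instance, a hypothetical $\hat\beta_1 = 0.002$ would imply $10 \times 0.002 = 0.02$, i.e. a five-year mortality rate 2 percentage points higher.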

ii. Generate the log of age. Rerun the regression above, replacing age with the log of age. Interpret the effect of age again.
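With log(age) on the right-hand side and a 0/1 outcome on the left, the model is level-log: a 1% increase in age changes the predicted probability by roughly one hundredth of the coefficient. Writing $\delta_1$ (notation introduced here) for the coefficient on ln(age), with the other regressors unchanged:

```latex
\[
\Delta\widehat{\text{diedin5}} = \hat{\delta}_1\,\Delta\ln(\text{age})
 \approx \frac{\hat{\delta}_1}{100}\,(\%\Delta\,\text{age})
\]
\[
\text{age } a \to a + 10:\qquad
\Delta\widehat{\text{diedin5}} = \hat{\delta}_1 \ln\!\left(\frac{a+10}{a}\right)
\]
```

Unlike the linear specification, the effect of 10 extra years now depends on the starting age: $\ln(40/30) \approx 0.288$ while $\ln(70/60) \approx 0.154$, so the same 10 years implies a larger predicted change at 30 than at 60.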

iii. Interpret the coefficient on married. Does this coefficient make sense to you? Why?

iv. Look at the coefficients on the income dummy variables. Interpret the coefficient on the dummy for income group 6 (i.e., what does the coefficient mean, and how large is the effect?).
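A dummy coefficient is always measured relative to the omitted base group, here income group 1. Writing $\gamma_6$ for the coefficient on income6 and $\mathbf{x}$ for the other regressors (notation introduced here), in this linear probability model:

```latex
\[
\gamma_6 = \mathrm{E}[\text{diedin5} \mid \text{incomeg} = 6,\ \mathbf{x}]
 - \mathrm{E}[\text{diedin5} \mid \text{incomeg} = 1,\ \mathbf{x}]
\]
```

To judge how large the effect is, it can help to compare $\hat\gamma_6$ with the raw mortality rate of income group 1 from Part 1.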

v. Why is it a good idea to replace categorical variables such as race, education, and income with dummies?

vi. At the 5% significance level, can you reject the null hypothesis that the coefficient on married is equal to zero? Why? How is the t-statistic obtained?
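The t statistic is the estimated coefficient divided by its standard error. Stata compares it with a t distribution with n - k - 1 degrees of freedom; with a sample of this size that is practically the standard normal, which the sketch below uses as an approximation. The function name and the numbers are hypothetical.

```python
from statistics import NormalDist


def t_test(beta_hat: float, se: float, alpha: float = 0.05) -> tuple[float, float, bool]:
    """Two-sided test of H0: beta = 0, using the large-sample normal approximation.

    Returns (t statistic, p-value, whether H0 is rejected at level alpha).
    """
    t = beta_hat / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    return t, p_value, abs(t) > critical


if __name__ == "__main__":
    # Hypothetical coefficient and standard error, not estimates from the data.
    print(t_test(0.015, 0.006))
```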

3) Regression II

Construct dummy variables for the following four BMI categories:

- Underweight, if BMI <= 19
- Overweight, if 25 < BMI <= 30
- Obese, if 30 < BMI <= 35
- Severely obese, if BMI > 35

Add the four dummy variables above to the previous regression you used in Part 2 (ii).
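The four dummies can be built from bmi with simple comparisons, leaving 19 < BMI <= 25 as the omitted reference group. The Python sketch below is illustrative and its key names are hypothetical. If you build the dummies in Stata, remember that missing values (.) compare as larger than any number, so a condition like `bmi > 35` is true for missing bmi unless you add `& !missing(bmi)`.

```python
def bmi_dummies(bmi: float) -> dict[str, int]:
    """0/1 indicators for the four BMI categories; 19 < BMI <= 25 is the omitted base."""
    return {
        "underweight": int(bmi <= 19),
        "overweight": int(25 < bmi <= 30),
        "obese": int(30 < bmi <= 35),
        "severely_obese": int(bmi > 35),
    }


if __name__ == "__main__":
    for value in (18.5, 22.0, 27.0, 33.0, 40.0):  # hypothetical BMI values
        print(value, bmi_dummies(value))
```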

i. Why have you not included a dummy for 19 < BMI <= 25?

ii. Interpret the coefficient on overweight. Does this coefficient make sense to you? Why?

iii. Why is the coefficient on underweight such a large positive number?

iv. Test whether the four BMI dummies generated above are jointly significant. What is the null? Do you reject the null? Why?
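Under the null all four BMI coefficients are zero. With homoskedastic errors, the test compares the sum of squared residuals of the regression without the BMI dummies (restricted, Part 2 (ii)) and with them (unrestricted, Part 3), using $q = 4$ restrictions and $k$ regressors in the unrestricted model; both regressions should be run on the same sample. The $\delta$ labels are notation introduced here. After `regress`, Stata's `test` command computes the equivalent F statistic (e.g. `test underweight overweight obese severely_obese`, with hypothetical variable names).

```latex
\[
H_0:\ \delta_{\text{under}} = \delta_{\text{over}} = \delta_{\text{obese}} = \delta_{\text{severe}} = 0,
\qquad H_1:\ \text{at least one } \delta \neq 0
\]
\[
F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)} \sim F_{q,\,n-k-1}, \qquad q = 4
\]
```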

v. Run a test of linear restrictions to test whether the effect of married is equal to the effect of male. What is the null? Do you reject the null? Why?
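The restriction involves a single linear combination of two coefficients, so it can be written as a t test on their difference. The standard error of the difference needs the estimated covariance between the two coefficients, not just their individual standard errors. In Stata, `test married = male` after `regress` reports the equivalent F statistic, which equals $t^2$.

```latex
\[
H_0:\ \beta_{\text{married}} - \beta_{\text{male}} = 0
\]
\[
t = \frac{\hat\beta_{\text{married}} - \hat\beta_{\text{male}}}
 {\sqrt{\widehat{\operatorname{Var}}(\hat\beta_{\text{married}})
 + \widehat{\operatorname{Var}}(\hat\beta_{\text{male}})
 - 2\,\widehat{\operatorname{Cov}}(\hat\beta_{\text{married}}, \hat\beta_{\text{male}})}}
\]
```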

4) Test for heteroscedasticity

i. Use the model estimated in Part 3 to test for heteroscedasticity. What is the null hypothesis? Do you reject the null? What problems does heteroscedasticity cause? Which tests could you use?
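One standard choice is the Breusch-Pagan test: regress the squared OLS residuals on the regressors and check whether they explain any of the variation. The White test adds squares and cross-products of the regressors. In Stata, `estat hettest` after `regress` implements a Breusch-Pagan/Cook-Weisberg version and `estat imtest, white` implements White's test. Note that in a linear probability model $\operatorname{Var}(u \mid \mathbf{x}) = p(\mathbf{x})\,[1 - p(\mathbf{x})]$, which varies with $\mathbf{x}$ whenever any slope is nonzero.

```latex
\[
H_0:\ \operatorname{Var}(u \mid x_1, \dots, x_k) = \sigma^2
\]
\[
\hat u_i^2 = \delta_0 + \delta_1 x_{1i} + \dots + \delta_k x_{ki} + v_i,
\qquad
LM = n\,R^2_{\hat u^2} \ \overset{a}{\sim}\ \chi^2_k
\]
```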

ii. Rerun the regression above, controlling for heteroscedasticity. Do you see any difference? Why?
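The usual correction keeps the OLS coefficients and replaces their variance with the heteroskedasticity-robust (White/Huber) "sandwich" estimator, so the coefficients themselves do not change; only the standard errors, t statistics and p-values do. Stata's `vce(robust)` option for `regress` applies the small-sample scaling shown as HC1, where $k$ is the number of regressors excluding the constant and $x_i$ is the regressor vector of observation $i$.

```latex
\[
\widehat{\operatorname{Var}}_{\text{HC0}}(\hat{\boldsymbol\beta})
 = (X'X)^{-1}\Bigl(\sum_{i=1}^{n} \hat u_i^2\, x_i x_i'\Bigr)(X'X)^{-1},
\qquad
\widehat{\operatorname{Var}}_{\text{HC1}}(\hat{\boldsymbol\beta})
 = \frac{n}{n-k-1}\,\widehat{\operatorname{Var}}_{\text{HC0}}(\hat{\boldsymbol\beta})
\]
```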

5) The Gauss-Markov Theorem

i. Illustrate the Gauss-Markov Theorem. What assumptions does it make? What desirable properties does the OLS estimator then have? What does each property mean?
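In one common textbook statement for cross-sectional data, the assumptions are: linearity in parameters, random sampling, no perfect collinearity ($X$ has full column rank), zero conditional mean $\mathrm{E}[\mathbf{u} \mid X] = \mathbf{0}$, and homoskedasticity $\operatorname{Var}(\mathbf{u} \mid X) = \sigma^2 I$. Under these, OLS is unbiased and has the smallest variance among all linear unbiased estimators (BLUE), as summarized below in matrix notation.

```latex
\[
\mathbf{y} = X\boldsymbol\beta + \mathbf{u},
\qquad
\hat{\boldsymbol\beta} = (X'X)^{-1}X'\mathbf{y}
\]
\[
\mathrm{E}[\hat{\boldsymbol\beta} \mid X] = \boldsymbol\beta,
\qquad
\operatorname{Var}(\hat{\boldsymbol\beta} \mid X) = \sigma^2 (X'X)^{-1}
\]
\[
\operatorname{Var}(\tilde{\boldsymbol\beta} \mid X) - \operatorname{Var}(\hat{\boldsymbol\beta} \mid X) \succeq 0
\quad \text{for any linear unbiased } \tilde{\boldsymbol\beta} = C\mathbf{y}
\]
```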

ii. Do the assumptions always hold? If they are violated, what problems may arise? How would you handle these violations?

2023-11-23
