
Econ 390

Using STATA Part 4

Regression Analysis

This worksheet builds on our previous work with STATA. Today we will try working with a real data set. You are encouraged to work together.

New STATA Commands:

1) reg

2) xi: reg

1. Download the data addwave4.dta from the course website.

The data file is in the “lab” folder on Canvas. Save it to a directory that you can remember, such as c:\Users\Arts User\Desktop.

2. Look at the codebook for the data set.

The codebook is called add4codebook.pdf and is in the “lab” folder on Canvas.

3. Using the codebook, write a ‘.do’ file that performs the following tasks.

a) Start a log.

b) Load the data.

c) Keep only the variables H4EC13 H4IR4 BIO_SEX4 H4ED2 H4WGT H4HGT
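A minimal sketch of how steps a) to c) might look at the top of the do file. It assumes the data were saved in the folder suggested in step 1; the log file name lab4.log is hypothetical.

```stata
* Steps a) to c): start a log, load the data, keep the six variables.
capture log close                     // close a log left open by an earlier run
cd "c:\Users\Arts User\Desktop"       // folder where addwave4.dta was saved
log using lab4.log, text replace      // hypothetical log file name
use addwave4.dta, clear               // clear drops any data already in memory
keep H4EC13 H4IR4 BIO_SEX4 H4ED2 H4WGT H4HGT
```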

d) Use the ‘describe’ command to examine the variables. Make sure none of them has a string storage type such as “str2”: string (text) variables have the storage types str1, str2, str3 and so on.

e) Use the ‘summ’ command to examine the summary statistics of the variables.

What is the sample size?

Take a look at the maximum value of H4HGT, which is the height of the person. Does it make sense?

Other variables have the same problem; refer to the codebook to find out which values are affected.

You will have to remove these coded missing values (values that mean the person did not answer the question) yourself.
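One way to carry out steps d) and e) and to look at the suspicious heights is sketched below. Sorting and listing the largest values is just one option, not something the worksheet requires.

```stata
* Steps d) and e): storage types and summary statistics.
describe        // byte, int, long, float, double are numeric; str# is text
summarize       // the Obs column counts non-missing values
display "Number of observations: " _N
* Sort by height and list the ten largest values; values far above any
* plausible height are probably codes for non-response.
sort H4HGT
list H4HGT in -10/l
```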

f) Drop observations (people) whose value of H4HGT is coded as answered “wrongly”, “refused”, “legitimate skip” or “invalid data”.

g) Do the same thing for the other variables when the person did not answer the question.
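A sketch of steps f) and g) using inlist(). Every numeric code in it is a hypothetical placeholder: the actual codes for “refused”, “legitimate skip”, “invalid data” and similar answers must be taken from add4codebook.pdf, and they can differ from variable to variable.

```stata
* Steps f) and g). ALL CODES BELOW ARE HYPOTHETICAL PLACEHOLDERS:
* replace them with the non-response codes listed in add4codebook.pdf.
drop if inlist(H4HGT, 996, 997, 998, 999)    // height
drop if inlist(H4WGT, 996, 997, 998, 999)    // weight
drop if inlist(H4ED2, 96, 97, 98)            // education
drop if inlist(BIO_SEX4, 6, 8)               // sex
drop if inlist(H4IR4, 6, 8)                  // race
drop if inlist(H4EC13, 6, 7, 8)              // unpaid bills
* Also drop any system missing values (.), if there are any
drop if missing(H4EC13, H4IR4, BIO_SEX4, H4ED2, H4WGT, H4HGT)
```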

h) Use the ‘summ’ command to examine the summary statistics of the variables again.

What is the sample size now?

* Note: your sample size will be smaller than the original sample because you dropped the missing values.

i) Rename the variable H4HGT ‘height’.

j) Rename the variable H4WGT ‘weight’.
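Steps i) and j) each take a single rename command, whose syntax is rename oldname newname. A minimal sketch:

```stata
* Steps i) and j): rename oldname newname
rename H4HGT height
rename H4WGT weight
```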

k) Create a variable for education using the variable H4ED2 (make the new variable ‘degree’ equal to 1 if H4ED2 indicates the person has completed a bachelor’s degree or higher, 0 otherwise).
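A minimal sketch of step k). It assumes, hypothetically, that codes 7 and above in H4ED2 correspond to a bachelor’s degree or higher; check that cut-off against add4codebook.pdf before using it.

```stata
* Step k). HYPOTHETICAL cut-off: assumes H4ED2 >= 7 means a bachelor's
* degree or higher. Check the codebook before relying on it.
gen degree = (H4ED2 >= 7) if !missing(H4ED2)
* The "if !missing(H4ED2)" part matters: Stata treats a missing value (.)
* as larger than every number, so (H4ED2 >= 7) would be 1 when H4ED2 is missing.
tabulate H4ED2 degree    // each education code should map to one value of degree
```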

l) Make a variable for gender using the variable BIO_SEX4 (make the new variable ‘male’ equal to 1 if BIO_SEX4 indicates the person is a male, 0 otherwise).

m) Make a variable for race using the variable H4IR4 (make the new variable ‘white’ equal to 1 if H4IR4 indicates the person is white, 0 otherwise).

n) Make a variable for bill using the variable H4EC13 (make the new variable ‘bill’ equal to 1 if H4EC13 indicates the person did not pay some bills before, 0 otherwise).
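Steps l) to n) follow the same pattern. The codes in this sketch (1 = male in BIO_SEX4, 1 = white in H4IR4, 1 = did not pay some bills in H4EC13) are assumptions to be checked against add4codebook.pdf.

```stata
* Steps l) to n). HYPOTHETICAL codes, check each in add4codebook.pdf:
*   BIO_SEX4 == 1 : male
*   H4IR4    == 1 : white
*   H4EC13   == 1 : did not pay some bills
gen male  = (BIO_SEX4 == 1) if !missing(BIO_SEX4)
gen white = (H4IR4 == 1)    if !missing(H4IR4)
gen bill  = (H4EC13 == 1)   if !missing(H4EC13)
* Cross-tabulate each dummy against its source variable as a check
tabulate BIO_SEX4 male
tabulate H4IR4 white
tabulate H4EC13 bill
```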

o) Save the new dataset under a different name on the desktop. Why?
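Step o) is a single save command. In this sketch the file name addwave4_clean.dta is hypothetical; what matters is that it differs from addwave4.dta.

```stata
* Step o): save to the desktop under a new (hypothetical) name;
* "replace" lets the do file overwrite this file when it is run again.
save "c:\Users\Arts User\Desktop\addwave4_clean.dta", replace
```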

4. Find some statistics

Now that you have created your dataset, it is time to do some analysis with it. Add commands to your do file that perform the following tasks:

a) Find the summary statistics for height.

b) Now try using the ‘detail’ option to find even more summary statistics:

summ height, detail

c) Make a table with the mean of height by gender.
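A sketch of section 4, assuming height and male were created as in section 3. For step c) it uses tabulate with the summarize() option, which reports the mean of height for each value of male; the table x, c(mean y) form from the command list is an alternative, although it may not run as written in recent Stata releases, where the syntax of table changed.

```stata
* Section 4, a) to c). Assumes height and male exist (section 3).
summarize height
summarize height, detail          // adds percentiles, variance, skewness, kurtosis
* Mean (with standard deviation and frequency) of height for male = 0 and male = 1
tabulate male, summarize(height)
```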

5. OLS regressions

We are now ready to try some regression analysis. The command for OLS regressions in STATA is quite easy:

reg y x

where ‘y’ is the dependent variable and ‘x’ is the independent variable.

a) Try running a regression of bill on height.

i. What is the coefficient on height?

ii. Is the coefficient statistically significant?

iii. How is it interpreted?

iv. What is the R²?
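A minimal sketch of step a). After reg, Stata keeps the estimates in memory, so the coefficient, its standard error and the R² can be pulled out with _b[], _se[] and e(r2). The t statistic is the coefficient divided by its standard error, and the p-value computed here is the same two-sided test reported in the P>|t| column.

```stata
* Step 5a: OLS regression of bill on height (with a constant)
reg bill height
* Pull results out of the stored estimates
display "coefficient on height: " _b[height]
display "t statistic: " _b[height] / _se[height]
display "two-sided p-value: " 2 * ttail(e(df_r), abs(_b[height] / _se[height]))
display "R-squared: " e(r2)
```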

b) Try running a regression of bill on height without the constant:

reg y x, noconstant

i. What is the R²? Is this model better than the one in a)?

ii. Is the constant necessary in this case?
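A sketch of step b); the only change is the noconstant option. One thing worth knowing when answering question i.: without a constant, Stata computes the R² around zero rather than around the mean of bill, so the two R² values are not calculated in the same way.

```stata
* Step 5b: same regression without the constant
reg bill height, noconstant
* With noconstant, Stata's R-squared uses the uncentered total sum of
* squares (deviations from 0, not from the mean of bill).
display "R-squared (no constant): " e(r2)
```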

c) Now run the regression of bill on height, degree and white without the constant.

i. What is the coefficient on height? Has it changed from the first regression? Why?

ii. Interpret the coefficient on ‘degree’ in a sentence.

iii. What is the R² for this model? How does it compare with the R² in a)?
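Step c) only adds regressors to the list after the dependent variable. A sketch, assuming degree and white were created as in section 3:

```stata
* Step 5c: bill on height, degree and white, without the constant
reg bill height degree white, noconstant
display "coefficient on height: " _b[height]
display "coefficient on degree: " _b[degree]
display "R-squared: " e(r2)
```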

d) Now run the regression of bill on height and degree without the constant, restricting the sample to white people only:

reg y x if abc == something, noconstant

i. What is the coefficient on height? Has it changed from the previous regression? Why?

ii. Interpret the coefficient on ‘degree’ in a sentence.
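A sketch of step d), filling in the general form above with the variables from this worksheet. Because every observation in the subsample has white == 1, white itself is left out of the regressors.

```stata
* Step 5d: restrict the sample to white respondents (white == 1)
reg bill height degree if white == 1, noconstant
display "observations used: " e(N)
```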

6. OLS regressions with dummy variables

When one of the independent variables is categorical, STATA can automatically make a set of dummy variables for it. We use the ‘xi’ command to do this. Here is an example:

xi: reg bill height degree i.H4IR4, noconstant

Remember that the original race variable (H4IR4) contains 4 different races.

This regression will now include on the right-hand side a set of dummy variables for the races. STATA is smart enough to leave one of the dummies out of the regression in order to avoid the dummy variable trap.

Try running an ‘xi’ regression to see how it works.

How many dummy variables has STATA created for race (H4IR4)?

How do you interpret the coefficients on the dummy variables of race?
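A sketch for section 6. xi leaves the dummy variables it creates in the dataset, named with the prefix _I, so describe can list them after the regression.

```stata
* Section 6: xi creates indicator variables for the race codes in H4IR4,
* leaving one category out as the base
xi: reg bill height degree i.H4IR4, noconstant
describe _I*       // the dummies xi created for H4IR4
```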

Some frequently used STATA commands:

Command	Use
cd c:\	Changes directory to c:\
clear	Clears the memory so that a new dataset can be loaded.
dir	Displays contents of current directory
count	Provides a count of the number of observations
desc	Shows variables currently in memory with description
summ	Shows the number of observations, mean, standard deviation, minimum and maximum for all variables; add ', detail' to get percentiles
reg y x	Runs OLS using y as dependent variable and x as independent variable
gen	Creates new variable
replace	Replaces old value with new value
drop x	Drops variable x
keep x	Drops all variables except for x
drop if x==0	Drops all observations with x=0
keep if x==0	Keeps only those observations with x=0
dprobit	Runs a probit regression and reports marginal effects (out of date in newer Stata versions; use probit followed by margins, dydx(*) atmeans)
use data.dta	Brings dataset data.dta into memory
save data.dta	Saves the dataset in memory to disk as data.dta (add ', replace' to overwrite an existing data.dta)
table x	Shows a table with the frequency distribution for variable x
table x, c(mean y)	Shows the mean of y for each value of x
compress	Compresses the dataset to take up the minimum amount of memory possible
insheet x y using data.dat, t	Brings variables x and y from ASCII file data.dat (tab delimited) into memory
log using econ390.log, t	Creates a log file called econ390.log, text format
log close	Closes any open log file
capture log close	Closes any open log file, but doesn’t crash if there is no open log file
real(x)	Converts string (text) variable x to numeric form, e.g. gen numbvar = real(textvar)
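The real() entry pairs with the string check in task 3 d): a variable stored as text has to be converted before it can be used in a regression. A self-contained sketch with made-up values (textvar and numbvar follow the table; numbvar2 is a hypothetical name) shows real() next to destring, another built-in command for the same job:

```stata
* Made-up example data: a string variable holding numbers and one bad entry
clear
input str3 textvar
"170"
"165"
"abc"
end
gen numbvar = real(textvar)                 // "abc" cannot be converted, so it becomes missing (.)
destring textvar, generate(numbvar2) force  // force turns non-numeric text into missing as well
list
```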