
Econ 390

Using STATA Part 4

Regression Analysis

This worksheet builds on our previous work with STATA. Today we will try working with a real data set. You are encouraged to work together.

New STATA Commands:

1) reg

2) xi: reg

1. Download the data addwave4.dta from the course website.

The data is in the “lab” folder on Canvas. It should be saved to a directory that you can remember – such as c:\Users\Arts User\Desktop

2. Look at the codebook for the data set.

The codebook is called add4codebook.pdf and is in the “lab” folder on Canvas.

3. Using the codebook, write a ‘.do’ file that performs the following tasks.

a) Start a log.

b) Load the data.

c) Keep only the variables H4EC13 H4IR4 BIO_SEX4 H4ED2 H4WGT H4HGT

d) Use the ‘describe’ command to examine the variables. Make sure none of them has a storage type beginning with “str” (for example “str2”), which would mean it is a string (text) variable.

e) Use the ‘summ’ command to examine the summary statistics of the variables.

What is the sample size?

Take a look at the maximum value of H4HGT, which is the height of the person. Does it make sense?

Other variables have the same problem. Refer to the codebook to find out.

These large values are missing-value codes: they mean the person did not answer the question. You will have to remove them yourself.

f) Drop observations (people) whose H4HGT value is coded as “wrongly”, “refused”, “legitimate skip”, or “invalid data”.

g) Do the same thing for the other variables when the person did not answer the question.

h) Use the ‘summ’ command to examine the summary statistics of the variables again.

What is the sample size now?

* Note: your sample size will be smaller than the original sample because you dropped the missing values.

i) Rename the variable H4HGT ‘height’.

j) Rename the variable H4WGT ‘weight’.

k) Create a variable ‘degree’ using the variable H4ED2 (make ‘degree’ equal to 1 if H4ED2 indicates the person completed a bachelor’s degree or higher, 0 otherwise).

l) Make a variable for gender using the variable BIO_SEX4 (make the new variable ‘male’ equal to 1 if BIO_SEX4 indicates the person is a male, 0 otherwise).

m) Make a variable for race using the variable H4IR4 (make the new variable ‘white’ equal to 1 if H4IR4 indicates the person is white, 0 otherwise).

n) Make a variable for unpaid bills using the variable H4EC13 (make the new variable ‘bill’ equal to 1 if H4EC13 indicates the person did not pay a bill at some point, 0 otherwise).

o) Save the new dataset under a different name on the desktop. Why?
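A rough skeleton for the steps above might look like the sketch below. It assumes addwave4.dta sits in the current working directory. The log name lab4.log, the output name addwave4_clean.dta, and both numeric codes stored in local macros are placeholders, not values taken from the codebook; replace them with what add4codebook.pdf actually says. Only ‘height’ and ‘male’ are shown.

```stata
* Sketch of a do-file for step 3. File names and codes are placeholders.
capture log close                     // close a log left open by an earlier run
log using lab4.log, text replace      // a) start a log (hypothetical name)

use addwave4.dta, clear               // b) load the data from the working directory
keep H4EC13 H4IR4 BIO_SEX4 H4ED2 H4WGT H4HGT   // c) keep six variables
describe                              // d) check the storage types
summarize                             // e) first look at the statistics

* f) HYPOTHETICAL cutoff: assumes every non-answer code for H4HGT is at or
*    above this value. Look up the real codes in the codebook.
local height_bad 990
drop if H4HGT >= `height_bad'         // also drops Stata's own missing value (.)

summarize                             // h) statistics after dropping
rename H4HGT height                   // i) rename

* l) HYPOTHETICAL code: assumes this value of BIO_SEX4 means male.
local male_code 1
gen male = (BIO_SEX4 == `male_code')  // 1 if male, 0 otherwise

save addwave4_clean.dta, replace      // o) new name, original file untouched
log close
```

Steps g), j), k), m), and n) follow the same drop, rename, and gen patterns. Run the file as a whole, because local macros disappear once the do-file finishes.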

4. Find some statistics

Now that you have created your dataset it is time to do some analysis with it. Try adding to your do file some commands that will perform the following tasks:

a) Find the summary statistics for height.

b) Now try using the ‘detail’ option to find even more summary statistics:

summ height, detail

c) Make a table with the mean of height by gender.
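One possible way to produce such a table is sketched below on Stata's built-in auto data, so that it runs without the course files. Here mpg stands in for height and foreign stands in for male; these stand-ins are assumptions made only for illustration.

```stata
* Sketch on the built-in auto data (not the worksheet data).
sysuse auto, clear
summarize mpg, detail                        // percentiles, skewness, kurtosis
tabstat mpg, by(foreign) statistics(mean n)  // mean and count of mpg per group
```

With the worksheet data the last line would become tabstat height, by(male) statistics(mean n). tabstat is used here because the options of the ‘table’ command differ between older and newer Stata releases.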

5. OLS regressions

We are now ready to try some regression analysis. The command for OLS regressions in STATA is quite easy:

reg y x

where ‘y’ is the dependent variable and ‘x’ is the independent variable.
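After a regression runs, Stata keeps the estimates in memory, which helps when reading off coefficients and fit. A minimal sketch on the built-in auto data, where price and weight are stand-ins for y and x:

```stata
* Sketch on the built-in auto data (not the worksheet data).
sysuse auto, clear
regress price weight                 // reg is the abbreviation of regress
display _b[weight]                   // slope coefficient on weight
display _se[weight]                  // its standard error
display _b[weight] / _se[weight]     // t statistic for the slope
display _b[_cons]                    // the constant
display e(r2)                        // R-squared
display e(N)                         // number of observations used
```

The same numbers appear in the regression output table; the display lines only show where each one is stored.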

a) Try running a regression of bill on height.

i. What is the coefficient on height?

ii. Is the coefficient statistically significant?

iii. How is it interpreted?

iv. What is the R-squared?

b) Try running a regression of bill on height without the constant:

reg y x, noconstant

i. What is the R-squared? Is this model better than the one in a)?

ii. Is the constant necessary in this case?
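To put the fit of two such models side by side, one option is to store each R-squared in a scalar right after its regression. The sketch below uses the built-in auto data, and the scalar names are arbitrary. Keep in mind that with ‘noconstant’ Stata computes R-squared around zero rather than around the mean of y, so the two numbers are not measured against the same baseline.

```stata
* Sketch on the built-in auto data (not the worksheet data).
sysuse auto, clear
regress price weight
scalar r2_cons = e(r2)               // R-squared with a constant
regress price weight, noconstant
scalar r2_nocons = e(r2)             // R-squared without a constant
display "with constant: " r2_cons "   without constant: " r2_nocons
```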

c) Now run the regression of bill on height, degree, and white without the constant.

i. What is the coefficient on height? Has it changed from the first regression? Why?

ii. Interpret the coefficient on ‘degree’ in a sentence.

iii. What is the R-squared for this model? How does it compare with the R-squared in a)?

d) Now run the regression of bill on height and degree without the constant, restricting the sample to white respondents only.

reg y x if abc == something, noconstant

i. What is the coefficient on height? Has it changed from the previous regression? Why?

ii. Interpret the coefficient on ‘degree’ in a sentence.
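In the template for d), the ‘if’ condition comes before the comma and options such as ‘noconstant’ come after it. A sketch on the built-in auto data, where foreign == 1 plays the role of the sample restriction:

```stata
* Sketch on the built-in auto data (not the worksheet data).
sysuse auto, clear
count if foreign == 1                // size of the restricted sample
regress price weight length if foreign == 1, noconstant
display e(N)                         // observations actually used
```

Comparing the count with e(N) is a quick check that the restriction did what was intended.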

6. OLS regressions with dummy variables

When one of the independent variables is categorical, STATA can automatically make a set of dummy variables. We use the ‘xi’ prefix command to do this. Here is an example:

xi: reg bill height degree i.H4IR4, noconstant

Remember that the original race variable (H4IR4) contains 4 different races.

This regression will now include on the right-hand side a set of dummy variables for all of the races. STATA is smart enough to leave one of the dummies out of the regression in order to avoid the dummy variable trap.

Try running an ‘xi’ regression to see how it works.

How many dummy variables has STATA created for race (H4IR4)?

How do you interpret the coefficients on the dummy variables of race?
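The same idea can be tried on the built-in auto data, where rep78 is a categorical variable with several values. In the sketch below, the describe line lists the dummies that ‘xi’ created, which by default are named with the prefix _I. In Stata 11 and later, factor-variable notation gives the same regression without ‘xi’.

```stata
* Sketch on the built-in auto data (not the worksheet data).
sysuse auto, clear
tabulate rep78                       // categories of the variable
xi: regress price weight i.rep78     // xi builds the dummies, omitting one category
describe _I*                         // the dummy variables xi added to the data
regress price weight i.rep78         // factor-variable notation, no xi needed
```

Each dummy's coefficient is read relative to the omitted category.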

Some frequently used STATA commands:

Command                         Use
cd c:\                          Changes directory to c:\
clear                           Clears the memory so that a new dataset can be loaded
dir                             Displays contents of current directory
count                           Provides a count of the number of observations
desc                            Shows variables currently in memory with description
summ                            Shows means and standard deviations for all variables; add ', detail' to get percentiles
reg y x                         Runs OLS using y as dependent variable and x as independent variable
gen                             Creates new variable
replace                         Replaces old value with new value
drop x                          Drops variable x
keep x                          Drops all variables except for x
drop if x==0                    Drops all observations with x=0
keep if x==0                    Keeps only those observations with x=0
dprobit                         Runs a probit regression, reports marginal effects (changes in probability)
use data.dta                    Brings dataset data.dta into memory
save data.dta                   Saves dataset data.dta to disk (add ', replace' to overwrite an existing data.dta)
table x                         Shows a table with the frequency distribution for variable x
table x, c(mean y)              Shows the mean of y for each value of x
compress                        Compresses the dataset to take up the minimum amount of memory possible
insheet x y using data.dat, t   Brings variables x and y from ASCII file data.dat (tab delimited) into memory
log using econ390.log, t        Creates a log file called econ390.log, text format
log close                       Closes any open log file
capture log close               Closes any open log file, but doesn’t crash if there is no open log file
real(x)                         Converts variable x from text to numeric form, e.g. gen numbvar = real(textvar)