Econ 390 Using STATA Part 4 Regression Analysis
This worksheet builds on our previous work with STATA. Today we will try working with a real data set. You are encouraged to work together.
New STATA Commands:
1) reg
2) xi: reg
1. Download the data addwave4.dta from the course website.
The data is in the “lab” folder on Canvas. Save it to a directory that you can remember, such as c:\Users\Arts User\Desktop.
2. Look at the codebook for the data set.
The codebook is called add4codebook.pdf and is in the “lab” folder on Canvas.
3. Using the codebook, write a ‘.do’ file that performs the following tasks.
a) Start a log.
b) Load the data.
c) Keep only the variables H4EC13 H4IR4 BIO_SEX4 H4ED2 H4WGT H4HGT
d) Use the ‘describe’ command to examine the variables. Make sure none of them is stored as a string type such as “str2”; string variables contain text rather than numbers.
e) Use the ‘summ’ command to examine the summary statistics of the variables.
What is the sample size?
Take a look at the maximum value of H4HGT, which is the height of the person. Does it make sense?
Other variables have the same problem. Refer to the codebook to find out.
You will have to remove these coded “missing values” (codes meaning the person did not answer the question) yourself.
f) Drop observations (people) who answered “wrongly” or “refused”, or who are coded “legitimate skip” or “invalid data”, for H4HGT.
g) Do the same thing for the other variables when the person did not answer the question.
h) Use the ‘summ’ command to examine the summary statistics of the variables again.
What is the sample size now?
* Note: your sample size will be smaller than the original sample because you dropped the missing values.
i) Rename the variable H4HGT to ‘height’.
j) Rename the variable H4WGT to ‘weight’.
k) Make a variable for education using the variable H4ED2 (make the new variable ‘degree’ equal to 1 if H4ED2 indicates the person has completed a bachelor’s degree or higher, 0 otherwise).
l) Make a variable for gender using the variable BIO_SEX4 (make the new variable ‘male’ equal to 1 if BIO_SEX4 indicates the person is a male, 0 otherwise).
m) Make a variable for race using the variable H4IR4 (make the new variable ‘white’ equal to 1 if H4IR4 indicates the person is white, 0 otherwise).
n) Make a variable for unpaid bills using the variable H4EC13 (make the new variable ‘bill’ equal to 1 if H4EC13 indicates the person failed to pay some bills in the past, 0 otherwise).
o) Save the new dataset under a different name on the desktop. Why?
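Steps a) to o) can be collected into one .do file. The sketch below assumes addwave4.dta is in the current working directory. Every numeric code in it is an assumption, not taken from the codebook: the special (missing) codes in the drop lines are placeholders, and so are the cut-offs BIO_SEX4 == 1 for male, H4IR4 == 1 for white, H4ED2 >= 7 for a bachelor’s degree or higher and H4EC13 == 1 for an unpaid bill. Check each one against add4codebook.pdf before running it. The file names lab4.log and addwave4_clean.dta are hypothetical.

```stata
* lab4.do: sketch of steps 3 a) to o)
* ASSUMPTION: every numeric code below is a placeholder; check add4codebook.pdf
capture log close
log using lab4.log, text replace              // a) start a log (hypothetical name)

use addwave4.dta, clear                       // b) load the data
keep H4EC13 H4IR4 BIO_SEX4 H4ED2 H4WGT H4HGT  // c) keep only the six variables
describe                                      // d) check that none is a str# type
summarize                                     // e) sample size, max of H4HGT

* f) and g): drop special codes (refused, legitimate skip, don't know, ...)
* HYPOTHETICAL codes: replace each list with the codes from the codebook
drop if inlist(H4HGT, 996, 997, 998)
drop if inlist(H4WGT, 996, 997, 998)
drop if inlist(H4ED2, 96, 97, 98)
drop if inlist(BIO_SEX4, 6, 8)
drop if inlist(H4IR4, 6, 8)
drop if inlist(H4EC13, 6, 7, 8)
summarize                                     // h) sample size after dropping

rename H4HGT height                           // i)
rename H4WGT weight                           // j)

* k) to n): 0/1 indicators; the cut-offs are ASSUMED codebook values
gen degree = (H4ED2 >= 7) if !missing(H4ED2)      // 7 = bachelor's (assumed)
gen male   = (BIO_SEX4 == 1) if !missing(BIO_SEX4) // 1 = male (assumed)
gen white  = (H4IR4 == 1) if !missing(H4IR4)       // 1 = white (assumed)
gen bill   = (H4EC13 == 1) if !missing(H4EC13)     // 1 = did not pay (assumed)

save addwave4_clean.dta, replace              // o) save under a new name
log close
```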
4. Find some statistics
Now that you have created your dataset, it is time to analyze it. Add commands to your .do file that perform the following tasks:
a) Find the summary statistics for height.
b) Now try using the ‘detail’ option to find even more summary statistics:
summ height, detail
c) Make a table with the mean of height by gender.
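A minimal sketch of step 4, assuming the dataset saved in step 3 o) is called addwave4_clean.dta (a hypothetical name) and contains height and male. For part c) it uses tabstat, one of several commands that can tabulate means by group; the table command from the list of frequently used commands is an alternative.

```stata
* Step 4 sketch; addwave4_clean.dta is a HYPOTHETICAL file name
use addwave4_clean.dta, clear

summarize height                             // a) N, mean, sd, min, max
summarize height, detail                     // b) adds percentiles, skewness, kurtosis
tabstat height, by(male) statistics(mean n)  // c) mean height (and N) by gender
```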
5. OLS regressions
We are now ready to try some regression analysis. The command for OLS regressions in STATA is quite easy:
reg y x
where ‘y’ is the dependent variable and ‘x’ is the independent variable.
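To see what reg computes, here is a small self-contained sketch on four made-up observations (not from addwave4.dta). With a single regressor and a constant, the OLS slope equals cov(x, y)/var(x); the block checks this against reg. The scalar name b1_hand is only illustrative.

```stata
* Toy data, made up for illustration only
clear
input y x
1 1
3 2
2 3
5 4
end

reg y x                                   // OLS of y on x with a constant

* Reproduce the slope by hand: b1 = cov(x, y) / var(x)
quietly correlate y x, covariance         // r(cov_12) = cov(y, x), r(Var_2) = var(x)
scalar b1_hand = r(cov_12) / r(Var_2)
display "slope from reg: " _b[x] "   slope by hand: " b1_hand
display "R-squared: " e(r2)
```

For these four points the slope is 1.1, the intercept is 0 and the R² is about 0.69.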
a) Try running a regression of bill on height.
i. What is the coefficient on height?
ii. Is the coefficient statistically significant?
iii. How is it interpreted?
iv. What is the R²?
b) Try running a regression of bill on height without the constant:
reg y x, noconstant
i. What is the R²? Is this model better than the one in a)?
ii. Is the constant necessary in this case?
c) Now run the regression of bill on height, degree and white without the constant.
i. What is the coefficient on height? Has it changed from the first regression? Why?
ii. Interpret the coefficient on ‘degree’ in a sentence.
iii. What is the R² of this model? How does it compare with the R² in a)?
d) Now run the regression of bill on height and degree without the constant, restricting the sample to white respondents only:
reg y x if abc == something, noconstant
i. What is the coefficient on height? Has it changed from the previous regression? Why?
ii. Interpret the coefficient on ‘degree’ in a sentence.
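The four regressions in a) to d) can be run and compared side by side. This sketch assumes the dataset from step 3 o) is saved as addwave4_clean.dta (a hypothetical name) and contains bill, height, degree and white; the stored-estimate names m_a to m_d are also illustrative. estimates table lines up the coefficients, standard errors, N and R² of all four models.

```stata
* Step 5 sketch; addwave4_clean.dta is a HYPOTHETICAL file name
use addwave4_clean.dta, clear

reg bill height                                   // a) with a constant
display "a) b_height = " _b[height] "   R2 = " e(r2)
estimates store m_a

reg bill height, noconstant                       // b) without the constant
estimates store m_b

reg bill height degree white, noconstant          // c) add degree and white
estimates store m_c

reg bill height degree if white == 1, noconstant  // d) white respondents only
estimates store m_d

* Side-by-side comparison of the four models
estimates table m_a m_b m_c m_d, b se stats(N r2)
```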
6. OLS regressions with dummy variables
When one of the independent variables is categorical, STATA can automatically create a set of dummy variables. We use the ‘xi’ prefix to do this. Here is an example:
xi: reg bill height degree i.H4IR4, noconstant
Remember that the original race variable (H4IR4) contains 4 different races.
This regression will now include on the right-hand side a set of dummy variables for all of the races. STATA is smart enough to leave one of the dummies out of the regression in order to avoid the dummy variable trap.
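A minimal sketch of the xi regression, assuming the dataset saved in step 3 o) (called addwave4_clean.dta here, a hypothetical name) still contains H4IR4 along with bill, height and degree. xi names the dummies it creates _IH4IR4_#, one per race code, and by default leaves out the lowest code as the base category. In Stata 11 and later, factor-variable notation (i.H4IR4 without the xi: prefix) also builds the dummies on the fly; it is shown at the end for comparison.

```stata
* Step 6 sketch; addwave4_clean.dta is a HYPOTHETICAL file name
use addwave4_clean.dta, clear

tabulate H4IR4                                  // race codes present in the data

xi: reg bill height degree i.H4IR4, noconstant
describe _IH4IR4_*                              // the dummies xi created

* Stata 11 and later: factor-variable notation, no xi prefix needed
reg bill height degree i.H4IR4, noconstant
```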
Try running an ‘xi’ regression to see how it works.
How many dummy variables has STATA created for race (H4IR4)?
How do you interpret the coefficients on the dummy variables of race?
Some frequently used STATA commands:
Command | Use
cd c:\ | Changes directory to c:\
clear | Clears the memory so that a new dataset can be loaded
dir | Displays contents of current directory
count | Provides a count of the number of observations
desc | Shows variables currently in memory with description
summ | Shows the number of observations, mean, standard deviation, minimum and maximum for all variables; add ', detail' to get percentiles
reg y x | Runs OLS using y as dependent variable and x as independent variable
gen | Creates new variable
replace | Replaces old value with new value
drop x | Drops variable x
keep x | Drops all variables except for x
drop if x==0 | Drops all observations with x=0
keep if x==0 | Keeps only those observations with x=0
dprobit | Runs a probit regression, reports marginal effects on the probability (older command; from Stata 11 on, use probit followed by margins, dydx(*))
use data.dta | Brings dataset data.dta into memory
save data.dta | Saves dataset data.dta to disk (add ', replace' to overwrite an existing data.dta)
table x | Shows a table with the frequency distribution for variable x
table x, c(mean y) | Shows the mean of y for each value of x (syntax of Stata 16 and earlier; Stata 17 and later use table x, statistic(mean y))
compress | Compresses the dataset to take up the minimum amount of memory possible
insheet x y using data.dat, t | Brings variables x and y from ASCII file data.dat (tab delimited) into memory (from Stata 13 on, import delimited is the preferred command)
log using econ390.log, t | Creates a log file called econ390.log, text format
log close | Closes any open log file
capture log close | Closes any open log file, but doesn’t crash if there is no open log file
real(x) | Converts string (text) variable x to numeric form, e.g. gen numbvar = real(textvar)
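The real() entry above, and the warning in step 3 d) about “str2” variables, can be illustrated on a tiny made-up dataset. The sketch converts a text variable to numbers with real() and, as an alternative, with destring; both turn text that is not a number into a missing value (.). The variable names textvar and numbvar follow the example in the table.

```stata
* Toy data, made up: a text variable that should be numeric
clear
input str3 textvar
"12"
"7"
"abc"
end

describe textvar                     // storage type is str3 (text)
gen numbvar = real(textvar)          // "abc" is not a number, so it becomes .
list textvar numbvar

* Alternative: destring converts the variable in place; "force" turns
* non-numeric text into missing instead of stopping with an error
destring textvar, replace force
describe textvar                     // now a numeric storage type
```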
2022-10-26