Econ 390 Using STATA Part 1 Data Input
We will be using the STATA statistics package for this course. We will begin with inputting a dataset into STATA.
New STATA Commands:
1) clear
2) use
3) infix
4) save
5) summ
6) rename
7) keep
Note: STATA commands are case sensitive and must be typed in lowercase. I capitalize commands and variables (e.g. CLEAR, DVDES25) in these notes only to highlight them. Variable names are also case sensitive and must be entered exactly as they appear in the original data (they may be uppercase or lowercase; check the codebook).
1. Saving the dataset to your computer
The dataset (econ390.dta) is in the “lab” folder on Canvas.
To save it to the desktop of the lab computer, move your mouse cursor to the file name on Canvas and right-click. A menu will appear; choose “Save target as”.
In the dialog that opens, click “Desktop” on the left-hand side, then click “Save” in the lower right corner.
The file will then be saved in the following directory location:
c:\Users\Arts User\Desktop
2. Clear STATA memory
Always begin your work with this STATA command:
clear
It clears the previous dataset from memory. If you skip it and a dataset with unsaved changes is still in memory, STATA will refuse to load the new dataset and give an error message.
3. Inputting your data into STATA from a .dta file
If your data file has the extension .dta, it is a STATA-ready dataset. To load it into STATA, use the USE command. Just type in
use "c:\Users\Arts User\Desktop\econ390.dta"
The Add Health dataset also has the .dta extension.
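The following is a minimal sketch of two equivalent ways to load the file, assuming it was saved to the Desktop path from step 1. The clear option and the cd command are not used in the steps above; clear on USE does the same job as typing CLEAR first, and cd appears in the command table at the end of these notes.

```stata
* Option 1: give the full path and clear memory in the same command
use "c:\Users\Arts User\Desktop\econ390.dta", clear

* Option 2: change to the folder first, then load by file name
cd "c:\Users\Arts User\Desktop"
use econ390.dta, clear
```

The quotation marks are needed because the folder name "Arts User" contains a space.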
4. Find the summary statistics of the dataset.
Once you have input the data into STATA, you can use the SUMM command to see what variables you have in the dataset and the summary statistics of the variables.
summ
You can see there are lots of variables you don’t need in the original data. You will learn how to keep only the variables you need in step 6.
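As a short illustration, SUMM can also be restricted to particular variables, and DESC lists the variables without computing statistics. The sketch below assumes the variable names are stored in lowercase (faminc, age, rrsp), as in the commands used in these notes.

```stata
* List variable names, storage types and labels
desc

* Summary statistics for selected variables only
summ faminc age

* Add percentiles, skewness and kurtosis for one variable
summ rrsp, detail
```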
5. Rename variables.
Variables in the original dataset often have names that are hard to interpret, for example DVDES25. You can rename those variables by using the RENAME command:
rename mtr tax
The above command renames the variable mtr in the data to tax. Try the summ command again to confirm that the new name appears in the list of variables.
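One optional follow-up, shown here only as a sketch: after renaming, you can attach a descriptive label so the meaning of the variable is not lost. The label command is not part of the steps in these notes, and the label text is taken from the codebook entry for MTR.

```stata
* Rename mtr to tax, then attach a descriptive label
rename mtr tax
label variable tax "Marginal tax rate"

* Confirm the new name and label
desc tax
```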
6. Keeping variables.
Usually the original dataset contains many variables that you don’t need for your study.
For example, the econ390.dta dataset contains 6 variables. Let’s say you want to use only 5 of them in your study: faminc, mard, age, rrsp and tax. You can keep just the 5 variables you need (and drop the rest) by using the KEEP command:
keep faminc mard age rrsp tax
Try the summ command again and you will see only those 5 variables remain in the dataset.
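For comparison, the same result can be reached with DROP, which names the variables to remove rather than the ones to retain. This is an alternative to the KEEP command above, not an extra step: run one or the other, since dropping a variable that is already gone produces an error. The sketch assumes the sixth variable is stored as region, as listed in the codebook.

```stata
* Alternative to: keep faminc mard age rrsp tax
* region is the only variable not in that list, so dropping it has the same effect
drop region
```

KEEP is usually more convenient when you need only a few variables out of many; DROP is shorter when you are removing only a few.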
7. Saving datasets
You should keep the original dataset and create your own working dataset by using the SAVE command:
save "c:\Users\Arts User\Desktop\econ390test.dta"
You can then work with the working dataset econ390test.dta and leave your original data untouched.
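Steps 2 to 7 can be collected into one sequence of commands. The following is a sketch, assuming the Desktop path from step 1; the log file name and location are assumptions made for illustration, and the replace options are added so the sequence can be rerun without stopping on an existing file.

```stata
* Close any log left open by an earlier run; no error if none is open
capture log close
log using "c:\Users\Arts User\Desktop\econ390.log", text replace

* Steps 2-4: clear memory, load the original data, inspect it
clear
use "c:\Users\Arts User\Desktop\econ390.dta"
summ

* Steps 5-6: rename mtr and keep only the variables needed
rename mtr tax
keep faminc mard age rrsp tax
summ

* Step 7: save the working dataset; replace overwrites an existing copy
save "c:\Users\Arts User\Desktop\econ390test.dta", replace
log close
```

Without the replace option, SAVE stops with an error if econ390test.dta already exists.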
Codebook for econ390.dta:
The variables are derived from the 1984 version of the Family Expenditure Survey conducted by Statistics Canada.
Variable | Description
FAMINC | Family income in 1984 dollars.
MARD | Marital status. Takes the value 1 if married or in common-law relationship; 0 otherwise.
AGE | The age of the head of the household.
RRSP | The dollar value of Registered Retirement Savings Plan contributions in 1984, in 1984 dollars.
MTR | Marginal tax rate. The combined (federal/provincial) rate of income tax payable on the last dollar of the head of household’s income.
REGION | Region of residence. Takes the value 1 for Atlantic provinces; 2 for Quebec; 3 for Ontario; 4 for Prairie provinces; 5 for BC.
Some frequently used STATA commands:
Command | Use
cd c:\ | Changes directory to c:\
clear | Clears the memory so that a new dataset can be loaded.
dir | Displays contents of current directory
count | Provides a count of the number of observations
desc | Shows variables currently in memory with description
summ | Shows means and standard deviations for all variables; add ', detail' to get percentiles
reg y x | Runs OLS using y as dependent variable and x as independent variable
gen | Creates new variable
replace | Replaces old value with new value
drop x | Drops variable x
keep x | Drops all variables except for x
drop if x==0 | Drops all observations with x=0
keep if x==0 | Keeps only those observations with x=0
dprobit | Runs a probit regression, reports marginal probabilities
use data.dta | Brings dataset data.dta into memory
save data.dta | Saves dataset data.dta to disk (add ', replace' to overwrite existing data.dta)
table x | Shows a table with the frequency distribution for variable x
table x, c(mean y) | Shows the mean of y for each value of x
compress | Compresses the dataset to take up the minimum amount of memory possible
insheet x y using data.dat, t | Brings variables x and y from ascii file data.dat (tab delimited) into memory
log using econ390.log, t | Creates a log file called econ390.log, text format
log close | Closes any open log file
capture log close | Closes any open log file, but doesn’t crash if there is no open log file
real(x) | Converts variable x from text to numeric form, e.g. gen numbvar = real(textvar)
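A few of the commands in the table can be tried on the course dataset. The following is a sketch only, assuming the original econ390.dta is on the Desktop and that its variables are stored in lowercase; the variable name anyrrsp is hypothetical and introduced purely for illustration.

```stata
* Load the original data (all 6 variables)
use "c:\Users\Arts User\Desktop\econ390.dta", clear

* Number of observations
count

* Indicator for households with any RRSP contribution
* (missing values of rrsp are left as missing, not coded as 1)
gen anyrrsp = 0 if !missing(rrsp)
replace anyrrsp = 1 if rrsp > 0 & !missing(rrsp)

* Frequency distribution of region
table region

* OLS of RRSP contributions on family income and age
reg rrsp faminc age
```

The !missing() condition matters because STATA treats a missing numeric value as larger than any number, so rrsp > 0 alone would also be true for missing observations.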
2022-10-26