Econ 390

Using STATA Part 1

Data Input

We will be using the STATA statistics package for this course. We will begin with inputting a dataset into STATA.

New STATA Commands:

1) clear

2) use

3) infix

4) save

5) summ

6) rename

7) keep

Note: STATA commands are case sensitive and must be typed in lowercase. I capitalize commands and variables (e.g. CLEAR, DVDES25) in these notes only to highlight them. Variable names are also case sensitive, so enter them exactly as they appear in the original data (they could be uppercase or lowercase; check the codebook).

1. Saving the dataset to your computer

The dataset (econ390.dta) is in the “lab” folder on Canvas.

To save it on the desktop of the lab computer, right-click the file name on Canvas and choose “Save target as” from the menu that appears.

In the dialog that opens, click “Desktop” on the left-hand side, then click “Save” in the lower right corner.

The file will then be saved in the following directory location:

c:\Users\Arts User\Desktop

2. Clear STATA memory

Always begin your work with this STATA command:

clear

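It clears the previous dataset from memory. If you skip this step and try to load a new dataset while the data in memory have unsaved changes, STATA will refuse and display an error message.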

3. Inputting your data into STATA from a .dta file

If your data file has the extension .dta, it is a STATA-ready dataset. To load it into STATA, use the USE command. Just type:

use "c:\Users\Arts User\Desktop\econ390.dta"

The Add Health dataset also has the .dta extension.
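As a sketch of steps 2 and 3 combined: the clear option of USE empties memory and loads the file in one line, which has the same effect as typing clear first. The path assumes the file was saved to the lab desktop as in step 1, so it would need adjusting on another computer.

```stata
* Load the course dataset, discarding whatever is currently in memory.
* The path assumes the file sits on the lab desktop (see step 1).
use "c:\Users\Arts User\Desktop\econ390.dta", clear

* List variable names, storage types and labels to confirm the load worked.
describe
```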

4. Find the summary statistics of the dataset.

Once you have input the data into STATA, you can use the SUMM command to see what variables you have in the dataset and the summary statistics of the variables.

summ

You can see there are lots of variables you don’t need in the original data. You will learn how to keep only the variables you need in step 6.
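A few variations on SUMM, shown as a sketch. The variable names are taken from the codebook and are assumed to be stored in lowercase, as in the commands used elsewhere in these notes.

```stata
* Summary statistics for every variable in memory.
summ

* Restrict the output to selected variables.
summ faminc age

* Add percentiles, skewness and kurtosis for one variable.
summ rrsp, detail
```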

5. Rename variables.

Variables in the original dataset often have names that are hard to interpret, for example DVDES25. You can rename such variables with the RENAME command:

rename mtr tax

The above command renames the variable mtr to tax. Run the summ command again to see the new name in the list of variables.
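The renaming step could be followed by a variable label, so that output stays readable even with short names. This is an optional addition, not part of the original steps; the label text is taken from the codebook description of MTR.

```stata
* Rename mtr (marginal tax rate in the codebook) to the more readable name tax.
rename mtr tax

* Optional: attach a descriptive label so output is self-explanatory.
label variable tax "Marginal tax rate"

* Check the new name and label.
describe tax
```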

6. Keeping variables.

Usually the original dataset contains many variables that you don’t need for your study.

For example, the econ390.dta dataset contains 6 variables. Suppose you want to use only 5 of them in your study: faminc, mard, age, rrsp and tax (the variable mtr as renamed in step 5). You can keep just these 5 variables (and drop the rest) with the KEEP command:

keep faminc mard age rrsp tax

Try the summ command again and you will see only those 5 variables remain in the dataset.
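When only a few variables are unwanted, naming the ones to discard can be shorter than naming the ones to keep. The sketch below assumes, per the codebook, that region is the only variable outside the list of 5.

```stata
* Alternative to the keep command above: name the variable to discard instead.
* Run this INSTEAD of keep, not after it; once region is gone, a second
* attempt to drop it stops with a "variable region not found" error.
drop region

* Either way, five variables should remain.
describe
```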

7. Saving datasets

You should keep the original dataset and create your own working dataset by using the SAVE command:

save "c:\Users\Arts User\Desktop\econ390test.dta"

You can then work with the working dataset econ390test.dta and leave your original data untouched.
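Steps 2 to 7 can be collected into a single do-file, sketched below under the assumption that the dataset is on the lab desktop. The replace option on SAVE is an addition to the command shown in step 7: without it, STATA refuses to overwrite econ390test.dta if the do-file is run a second time.

```stata
* Steps 2 to 7 as one do-file (adjust the path on other computers).
clear
use "c:\Users\Arts User\Desktop\econ390.dta"
summ

* Give mtr a readable name, then keep only the variables needed.
rename mtr tax
keep faminc mard age rrsp tax
summ

* Save the working copy; replace allows the do-file to be rerun.
save "c:\Users\Arts User\Desktop\econ390test.dta", replace
```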

Codebook for econ390.dta:

The variables are derived from the 1984 version of the Family Expenditure Survey conducted by Statistics Canada.

| Variable | Description |
| --- | --- |
| FAMINC | Family income in 1984 dollars. |
| MARD | Marital status. Takes the value 1 if married or in common-law relationship; 0 otherwise. |
| AGE | The age of the head of the household. |
| RRSP | The dollar value of Registered Retirement Savings Plan contributions in 1984, in 1984 dollars. |
| MTR | Marginal tax rate. The combined (federal/provincial) rate of income tax payable on the last dollar of the head of household’s income. |
| REGION | Region of residence. Takes the value 1 for Atlantic provinces; 2 for Quebec; 3 for Ontario; 4 for Prairie provinces; 5 for BC. |
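The codes in the codebook can be used directly in commands. The sketch below assumes the original econ390.dta is in memory (so region is still present) and that variable names are stored in lowercase. The label name regionlbl is a hypothetical choice made for this example.

```stata
* Frequency of each region code (1 = Atlantic ... 5 = BC).
tabulate region

* Average RRSP contribution for married or common-law households (mard == 1).
summ rrsp if mard == 1

* Optional: attach the codebook's region names to the numeric codes.
label define regionlbl 1 "Atlantic" 2 "Quebec" 3 "Ontario" 4 "Prairies" 5 "BC"
label values region regionlbl
tabulate region
```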

Some frequently used STATA commands:

| Command | Use |
| --- | --- |
| cd c:\ | Changes directory to c:\ |
| clear | Clears the memory so that a new dataset can be loaded |
| dir | Displays contents of current directory |
| count | Provides a count of the number of observations |
| desc | Shows variables currently in memory with description |
| summ | Shows means and standard deviations for all variables (add ', detail' to get percentiles) |
| reg y x | Runs OLS using y as dependent variable and x as independent variable |
| gen | Creates new variable |
| replace | Replaces old value with new value |
| drop x | Drops variable x |
| keep x | Drops all variables except for x |
| drop if x==0 | Drops all observations with x=0 |
| keep if x==0 | Keeps only those observations with x=0 |
| dprobit | Runs a probit regression, reports marginal effects on the probability |
| use data.dta | Brings dataset data.dta into memory |
| save data.dta | Saves dataset data.dta to disk (add ', replace' to overwrite an existing data.dta) |
| table x | Shows a table with the frequency distribution for variable x |
| table x, c(mean y) | Shows the mean of y for each value of x |
| compress | Compresses the dataset to take up the minimum amount of memory possible |
| insheet x y using data.dat, t | Brings variables x and y from ascii file data.dat (tab delimited) into memory |
| log using econ390.log, t | Creates a log file called econ390.log, text format |
| log close | Closes any open log file |
| capture log close | Closes any open log file, but doesn’t crash if there is no open log file |
| real(x) | Converts variable x from text to numeric form, e.g. gen numbvar = real(textvar) |
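Several of these commands are typically used together in one do-file. The sketch below is illustrative only: has_rrsp is a hypothetical variable created for the example, the regression is not a recommended model, and the path assumes the dataset is on the lab desktop. The log file is written to the current working directory.

```stata
* Close any log left open by an earlier run, then start a fresh text log.
capture log close
log using econ390.log, text replace

use "c:\Users\Arts User\Desktop\econ390.dta", clear
count

* gen creates a variable; replace changes its values where a condition holds.
* has_rrsp (hypothetical) is 1 if any RRSP contribution was made, 0 otherwise.
* Missing values count as larger than any number, hence the !missing() checks.
gen has_rrsp = 0 if !missing(rrsp)
replace has_rrsp = 1 if rrsp > 0 & !missing(rrsp)
table has_rrsp

* Simple OLS of RRSP contributions on family income.
reg rrsp faminc

log close
```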