Econ 390

Using STATA Part 2

Data Manipulation

New STATA Commands:

1) list

2) desc

3) table

4) gen

5) drop

6) if

Note: STATA commands are case sensitive and must be typed in lowercase. I capitalized commands and variables (e.g. CLEAR, DVDES25...) in the notes only because I want to highlight them. Variable names are case sensitive as well: enter them exactly as they appear in the original data (they could be either uppercase or lowercase, check the codebook).

1. Download the data econ390.dta from the course website.

The data file is in the “lab” folder on Canvas. Save it to the desktop of the lab computer. The directory location is c:\Users\Arts User\Desktop

2. Open Stata.

Click on the Stata icon, or find STATA in the program menu.

3. Load the dataset into STATA.

This requires the USE command.

use "c:\Users\Arts User\Desktop\econ390.dta"

Error message? Don’t forget to CLEAR the previous dataset from the memory.

4. Saving datasets

To save a dataset, use the SAVE command.

save "c:\Users\Arts User\Desktop\econ390test.dta"

Now try saving it again.

save "c:\Users\Arts User\Desktop\econ390test.dta"

What happened? The problem is that a file with that name already exists. For this reason, you must use the ‘replace’ option, which tells STATA to overwrite the existing version.

save "c:\Users\Arts User\Desktop\econ390test.dta", replace

Now try the USE command to get back to the original dataset.

use "c:\Users\Arts User\Desktop\econ390.dta"

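What happened? If STATA refuses to load the file, it is because the dataset in memory has changes that have not been saved. In that case you must first clear the memory using a CLEAR statement, or add the ‘clear’ option to the USE command.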
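The save and use sequence from steps 3 and 4 can be sketched on a small made-up dataset, so that it runs on any machine from the do-file editor. The data values and the scratch file name are illustrative assumptions, not part of econ390.dta. The sketch relies on STATA refusing a plain USE only when the data in memory have changed since they were last saved.

```stata
* Minimal sketch: toy data stand in for econ390.dta (values are made up)
clear
input age rrsp
30 500
67 0
45 1200
end

tempfile scratch                 // hypothetical scratch file, removed when the do-file ends
save "`scratch'"                 // first save works: no file with this name yet
capture noisily save "`scratch'" // second save is refused: file already exists
display _rc                      // a nonzero return code signals the error
save "`scratch'", replace        // the replace option overwrites the file

replace rrsp = 0 in 1            // data in memory now differ from the saved copy
capture noisily use "`scratch'"  // refused: unsaved changes would be lost
use "`scratch'", clear           // the clear option empties memory, then loads
list
```

After the last line, the first observation shows rrsp equal to 500 again, because the unsaved change was discarded when the saved copy was reloaded.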

5. Examine the data directly.

Use the LIST command. Try listing just some of the variables rather than all of them.
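A short sketch of the ways LIST can be narrowed down, run on a made-up dataset whose variable names follow the codebook at the end of these notes. The values are invented for illustration only.

```stata
* Toy data with the codebook's variable names (values are made up)
clear
input faminc mard age rrsp region
30000 1 30 500 1
45000 0 67 0 2
52000 1 30 1200 3
28000 0 45 0 3
61000 1 70 2500 5
end

list                         // every variable, every observation
list age rrsp                // only two variables
list age rrsp in 1/3         // only the first three observations
list age rrsp if region==3   // only observations from Ontario
```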

6. Find the summary statistics of the dataset.

a) Use the SUM command.

b) Now try the DESC command. What’s the difference?

The DESC command in STATA will tell you whether a variable is stored as str (string = text, which is how categorical variables are often stored) or as a number.

Regression and many other things in STATA can only be done with numerical variables. If the variable is a string, you will get an error message from STATA indicating that your command cannot be completed.

Suppose a variable that you want to use in your regression is stored as a string in the data. To convert it to a numerical variable, you need to generate a new variable from the string variable. The command to generate a new variable is GEN; it is discussed in step 8.

The new variable will be numerical, but the old one will still be a string. Use your newly created numerical variable when you run your commands.
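As a rough illustration of the conversion described above, the sketch below builds two string variables and turns each into a numerical one. It uses the real() function listed in the command table at the end of these notes, plus ENCODE, a standard STATA command that these notes do not cover. The variable names and values are made up.

```stata
* Toy data: both variables are stored as text (names and values are made up)
clear
input str6 agetxt str10 prov
"30" "Ontario"
"67" "Quebec"
"45" "Ontario"
end

describe                     // both variables show a str storage type
gen agenum = real(agetxt)    // digits stored as text become numbers
encode prov, gen(provnum)    // text categories become numeric codes with labels
describe                     // the new variables are numeric, the old ones still str
summ agenum provnum
```

ENCODE numbers the categories in alphabetical order, so here Ontario becomes 1 and Quebec becomes 2.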

7. Making tables.

a) Try making a table for the variable REGION.

table region

b) Now try finding the mean RRSP contribution for each region. This requires specifying a statistic for the table. Try the command:

table region, c(mean rrsp)

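The ‘c’ is short for ‘contents’ and tells STATA what to put in the table. (STATA 17 and later use the ‘statistic’ option instead, e.g. table region, statistic(mean rrsp).)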

c) You can choose other statistics than just the mean. Use the ‘help’ feature of STATA to find out how to make a table with the median of rrsp by region.

d) You can also make two-way tables: try the following command to make a table that has the means by region and by marital status:

table region mard, c(mean rrsp)
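The TABLE syntax depends on the STATA version installed, so the sketch below checks the version before choosing one. It also shows TABSTAT, a standard command not covered in these notes, which takes the same syntax in old and new versions. The data values are made up.

```stata
* Toy data (values are made up)
clear
input mard rrsp region
1 500 1
0 0 2
1 1200 3
0 0 3
1 2500 5
end

* tabstat: same syntax in every version
tabstat rrsp, by(region) statistics(mean median n)

* table: choose the syntax that matches the installed version
if c(stata_version) >= 17 {
    table region, statistic(mean rrsp)
    table (region) (mard), statistic(mean rrsp)
}
else {
    table region, c(mean rrsp)
    table region mard, c(mean rrsp)
}
```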

8. Creating new variables.

a) The command to generate a new variable is GEN. Now let’s try, for example, to create a binary variable for those who are 65 or older. Note the double ‘=’ in the ‘if’ qualifier of the REPLACE line. STATA requires a double == whenever you test for equality; a single = is used only to assign a value.

gen over65 = 1 if age>=65

replace over65 = 0 if over65==.

In the line above, if over65==. means ‘if the value of over65 is missing’. STATA shows a missing numeric value as a period.

summ over65

label variable over65 "person is age 65 or more"
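One thing to watch with this approach: STATA treats a missing number as larger than any other number, so an observation with a missing age also satisfies age>=65. The sketch below shows this on made-up data and gives a one-line alternative that leaves such observations missing. The variable name over65b is hypothetical.

```stata
* Toy data with one missing age (values are made up)
clear
input age
30
65
70
.
end

gen over65 = 1 if age>=65                 // the missing age also gets a 1
replace over65 = 0 if over65==.
gen over65b = age>=65 if !missing(age)    // 1 or 0, but missing when age is missing
label variable over65b "person is age 65 or more"
list
```

In the listing, the last observation has over65 equal to 1 but over65b missing.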

b) Now create a binary variable for those who are exactly 30 years old. Notice the ‘~=’ operator. This means ‘not equal to.’

gen age30 = 1 if age==30

replace age30 = 0 if age~=30

summ

c) Now create a binary variable for those who are exactly 30 years old AND married.

gen age30mar = 1 if age==30 & mard==1

replace age30mar = 0 if age30mar==.

summ

The ‘&’ sign means ‘and’. Both of the conditions have to be true for the if qualifier to be satisfied. If you need an ‘or’ condition, you can use | (the vertical bar).
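A small sketch of ‘and’ and ‘or’ side by side, on made-up data with no missing values. Here the condition is written directly on the right-hand side of GEN, which gives 1 when it is true and 0 when it is false. The names age30ormar and mar3067 are hypothetical.

```stata
* Toy data (values are made up, no missing values)
clear
input age mard
30 1
30 0
67 1
45 0
end

gen age30mar = (age==30 & mard==1)            // both conditions must be true
gen age30ormar = (age==30 | mard==1)          // at least one condition must be true
gen mar3067 = mard==1 & (age==30 | age==67)   // parentheses make the grouping explicit
list
```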

d) Try finding the summary statistics of RRSP only for married families by adding an IF qualifier to the SUMM command.

summ rrsp if mard==1
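As an illustration, the sketch below runs the same restricted summary on made-up data and then reads back the stored results r(N) and r(mean), which SUMM leaves behind after it runs. It also uses the BYSORT prefix, a standard feature not covered in these notes, to get the summary for each marital status in one step.

```stata
* Toy data (values are made up)
clear
input mard age rrsp
1 30 500
0 67 0
1 30 1200
0 45 0
1 70 2500
end

summ rrsp if mard==1         // married only
display r(N)                 // number of observations used
display r(mean)              // their mean RRSP contribution
bysort mard: summ rrsp       // one summary for each value of mard
summ rrsp if mard==1         // run again so r() refers to the married group
```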

9. Dropping observations.

You can drop observations that you don’t want. Imagine you are only interested in those who are under age 65. We have a variable called over65, so let’s use that.

drop if over65==1

count

STATA drops all observations that have over65==1, since we don’t want them. The COUNT command then shows how many observations remain.
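Dropped observations are gone from memory until the data are loaded again. A rough sketch of one way to drop them only temporarily uses PRESERVE and RESTORE, standard STATA commands not covered in these notes. The data values are made up.

```stata
* Toy data (values are made up)
clear
input age rrsp
30 500
67 0
30 1200
45 0
70 2500
end

gen over65 = age>=65 if !missing(age)
count                   // 5 observations
preserve                // set aside a copy of the data in memory
drop if over65==1
count                   // 3 observations remain
summ rrsp               // statistics for those under 65 only
restore                 // bring back the full dataset
count                   // 5 observations again
```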

10. Keeping/Dropping variables.

Keeping and dropping individual variables is different from dropping observations. Try dropping the age30 variable created in step 8.

drop age30

Try the summ command to see what effect this had on your dataset.
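The difference between DROP and KEEP for variables can be sketched on made-up data as follows. CONFIRM VARIABLE is a standard STATA command, not covered in these notes, that returns an error code when a variable does not exist.

```stata
* Toy data (values are made up)
clear
input faminc mard age rrsp
30000 1 30 500
45000 0 67 0
52000 1 30 1200
end

gen age30 = age==30
drop age30                       // removes one variable, all observations stay
capture confirm variable age30
display _rc                      // nonzero: age30 no longer exists
keep age rrsp                    // removes every variable except these two
describe
count                            // still 3 observations
```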

Codebook for econ390.dta:

The variables are derived from the 1984 version of the Family Expenditure Survey conducted by Statistics Canada.

Variable	Description
FAMINC	Family income in 1984 dollars.
MARD	Marital status. Takes the value 1 if married or in common-law relationship; 0 otherwise.
AGE	The age of the head of the household.
RRSP	The dollar value of Registered Retirement Savings Plan contributions in 1984, in 1984 dollars.
MTR	Marginal tax rate. The combined (federal/provincial) rate of income tax payable on the last dollar of the head of household’s income.
REGION	Region of residence. Takes the value 1 for Atlantic provinces; 2 for Quebec; 3 for Ontario; 4 for Prairie provinces; 5 for BC.

Some frequently used STATA commands:

Command	Use
cd c:\	Changes directory to c:\
clear	Clears the memory so that a new dataset can be loaded.
dir	Displays contents of current directory
count	Provides a count of the number of observations
desc	Shows variables currently in memory with description
summ	Shows means and standard deviations for all variables - can use ',detail' to get percentiles
reg y x	Runs OLS using y as dependent variable and x as independent variable
gen	Creates new variable
replace	Replaces old value with new value
drop x	Drops variable x
keep x	Drops all variables except for x
drop if x==0	Drops all observations with x=0
keep if x==0	Keeps only those observations with x=0
dprobit	Runs a probit regression, reports marginal effects on the probability
use data.dta	Brings dataset data.dta into memory
save data.dta	Saves dataset data.dta to disk (add ', replace' to overwrite an existing data.dta)
table x	Shows a table with the frequency distribution for variable x
table x, c(mean y)	Shows the mean of y for each value of x
compress	Stores each variable in the smallest storage type that holds its values, so the dataset takes up less memory
insheet x y using data.dat, t	Brings variables x and y from ascii file data.dat (tab delimited) into memory
log using econ390.log, t	Creates a log file called econ390.log, text format
log close	Closes any open log file
capture log close	Closes any open log file, but doesn’t crash if there is no open log file
real(x)	Converts x from text to numeric form, e.g. gen numbvar = real(textvar)
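Several of the commands in this table are usually combined at the top and bottom of a do-file. The sketch below is one possible skeleton, with made-up data in place of econ390.dta. It writes econ390.log to the current directory, and the replace option on LOG overwrites any earlier log with that name.

```stata
* Skeleton of a do-file (toy data, values are made up)
capture log close                      // safe even when no log is open
log using econ390.log, text replace    // text is the full name of the t option

clear
input age rrsp
30 500
67 0
45 1200
end

count
desc
summ rrsp, detail
compress

log close
```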