Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

FNCE 435

Module 3: Intro to Data Access and Management

3. Data Access and Management: An Introduction

Objectives

Discuss the basics about WRDS databases

Discuss how to access WRDS datasets using the SAS environment

Discuss the use of SQL commands in SAS

Show the basic on SAS data types and on entering data

3.1. WRDS Database

So far, we just used returns on IBM stock. We will also learn how to get this and many other examples of financial and economic data. In the case of returns, the data comes from WRDS database.

In its own description, WRDS provides instant access to important databases in the fields of finance, accounting, banking, economics, management, marketing and public policy.”

WRDS is composed of different databases, such as:

CRSP (stock price information)

COMPUSTAT (accounting data and much more)

I/B/E/S (sell-side analystsoutput)

Thomson Financial (institutional ownership)

and many others.

You can find the web interface for WRDS at:

https://wrds.wharton.upenn.edu

Check it out for the complete set of databases it holds. It is also interesting to use the web interface sometimes, for example for quick checks” on a specific dataset.

3.2. WRDS Database: Accessing From SAS in the Unix Environment

One can always use the web interface to access and download data from WRDS (believe me, many in academia and in the industry still rely on the web interface), but I hope I will convince you that accessing the WRDS through SAS is much more efficient.

The simplicity comes from the fact that the databases from WRDS are stored as SAS datasets. They are located in the WRDS Unix server ‘wrds-cloud.wharton.upenn.edu’, which is the one you also have access to run your SAS codes! So, our discussion in this section assumes that you are running SAS specifically on that Unix server.

For example, the CRSP database has a dataset, called MSF (for monthly stock file), that contains monthly stock prices and returns for stocks traded in US.

We can access that dataset like we did for the MONTHLY_IBM_RETURNS, i.e.:

data d;

set crspsd.msf;

Two things to notice. First, the MSF is simply one permanent SAS dataset. Second, we included a reference to a library CRSPSD. The library has to point to the physical location where the MSF dataset is located.

So this program has to be expanded to contain the definition of that library.

libname crspsd ‘/wrds/crsp/sasdata/a_stock’;

data d;

set crspsd.msf;

No need to worry about the physical location of WRDS datasets for now; they will be supplied to you as the course progresses. Notice also that the code above is written for the Unix environment (we are using ‘/’, not ‘\’, to separate directories). Soon we will learn what to do when running the SAS code from the Windows environment.

Step back and see what the program above is doing. The temporary dataset D is receiving all the contents of the MSF file—this mean monthly returns from all stocks and all dates from the database. That would be a huge file! Unnecessarily huge, since we are interested in returns from IBM between 1980 and 2009.

We thus add a WHERE clause to limit the search. (Such WHERE clause can be used for any DATA or PROC step.)

libname crspsd ‘/wrds/crsp/sasdata/a_stock’;

data d;

set crspsd.msf;

where permno=12490 and date>=”01JAN1980”d and

date<=”31DEC2009”d;

Actually, I wrote that clause because I knew already the contents of the dataset MSF. Otherwise, good programming practice would tell us to run a PROC CONTENTS first on this new dataset.

proc contents data=crspsd.msf;

run;

The output would show the following variables:

# Variable Type

Len Format Informat Label

8 12.5      12.5     Price Alternate

8 YYMMDDN8. YYMMDD6. Alternate Price Date

18 ALTPRC Num

20 ALTPRCDT Num

14 ASK  Num

9 ASKHI    Num

13 BID  Num

8 BIDLO Num

16 CFACPR Num

17 CFACSHR Num

Bid or Low Price

Cumulative Factor

to Adjust Prices

Cumulative Factor to

Adjust Shares/Vol

1

CUSIP

Char

8 8.

8.

CUSIP Header

7

DATE

Num

8 YYMMDDN8.

Date of Observation

5

HEXCD

Num

8 2.

2.

Exchange Code Header

6

HSICCD

Num

8 8.

8.

Standard Industrial Classification Code

4

ISSUNO

Num

8 8.

8.

Nasdaq Issue Number

3

PERMCO

Num

8 8.

8.

PERMCO

2

PERMNO

Num

8 8.

8.

PERMNO

10

PRC

Num

8 12.5

12.5

Price or Bid/Ask Average

12

RET

Num

8 11.6

11.6

Returns

21

RETX

Num

8 11.6

11.6

Returns without

Dividends

15

SHROUT

Num

8

Shares Outstanding

19

SPREAD

Num

8 11.5

11.5

Spread Between

Bid and Ask

11

VOL

Num

8 10.

10.

Volume

Still puzzled by the contents? Another good practice is to take a look at some data, for example by printing the first few observations on the dataset. The following code does just that.

data d_output;

set d;

if _n_<=5;

proc print data=d_output;

title MSF from CRSP: First 5 observations’;

The _N_ is a SAS automatic variable that records the number of times SAS has looped through the DATA step. Here, it equals the observation number. The IF condition restricts the data to be included  in  the  D_OUTPUT.  Thus,  only  the  first  5  observations  of D  are  included  in  the D_OUTPUT. The PROC PRINT generates an output with the data from these 5 observations:

MSF from CRSP: First 5 Observations     3

12:31 Saturday, August 16, 2014

Obs CUSIP PERMNO PERMCO   ISSUNO HEXCD   HSICCD     DATE

1

45920010

12490

20990

0

1

3571

19800131

2

45920010

12490

20990

0

1

3571

19800229