FNCE 435 Module 3: Intro to Data Access and Management
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
FNCE 435
Module 3: Intro to Data Access and Management
3. Data Access and Management: An Introduction
Objectives
Discuss the basics about WRDS databases
Discuss how to access WRDS datasets using the SAS environment
Discuss the use of SQL commands in SAS
Show the basic on SAS data types and on entering data
3.1. WRDS Database
So far, we just used returns on IBM stock. We will also learn how to get this and many other examples of financial and economic data. In the case of returns, the data comes from WRDS database.
In its own description, WRDS “provides instant access to important databases in the fields of finance, accounting, banking, economics, management, marketing and public policy.”
WRDS is composed of different databases, such as:
CRSP (stock price information)
COMPUSTAT (accounting data and much more)
I/B/E/S (sell-side analysts’ output)
Thomson Financial (institutional ownership)
and many others.
You can find the web interface for WRDS at:
https://wrds.wharton.upenn.edu
Check it out for the complete set of databases it holds. It is also interesting to use the web interface sometimes, for example for quick “checks” on a specific dataset.
3.2. WRDS Database: Accessing From SAS in the Unix Environment
One can always use the web interface to access and download data from WRDS (believe me, many in academia and in the industry still rely on the web interface), but I hope I will convince you that accessing the WRDS through SAS is much more efficient.
The simplicity comes from the fact that the databases from WRDS are stored as SAS datasets. They are located in the WRDS Unix server ‘wrds-cloud.wharton.upenn.edu’, which is the one you also have access to run your SAS codes! So, our discussion in this section assumes that you are running SAS specifically on that Unix server.
For example, the CRSP database has a dataset, called MSF (for monthly stock file), that contains monthly stock prices and returns for stocks traded in US.
We can access that dataset like we did for the MONTHLY_IBM_RETURNS, i.e.:
data d; set crspsd.msf; |
Two things to notice. First, the MSF is simply one permanent SAS dataset. Second, we included a reference to a library CRSPSD. The library has to point to the physical location where the MSF dataset is located.
So this program has to be expanded to contain the definition of that library.
libname crspsd ‘/wrds/crsp/sasdata/a_stock’; data d; set crspsd.msf; |
No need to worry about the physical location of WRDS datasets for now; they will be supplied to you as the course progresses. Notice also that the code above is written for the Unix environment (we are using ‘/’, not ‘\’, to separate directories). Soon we will learn what to do when running the SAS code from the Windows environment.
Step back and see what the program above is doing. The temporary dataset D is receiving all the contents of the MSF file—this mean monthly returns from all stocks and all dates from the database. That would be a huge file! Unnecessarily huge, since we are interested in returns from IBM between 1980 and 2009.
We thus add a WHERE clause to limit the search. (Such WHERE clause can be used for any DATA or PROC step.)
libname crspsd ‘/wrds/crsp/sasdata/a_stock’; data d; set crspsd.msf; where permno=12490 and date>=”01JAN1980”d and date<=”31DEC2009”d; |
Actually, I wrote that clause because I knew already the contents of the dataset MSF. Otherwise, good programming practice would tell us to run a PROC CONTENTS first on this new dataset.
proc contents data=crspsd.msf; run; |
The output would show the following variables:
# Variable Type |
Len Format Informat Label 8 12.5 12.5 Price Alternate 8 YYMMDDN8. YYMMDD6. Alternate Price Date |
18 ALTPRC Num 20 ALTPRCDT Num 14 ASK Num 9 ASKHI Num 13 BID Num 8 BIDLO Num 16 CFACPR Num 17 CFACSHR Num |
|
Bid or Low Price Cumulative Factor to Adjust Prices Cumulative Factor to |
|
Still puzzled by the contents? Another good practice is to take a look at some data, for example by printing the first few observations on the dataset. The following code does just that.
data d_output; set d; if _n_<=5; proc print data=d_output; title ‘MSF from CRSP: First 5 observations’; |
The _N_ is a SAS automatic variable that records the number of times SAS has looped through the DATA step. Here, it equals the observation number. The IF condition restricts the data to be included in the D_OUTPUT. Thus, only the first 5 observations of D are included in the D_OUTPUT. The PROC PRINT generates an output with the data from these 5 observations:
MSF from CRSP: First 5 Observations 3 12:31 Saturday, August 16, 2014 Obs CUSIP PERMNO PERMCO ISSUNO HEXCD HSICCD DATE
|
2022-09-19