
Bufn736

Updated:  March 1, 2022

Quantitative Backtest Group Project

Due Date:  Tuesday, March 15 at 4 pm EDT

Deliverable: Class Presentation (15 minutes maximum) + Powerpoint Slides

A. Objective:  We will backtest a portfolio strategy of the students’ choice over a 20-30 year period using the Python language and a variety of data sources.  Students should form groups of 5.  

B. Guidelines on Quantitative Strategy:

In-Sample “model fitting period” (I suggest using the first 2/3 of the data period, e.g., 1985-2006 factors to predict 1986-2007 annual stock returns as the y-variable). Estimation Rules:

1. The universe=NYSE/AMEX/Nasdaq stocks, only common stocks (not preferred, etc.).

2. Each stock must have a price above $5 per share and an equity market capitalization of at least $100 million at the beginning of a given forecast year, to avoid trading illiquid stocks.

3. You should start with a factor model having 20 to 30 chosen factors, as some factors may not work out for you during the backtest step. It is normal to have to drop several factors because they don’t work.  The factors can be chosen from the textbook, research papers, or outside sources, but you should have some source that identifies each of these as “likely candidates” for picking stocks, and what to expect in terms of returns when you implement them as well as the proper way to construct the factors. One important consideration is to think about “categories” of factors when you decide on the set of factors.  For one review of “categories,” see the textbook, Chapter 4.

4. Keep in mind that you will need to collect historical data for each factor from WRDS or Morningstar Direct (or elsewhere) back to (ideally) 1985, or even earlier, which will limit the types of factors you can use. Feel free to use Bloomberg, but the factors must be available for all stocks that existed at a particular date, regardless of whether they exist today (otherwise the backtest suffers from survivorship bias). As a starting point, we have created a clean dataset of 27 factors, posted on ELMS (under the “Portfolio Project\Data” tab).  These factors are computed up to mid-2017, and different factors have different starting dates (due to the differing availability of the underlying data). Because data on Nasdaq stocks are lacking prior to 1974, I recommend that you start your backtest no earlier than 1974 (depending on the data availability for your factors) and stop with data in mid-2017 (which predicts returns over the following 12 months).  You do not have to use any of these factors, but they are available for you. You are also expected to create some other factors beyond these 27.  The more original factors you create (outside of the 27 provided), the better your chance of a higher grade.  Of course, the quality and difficulty of forming each factor also weigh heavily in the grade for your project, so just a few very well-chosen and difficult-to-obtain factors can be a good approach.

5. Z-scores:  As you know, you should work with z-scores of each of your factors, relative to the industry average and standard deviation, and not the raw factor values themselves.  You should not z-score the y-variable, which is the following-year return. The following-year return should be obtained by compounding the monthly returns for each stock from the CRSP Stock database.

6. You should estimate your model with new fundamental exposures and returns (downloaded from Morningstar Direct, CRSP/Compustat, IBES, or other sources) at the end of each calendar year, starting with end of year t (for example, 1985) factor exposures to forecast calendar year t+1 (e.g., 1986) stock returns (cross-sectionally), and ending with end of year t+n (e.g., 2006) to forecast calendar year t+n+1 (e.g., year 2007) stock returns. In other words, this consists of many years of moving estimation windows. This will use the code from Python #5, modified to fit your data on factors. Note: in steps 9-13, you will need this fundamental exposure data for years t+n+1 to the end of the available data years (e.g., 2007 to 2017), and stock returns (e.g., 2008 to 2018), so you may as well download them at this time.

7. Like Python Lab Exercise #5:  Your objective is to estimate the premium on each factor during each estimation window, then compute the average factor premium across all windows and the t-statistic of this average (i.e., the Fama-MacBeth average and the Fama-MacBeth t-statistic). These represent the average profitability and the consistency of profitability of each factor, respectively. The t-statistic for a given factor is computed as the sample average of its premium estimates multiplied by the square root of the number of windows (for example, if you do a backtest starting with 1985 and ending with 2001, this would be the square root of 17) divided by the standard deviation of the premium estimates for that factor across the years (e.g., across the 17 values). Please see my “Fundamental Factor Model” note for further information and details.

8. You should then meet with your group to have a serious discussion on which factors to drop from your final model, which would be appropriate if they end up with a backtest t-statistic that is low (e.g., below 1.5) or the wrong sign (e.g., negative for the E/P ratio, which should positively predict returns). However, you may choose to keep a signal with a low t-statistic or even a wrong sign if there seems to be a strong economic story or a compelling research article that indicates it should work well when tried over many years. This is where your judgment comes in. (This is somewhat like using a Bayesian approach, where your strong priors outweigh the data result.) You should avoid dropping too many factors, or you will have a very weak model in step 9.  A rough “rule-of-thumb” is to keep at least 10 factors, and, hopefully, more for step 9 below.
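
As an illustration of steps 5 to 7, here is a minimal sketch. It is not the Python Lab #5 code. The column names (`year`, `industry`, `fwd_ret`, `ep`, `mom`), the helper function names, and the randomly generated toy panel are all assumptions made for the example; your own data layout will differ.

```python
import numpy as np
import pandas as pd


def compound_annual(monthly_returns):
    """Compound simple monthly returns into one annual return (step 5)."""
    return float(np.prod(1.0 + np.asarray(monthly_returns, dtype=float)) - 1.0)


def industry_zscore(df, factor_cols, group_cols=("year", "industry")):
    """Z-score each factor against its industry mean and std within each year."""
    out = df.copy()
    grouped = df.groupby(list(group_cols))[factor_cols]
    out[factor_cols] = (df[factor_cols] - grouped.transform("mean")) / grouped.transform("std")
    return out


def fama_macbeth(df, factor_cols, ret_col="fwd_ret", year_col="year"):
    """One cross-sectional OLS per year, then the average premium and its t-stat.

    t-stat = mean * sqrt(number of windows) / std of the yearly premia (step 7).
    """
    yearly = {}
    for year, cross in df.groupby(year_col):
        cross = cross.dropna(subset=factor_cols + [ret_col])
        X = np.column_stack([np.ones(len(cross)), cross[factor_cols].to_numpy()])
        y = cross[ret_col].to_numpy()  # raw following-year return, not z-scored
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        yearly[year] = beta[1:]  # drop the intercept, keep the factor premia
    premia = pd.DataFrame.from_dict(yearly, orient="index", columns=factor_cols)
    mean = premia.mean()
    tstat = mean * np.sqrt(len(premia)) / premia.std(ddof=1)
    return premia, pd.DataFrame({"mean_premium": mean, "t_stat": tstat})


# Toy panel (hypothetical data): 5 years x 200 stocks x 4 industries, 2 factors.
rng = np.random.default_rng(0)
rows = [
    {"year": year, "industry": i % 4, "ep": rng.normal(), "mom": rng.normal()}
    for year in range(1985, 1990)
    for i in range(200)
]
toy = pd.DataFrame(rows)
toy["fwd_ret"] = 0.05 * toy["ep"] + rng.normal(scale=0.2, size=len(toy))

z = industry_zscore(toy, ["ep", "mom"])
premia, summary = fama_macbeth(z, ["ep", "mom"])
print(summary)
```

The `summary` table is what the group would look at in step 8: one row per factor, with the Fama-MacBeth average premium and its t-statistic.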

Out-of-sample forecasting test of the model [last “n” years—roughly 1/3 of your time range, e.g., 2007-2017 factors, and 2008-2018 stock returns (monthly, so that performance analytics can be used that are based on monthly returns)]:

9. Now it’s time to use the model to score stocks during the out-of-sample period. First, compute each stock’s score at the end of year t+n+1 (e.g., 2007): weight each of the stock’s factor exposure z-scores at the end of that year by the (properly signed) t-statistic derived for that factor in step 7 above. (The t-stat should already have the proper sign if the backtest worked in the right direction for that factor.) Add the t-stat weighted z-scores together to get the overall score for the stock.

Score = z-score(1) x t-stat(1) + z-score(2) x t-stat(2) + …
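
The score formula above can be sketched in a few lines of pandas. The function name `composite_score`, the tickers, and the numbers are hypothetical and used only for illustration.

```python
import pandas as pd


def composite_score(zscores, tstats):
    """Score = sum over factors of z-score x t-stat, one value per stock.

    zscores: DataFrame (rows = stocks, columns = factors) for one year-end.
    tstats:  Series of in-sample Fama-MacBeth t-stats, indexed by factor name.
    """
    tstats = tstats.reindex(zscores.columns)  # align factor order with the columns
    return zscores.mul(tstats, axis=1).sum(axis=1)


# Hypothetical example: three stocks, two factors.
z = pd.DataFrame(
    {"ep": [1.0, -0.5, 0.0], "mom": [0.5, 1.0, -1.0]},
    index=["AAA", "BBB", "CCC"],
)
t = pd.Series({"ep": 2.0, "mom": 3.0})
scores = composite_score(z, t)
print(scores)  # AAA 3.5, BBB 2.0, CCC -3.0
```

Note that `sum(axis=1)` skips missing values by default, which in effect treats a missing exposure as the industry average (z-score of 0). Whether that is appropriate for a given factor is a judgment call for the group.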

10. Portfolio weights going forward:  It is recommended that you simply go long an equal-weighted portfolio of the top 10% of ranked stocks, and equal-weight short the bottom 10%.  If you want to be fancier, you can try to optimize weights with a mean-variance optimization program, but this is not required. If you choose to do this, you can use any software or data source for choosing portfolios (positions in stocks each year). You should rebalance your portfolio at the end of each year (in the next step below).

11. Repeat 9 & 10 above to compute scores for stocks for every year, until you end with the end of the final data year (e.g., 2017) which is used to pick stocks for the following calendar year (e.g., 2018).

12. Next, compute the return difference between the equal-weighted portfolio of the top-scoring 10% of stocks and the equal-weighted portfolio of the bottom-scoring 10% for each month during years t+n+2 (e.g., 2008) to the final year (e.g., 2018). (Or, if you used another weighting scheme, such as optimized portfolio weights, compute the difference between the long and short portfolios under that weighting.)

13. Finally, you will conduct performance analytics on your resulting out-of-sample returns over all months in the out-of-sample period. You should compute (1) Raw return, (2) CAPM alpha, (3) 4-Factor alpha, and (4) Information Ratio using the 4-factor model.  Code for doing this is included in the solutions to Python Lab #5. Guidance on what these mean can be found in Professor Wermers’ textbook, Chapter 3.

14. Your presentation should cover not only which factors you chose, but also mention the research articles that support them, and the predictions of the research (in terms of exact return expected and risk).  Next, add slides that cover the results of your in-sample estimation (e.g., 1985/86 to 2006/07), your selection of your final model (after backtesting), and the results of your out-of-sample test (e.g., from 2007/08 to 2017/18), including the performance analytics in #13 above. You might end with a conclusion slide about why your model did or did not work well during the out-of-sample period, and what you might try differently in the future.
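
For steps 10, 12, and 13, a minimal sketch of the long-short spread and the performance analytics might look like the following. It is not the Lab #5 solution code. The function names, the factor labels (`MKT`, `SMB`, `HML`, `UMD`, assuming the 4-factor model is market, size, value, and momentum), and the simulated returns are assumptions. The information ratio is taken here as alpha divided by the residual standard deviation, a common definition that you should check against the course materials. Averaging returns equally each month also implicitly rebalances to equal weights monthly, which is a simplification.

```python
import numpy as np
import pandas as pd


def long_short_return(scores, next_returns, frac=0.10):
    """Equal-weighted top-`frac` minus bottom-`frac` return for one period."""
    n = max(1, int(round(len(scores) * frac)))
    ranked = scores.sort_values()  # ascending: lowest scores first
    short_leg = next_returns.loc[ranked.index[:n]].mean()
    long_leg = next_returns.loc[ranked.index[-n:]].mean()
    return long_leg - short_leg


def alpha_and_ir(spread, factors):
    """OLS of the monthly long-short spread on factor returns.

    Returns (monthly alpha, alpha / residual std). The spread is a
    zero-investment return, so no risk-free rate is subtracted here.
    """
    y = np.asarray(spread, dtype=float)
    F = np.asarray(factors, dtype=float).reshape(len(y), -1)
    X = np.column_stack([np.ones(len(y)), F])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    resid_std = resid.std(ddof=X.shape[1])  # adjust for estimated coefficients
    return beta[0], beta[0] / resid_std


# One period, 20 hypothetical stocks whose returns line up with their scores.
scores = pd.Series(np.arange(20.0), index=[f"S{i}" for i in range(20)])
rets = scores * 0.01
spread_one_period = long_short_return(scores, rets)  # top 2 minus bottom 2

# Simulated 120 months of factor returns and a long-short spread (made-up data).
rng = np.random.default_rng(1)
factors = pd.DataFrame(
    rng.normal(0.005, 0.03, size=(120, 4)), columns=["MKT", "SMB", "HML", "UMD"]
)
spread = 0.004 + 0.1 * factors["MKT"] + rng.normal(0, 0.01, size=120)

raw_mean = spread.mean()                                # (1) raw return
capm_alpha, _ = alpha_and_ir(spread, factors[["MKT"]])  # (2) CAPM alpha
ff4_alpha, ff4_ir = alpha_and_ir(spread, factors)       # (3) and (4)
print(raw_mean, capm_alpha, ff4_alpha, ff4_ir)
```

In a real backtest, `long_short_return` would be called once per month of the out-of-sample period using that year's portfolio membership, and the resulting monthly series would replace the simulated `spread`.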