Empirical Finance Spring II 2022

Final Group Project

In this project, you will be using the following files to do some analysis surrounding Elon Musk’s path to buying Twitter.  In the returns file, note that all returns are holding period returns and given in percent - these are not log returns:

• twtr_russell3000_and_5factor_daily.xlsx: This file contains daily excess holding-period returns for Twitter and the Russell 3000 index, as well as the corresponding Fama-French five-factor portfolio returns and the risk-free rate (RF)

• Tweets_excel.xlsx: This file contains tweets and other news text from April 4th to April 25th

• RunTweetTextAnalysis_final_project.m: This MATLAB file gives you a start and some hints on Q1 related to tweet sentiment analysis

• RunTwtr5Factor_final_project.m: This MATLAB file gives you a start and some hints on Q2 related to estimating some daily factor models and calculating the cumulative abnormal return (CAR)

Q1. Sentiment Analysis [18 points total]: Using RunTweetTextAnalysis_final_project.m as your starting point, read in both Excel files and perform the following steps:

1.  [1 point] Winsorize the Twitter returns at the 1st and 99th percentiles. In other words, set all returns greater than the 99th percentile return to the value of the 99th percentile return, and set all returns less than the 1st percentile return to the value of the 1st percentile return. Hint: Use the MATLAB prctile function to help with this.

• Note that winsorization is a very common technique for datasets with outliers that throw off a model. A less desirable alternative is removing the observations, but this eliminates data and reduces statistical power.
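As a minimal sketch of the winsorization described above, the snippet below clamps a made-up return series at its 1st and 99th percentiles. The data and the variable names (ret, retW, cutoffs) are assumptions for illustration only; they are not taken from the project files.

```matlab
% Hypothetical daily returns in percent (NOT the project data):
% 500 simulated observations plus two planted outliers.
rng(1);
ret = 2 * randn(500, 1);
ret([10 200]) = [35; -28];

% 1st and 99th percentile cutoffs (prctile is part of the
% Statistics and Machine Learning Toolbox).
cutoffs = prctile(ret, [1 99]);

% Clamp: raise anything below the lower cutoff to that cutoff, and
% lower anything above the upper cutoff to that cutoff.
retW = min(max(ret, cutoffs(1)), cutoffs(2));

fprintf('Raw range:        [%.2f, %.2f]\n', min(ret), max(ret));
fprintf('Winsorized range: [%.2f, %.2f]\n', min(retW), max(retW));
```

Note that no observations are dropped: retW has the same length as ret, and only the tail values change.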

2.  [0 points] Move any weekend tweets/news to the next business day in a new column in your tweets table called bus_date. I have already done this in the MATLAB file for you, and I suggest you look up the busdate function in MATLAB to see how it works as well as what I’m doing in that code.

3.  [2 points] Calculate the VADER sentiment score for each tweet and news item and add these scores as a new column to your tweets table.
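One possible way to score text with VADER is sketched below. It assumes the Text Analytics Toolbox is installed (it provides tokenizedDocument and vaderSentimentScores), and it uses a made-up two-row table whose column name, text, is hypothetical; the actual column names in the tweets file may differ.

```matlab
% Hypothetical table of text items (NOT the project data); the column
% name "text" is an assumption.
tweets = table(["Great news for shareholders!"; ...
                "This deal is a terrible disaster."], ...
               'VariableNames', {'text'});

% VADER works on tokenized documents, so tokenize the raw strings first.
docs = tokenizedDocument(tweets.text);

% Compound VADER score per document, between -1 (negative) and
% 1 (positive), stored as a new table column.
tweets.score = vaderSentimentScores(docs);

disp(tweets);
```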

4.  [2 points] Calculate the mean sentiment score for each bus_date. We will call this the daily sentiment score, $\mathrm{score}_t$. Note that you will not have a sentiment score for every day (only for a few days, in fact). The groupsummary function will be helpful here.
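The grouping step can be sketched as follows, using a small made-up table. The column names bus_date and score are assumed spellings, and the numbers are invented for illustration.

```matlab
% Hypothetical per-item sentiment scores (NOT the project data).
bus_date = datetime(2022, 4, [4; 4; 5; 5; 5]);
score    = [0.6; -0.2; 0.3; 0.6; 0.0];
tweets   = table(bus_date, score);

% One row per business date with the mean of "score". The output
% columns are bus_date, GroupCount, and mean_score.
daily = groupsummary(tweets, 'bus_date', 'mean', 'score');

disp(daily);
```

In this toy example the two dates get mean scores of 0.2 and 0.3, and GroupCount reports how many items fed into each daily mean.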

5.  [3 points] Using MATLAB code, find the Twitter returns corresponding to each day t from your list of sentiment scores, then run the following regression:

$$\mathrm{HPR\_rf}_t = \alpha + \beta \cdot \mathrm{score}_t$$

where $\mathrm{HPR\_rf}_t$ is the Twitter excess holding period return for day t (this is what’s already in the Excel file). Hint: an innerjoin will be helpful to find the returns corresponding to the sentiment scores.
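A minimal sketch of the matching step with innerjoin is shown below. Both tables are made up, and the column names (bus_date, mean_score, date, HPR_rf) are assumptions; adjust them to whatever the actual tables use.

```matlab
% Hypothetical daily sentiment scores (NOT the project data).
daily = table(datetime(2022, 4, [4; 5; 16]), [0.2; 0.3; -0.1], ...
              'VariableNames', {'bus_date', 'mean_score'});

% Hypothetical daily returns in percent, keyed by a column named "date".
returns = table(datetime(2022, 4, [4; 5; 6; 7]), [1.0; 2.0; 3.0; 4.0], ...
                'VariableNames', {'date', 'HPR_rf'});

% Keep only the dates that appear in BOTH tables.
matched = innerjoin(daily, returns, ...
                    'LeftKeys', 'bus_date', 'RightKeys', 'date');

disp(matched);
```

Only April 4th and 5th survive the join here, because April 16th has no return row and April 6th and 7th have no sentiment score.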

6.  [2 points] Display a table showing your estimated α and β as well as the standard error and t-statistic for each of these estimates (I’ve given you several examples of how to display a nice table with regression coefficients). Then display your R-squared estimate as well.
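For reference, here is one way to compute and display OLS estimates, standard errors, t-statistics, and R-squared by hand with base MATLAB. This is a sketch on invented numbers, not the course's own display code, and it uses classical (homoskedastic) standard errors as an assumption.

```matlab
% Hypothetical matched data (NOT the project data).
score  = [0.2; -0.1; 0.5; 0.0; 0.3; -0.4];
HPR_rf = [1.5; -0.8; 3.9; 0.2; 2.0; -2.5];   % percent

n = numel(HPR_rf);
X = [ones(n, 1) score];          % intercept column plus regressor
k = size(X, 2);

b     = X \ HPR_rf;              % OLS coefficients [alpha; beta]
resid = HPR_rf - X * b;
s2    = (resid' * resid) / (n - k);              % residual variance
se    = sqrt(diag(s2 * ((X' * X) \ eye(k))));    % classical std. errors
tstat = b ./ se;
R2    = 1 - (resid' * resid) / sum((HPR_rf - mean(HPR_rf)).^2);

results = table(b, se, tstat, ...
                'VariableNames', {'Estimate', 'StdErr', 'tStat'}, ...
                'RowNames', {'alpha', 'beta'});
disp(results);
fprintf('R-squared: %.4f\n', R2);
```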

7.  [3 points] Based on your regression outputs, what is the effect on Twitter’s excess return from increasing a daily sentiment score from completely neutral (i.e., $\mathrm{score}_t = 0$) to completely positive (i.e., $\mathrm{score}_t = 1$)? Comment on your R-squared value as well.

8.  [5 points] I carefully chose the tweets and news snippets that got me a result that I liked, and they didn’t turn out as I had expected. Keeping this in mind, do the following:

• Create an Excel file with a single column, called "Words", where you enter 10 words (one in each cell) by typing in a word you think is completely positive and then another that you think is completely negative (so you will be alternating good and bad words and you’ll end up with five of each). This way, you know to expect scores that alternate between positive and negative.

• Read in the words, then calculate the VADER sentiment scores and display a table showing both the words and their scores.

• Comment on the results. Are they what you expected, and what does this result tell you about doing sentiment analysis? Can you think of any ways to improve this, whether in code, by hand, or a combination of both? Note: You don’t need to implement any improvements. Just describe at a high level (a couple of sentences) what you might do.

Q2. Abnormal Returns (event-study stuff) [30 points total]: Using RunTwtr5Factor_final_project.m as your starting point, read in the twtr_russell3000_and_5factor_daily.xlsx file and perform the following steps:

1.  [5 points] Using the daily holding period return data from Dec 1st, 2021 to March 31st, 2022 (including those two dates), estimate the following factor models. For each regression, display the estimated coefficients along with their standard errors and t-statistics, and the R-squared:

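$$\mathrm{HPR\_rf}_t = \alpha + \beta \cdot \mathrm{Mkt\_RF}_t + s \cdot \mathrm{SMB}_t + h \cdot \mathrm{HML}_t + r \cdot \mathrm{RMW}_t + c \cdot \mathrm{CMA}_t$$

$$\mathrm{HPR\_rf}_t = \alpha + \beta \cdot \mathrm{ret\_rus3000\_rf}_t + s \cdot \mathrm{SMB}_t + h \cdot \mathrm{HML}_t + r \cdot \mathrm{RMW}_t + c \cdot \mathrm{CMA}_t$$

$$\mathrm{HPR\_rf}_t = \alpha + \beta \cdot \mathrm{ret\_rus3000\_rf}_t$$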

where $\mathrm{HPR\_rf}_t$ is the Twitter excess holding period return for day t (this is what’s already in the Excel file) and the other returns are named as listed in the Excel file. Note that I have already set up the MATLAB script file with code that filters the data to the estimation period, and I have created a for loop that gives you a head start: you write the code to run and display a regression just once, and the loop executes it three times. This is better than rewriting the same code three times.
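The loop idea can be sketched as below: the regression code is written once and run over a list of regressor sets. This is not the starter script's loop. The data are random placeholders, the column names are assumed to match the spellings in the equations above, and the standard errors are classical OLS ones.

```matlab
% Random placeholder data (NOT the project data); column names are assumed.
rng(0);
n = 80;
T = array2table(randn(n, 6), 'VariableNames', ...
    {'Mkt_RF', 'ret_rus3000_rf', 'SMB', 'HML', 'RMW', 'CMA'});
T.HPR_rf = 0.1 + 1.2 * T.ret_rus3000_rf + randn(n, 1);

% One cell per model: the regressors to include besides the intercept.
specs = { {'Mkt_RF', 'SMB', 'HML', 'RMW', 'CMA'}, ...
          {'ret_rus3000_rf', 'SMB', 'HML', 'RMW', 'CMA'}, ...
          {'ret_rus3000_rf'} };

y  = T.HPR_rf;
R2 = zeros(numel(specs), 1);
for i = 1:numel(specs)
    X = [ones(n, 1) T{:, specs{i}}];     % intercept plus chosen factors
    k = size(X, 2);
    b = X \ y;                           % OLS coefficients
    resid = y - X * b;
    se = sqrt(diag((resid' * resid) / (n - k) * ((X' * X) \ eye(k))));
    R2(i) = 1 - (resid' * resid) / sum((y - mean(y)).^2);

    fprintf('Model %d\n', i);
    disp(table(b, se, b ./ se, ...
               'VariableNames', {'Estimate', 'StdErr', 'tStat'}, ...
               'RowNames', [{'alpha'}, specs{i}]));
    fprintf('R-squared: %.4f\n\n', R2(i));
end
```

Because the third model's regressors are a subset of the second model's, its R-squared can never exceed the second model's, which is a useful sanity check on the output.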

2.  [3 points] Using the coefficients from the last regression (the one with only the Russell 3000 factor) as your benchmark (i.e., a control group), predict the Twitter returns for the period beginning April 4th, which kicked off the events that eventually led to the announced purchase.
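Given estimated coefficients from the one-factor model, the predicted return is just the fitted line evaluated at the Russell 3000 excess return on each day of the prediction period. A tiny sketch with made-up coefficients and returns (all values and names are hypothetical):

```matlab
% Hypothetical one-factor estimates (NOT estimated from the project data).
alphaHat = 0.05;
betaHat  = 1.30;

% Hypothetical Russell 3000 excess returns in percent, prediction period.
ret_rus3000_rf = [0.8; -1.2; 0.4];

% Predicted Twitter excess return for each day, in percent.
HPR_rf_pred = alphaHat + betaHat * ret_rus3000_rf;

disp(HPR_rf_pred);
```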

3.  [3 points] For each day of your predicted returns, calculate the cumulative abnormal return ($\mathrm{CAR}_t$), i.e.:

$$\mathrm{CAR}_t = \prod_{j=1}^{t} \left(1 + \mathrm{HPR\_rf}_j\right) - \prod_{j=1}^{t} \left(1 + \widehat{\mathrm{HPR\_rf}}_j\right)$$

where j = 1 is the first day of your predicted returns (i.e., April 4th) and $\widehat{\mathrm{HPR\_rf}}_j$ is the predicted return for day j. Note: Remember that your holding period returns from the Excel file are in percent, so you will need to divide them by 100 in the above equation. Also, the cumprod MATLAB function is helpful here.
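The CAR calculation can be sketched with cumprod as follows, using three made-up days of actual and predicted returns. The numbers and variable names are hypothetical.

```matlab
% Hypothetical returns in percent (NOT the project data).
HPR_rf      = [5.0; -2.0; 3.0];    % actual
HPR_rf_pred = [1.0;  0.5; -0.5];   % predicted by the benchmark model

% Convert percent to decimals, compound each series, and difference
% the two cumulative gross returns.
CAR = cumprod(1 + HPR_rf / 100) - cumprod(1 + HPR_rf_pred / 100);

disp(100 * CAR);   % CAR in percent
```

With these inputs the first-day CAR is 1.05 - 1.01 = 0.04, i.e., 4 percent.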

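4.  [2 points] Similarly, for each day of your predicted returns, calculate the cumulative abnormal log return ($\mathrm{lCAR}_t$), i.e.:

$$\mathrm{lCAR}_t = \sum_{j=1}^{t} \left[ \log\left(1 + \mathrm{HPR\_rf}_j\right) - \log\left(1 + \widehat{\mathrm{HPR\_rf}}_j\right) \right]$$

where $\widehat{\mathrm{HPR\_rf}}_j$ is the predicted return for day j. The cumsum function is helpful for the $\mathrm{lCAR}_t$ calculation, just as the cumprod function was in the above CAR calculation.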
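The log version can be sketched the same way with cumsum, again on made-up numbers with hypothetical variable names.

```matlab
% Hypothetical returns in percent (NOT the project data).
HPR_rf      = [5.0; -2.0; 3.0];    % actual
HPR_rf_pred = [1.0;  0.5; -0.5];   % predicted by the benchmark model

% Daily abnormal log return, then its running sum.
abnLog = log(1 + HPR_rf / 100) - log(1 + HPR_rf_pred / 100);
lCAR   = cumsum(abnLog);

disp(100 * lCAR);   % lCAR in percent
```

Since sums of logs are logs of products, exp(lCAR) equals the ratio of the two cumulative gross returns, whereas CAR is their difference; that is why the two measures drift apart as the cumulative returns get large.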

5.  [2 points] Plot your cumulative abnormal returns and your cumulative abnormal log returns (all returns in percent, i.e., multiplied by 100) on the same plot, with the date of those abnormal returns on the X-axis.
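A bare-bones version of such a plot might look like the sketch below. The dates and return values are invented, and the styling choices are arbitrary.

```matlab
% Hypothetical prediction-period data (NOT the project data).
dates       = datetime(2022, 4, 4) + caldays(0:2)';
HPR_rf      = [5.0; -2.0; 3.0];    % actual, percent
HPR_rf_pred = [1.0;  0.5; -0.5];   % predicted, percent

CAR  = cumprod(1 + HPR_rf / 100) - cumprod(1 + HPR_rf_pred / 100);
lCAR = cumsum(log(1 + HPR_rf / 100) - log(1 + HPR_rf_pred / 100));

% Both series in percent on one set of axes, dates on the X-axis.
figure;
plot(dates, 100 * CAR, '-o', dates, 100 * lCAR, '-s');
xlabel('Date');
ylabel('Cumulative abnormal return (%)');
legend('CAR', 'lCAR', 'Location', 'best');
title('CAR vs. lCAR (hypothetical data)');
```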

6.  [3 points] Create another plot with dates on the X-axis and the following four daily series plotted on the Y-axis in percent (also, include a legend):

• The Twitter return during the measurement period (note that this series will not overlap dates with the other three series)

• The predicted Twitter return during the abnormal return period

• The actual Twitter return during the prediction period

• The CAR during the prediction period

7.  [12 points] Answer the following questions, in just a couple of sentences each:

• Note that I used the Russell 3000 data because Kenneth French doesn’t post his updated Fama-French (F-F) data until relatively late in the month after the data are generated, so I couldn’t give you April F-F data yet. Based on your regressions above, does the Russell 3000 seem like a pretty good proxy for the Mkt_RF from the Fama-French data? Why or why not? (By the way, as I mentioned in class, the S&P is not a good proxy for the Mkt_RF.)

• Which of the factor portfolios had coefficients that were statistically significant in the regressions?

• Do the R-squared values of the five-factor portfolio regressions give you confidence that they would make for a good control group if you were to do a full-scale event study with many firms and not just Twitter?

• How much of a drop-off in R-squared was there from the five-factor models to the one-factor model we used for the predictions? Does that invalidate the predictions (and ultimately the CARs) for you, and why or why not?

• The CARs during the prediction period are pretty large, and we know that log returns are approximately equal to holding period returns, at least when they are small. Given the large magnitudes of the CARs, does the difference between the CAR and lCAR bother you enough that you think log returns are deceptive in this case?

• If we were to make this a full event study, how would you describe the event that we are studying, and what other data would we need to gather?