Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Homework 1: MFIN 7035

In this assignment, we’ll learn the basics of database management.

All the data you need is on the web. You can download Parquet files and operate DuckDB. Or you may use MySQL, Clickhouse or Starrocks.

What to submit

1. You make work in groups of 3 (no working in groups of 4, sorry.)

2. Due date:  April 11 10am

3. What to submit: A writeup and code. The writeup may be an .ipynb file with embedded output + writeup.

4. You must use code. Python, R, Julia are acceptable. NO VBA, Excel etc.

5. How to submit: Moodle

6. At the top, please write

· Each group member’s name

· Each group member’s HKU email

· Each group member’s HKU Student ID

· What percentage each group member did

Questions

1. Re-do the HW1 question about Damadoran + stock returns. Hint. You can use methods outside of Python/R/etc. lesson of this question: getting a dataset, creating a SQL table. Learning how to do basic operations.

1) Take one of the databases. Read the excel file in Python and upload it to DuckDB/MySQL/Clickhouse/Starrocks. Document how you did so – preferably you’d do it via code.

2) Re-do the question about Sharpe ratio, average return, etc.

3) Make the final table in Python.

2. In one query using crsp.msf and crsp.dsenames, do the following: 1. 11 month momentum, skipping 1 month this is the same as in class, 2. Same-month return over the last ten years (e.g. March-Microsoft, Feb Microsoft) this is the same as in class, using the calendar anomalies lecture with slight modification , 3. lag 1 month return, 4. rolling 24 month volatility. Do this only for stocks with shrcd 1,2,3 and hexcd 10,11. lesson of this question: Learn how to compute factors, window functions, and think about performance. Use window functions.

You don’t need to have the exact same results as the answer key, you just need to have the same results btw pandas and SQL and they need to be ballpark reasonable.

1) Report the query you use

2) Report the summary statistics of your variable to make it easy for the TA to grade. So, that means report the summary statistics: q0,q1,q5,q10,q25,q50,q75,q90,q95,q99,q100.

Hint. You may wrap the prior query in a subquery, or make a temp table, as otherwise you will have to pull the entire dataset into memory across the web in order to do the calculation.

3) Do the same thing in Pandas. Using the parquet files, read them in. If you are stuck, here’s a GPT4 turbo version prompt https://www.perplexity.ai/search/2-In-one-hU21KIGvQDqzQok9mM4o0Q

4) Explain the differences between SQL and Python in terms of speed. Which one was more efficient and why? How does your query differ from using Pandas on your laptop in terms of things like RAM usage, hardware, network transfer of data, or just overall efficiency?

5) Extra credit (2 points): do the entire calendar anomalies portfolio sort (same month return) in SQL up until the point of making graphs, using a $5 price filter at t-1. All steps must be done in SQL. Report the queries returns in a table of the portfolio 1 through 10 average return across time.

3. 13-F is disseminated 45 days after the quarter end. Suppose then the quarter end is December 31, 2023, then next 13-F is available Feb 15. Lesson of this question: mapping data together, and doing the “joining” you are conceptually familiar with  to SQL. Please use file1, file2, file3, ignore s34type3 since they have some errors.

1) Report me Pershing Square’s manager number and what is the query you used to find it.

2) In terms of dollar value, what were Bill Ackman’s top 2 holdings every quarter? Report the COMNAM, PERMNO, NAICS from CRSP.  Use a rank function (Clickhouse, DuckDB, MyQL, StarRocks) or limit by function ( in Clickhouse) to obtain the largest holding.

3) Based on his holdings, does Bill Ackman have good taste in Mexican food? (hint: knowing Alan is from LA, there is only one answer, and if you get it wrong you get zero on this assignment.. just kidding)

4) What are the returns of Bill Ackman’s portfolio, if you invested at end of the prior quarter and held until the next quarter and rebalanced?

5) Why are Bill Ackman’s reported hedge fund returns different from his 13-F-implied returns? State at least two reasons, I can think of four off the top of my head.

13-F is the top source of information for institutional investors in the United States, and many countries / markets have similar disclosure rules as in the US. However, Two Sigma once asked the SEC for confidential treatment claiming that she “engaged in a program of acquisition and disposition that employs a statistical arbitrage investment strategy,” and that disclosure of its securities position would expose its investment strategy to “reverse engineering.” 

Extra credit (4 points) Plot three lines:

1) the monthly return of the market (mkt_rf+rf),

2) the strategy of copying Bill’s trades assuming you knew his trades as of the filing date,

3) and assuming you know his trades as of when the public finds out (two months late).

So in the example above, if the quarter end is December 31, 2023, you can earn the Jan 2024 – March 2024 returns. Else, you find out 45 days later and earn the March 2024- May 2024 returns. You need instown.file1 to find Bill Ackman’s fund number, file3 to find his holdings, crsp_202401.msf for returns (and lagged prc). Answer question: does 13-F protect his returns?

Here are some steps to help you:

1. Collapse crsp.msf to quarter_id via year(date)*4+quarter(date)

2. Merge instown.file3 to crsp.dsenames via cusip in file3 / ncusip, note the 8 vs 9 digit cusip

3. See the hint dataset below. The first one is assuming you trade the time of filing, the second one assumes you trade 2 months after filing.