QBUS2820 Assignment 2: Cryptocurrency Day Trading

Overview

The assignment consists of two problems: one is forecasting and the other can be seen as classification. This is a “real life” exercise: you will forecast actual future values, and we will measure the performance on a test set that is not available yet (as opposed to the typical academic exercise of splitting a dataset into train/test). Not even the teaching team has the “test set”.

You will provide forecasts for the daily returns of several cryptocurrency/USDT pairs. The forecasts are used for day trading, that is, buying/selling tokens of a given crypto in the morning when the market opens (at the opening price) and then selling/buying when the market closes that same day.

A notebook will be uploaded to Canvas at a later date to help with the programming and to provide an example of valid submission files/format.

The late submission penalty for the assignment is 5% of the assigned mark per day, and a late submission will also prevent winning any ‘extra points’ (see the Evaluation section).

Context (for flavor)

Imagine that you are a student at an (allegedly) prestigious university and are just learning about forecasting. You have a friend/relative who is keen on cryptocurrencies and “investing”. In a casual conversation with them, you mention that you are studying forecasting. This person is very proud of you (really believes in you) and asks you to send them your predictions for the market, so that they can invest based on them. You reply that this is not a good idea: you are just learning about forecasting, and market prediction is very hard (some people even say it is random). However, there is no convincing them; they are adamant about using your forecasts.

To prevent catastrophic results, you devise a plan: on top of the forecasts, you will also provide an “Overseer Artificial Intelligence” (AI) that will tell them whether or not to invest in an asset on any given day, a kind of danger-detection AI. The idea for this AI is to prevent an over-optimistic use of your forecasts, because there is uncertainty/risk around them, so the AI provides a clear-cut decision about this uncertainty. This AI is of course a classifier.

The forecasts

Your forecasts concern the opening price and closing price of each cryptocurrency. Specifically, you forecast the daily returns, defined as log[ (closing_price) / (opening_price) ] for each day. It is the logarithm of the ratio; we use the logarithm to make the returns symmetric around 0 (e.g. a ratio of 2 to 1 and a ratio of 1 to 2 have the same distance to zero when we take the logarithm). It is very common to measure log returns and, in general, to apply the logarithm transformation when the variable measured is a ratio.
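The symmetry argument can be checked numerically. The snippet below is a minimal sketch; the helper name `log_return` and the prices are made up for illustration.

```python
import math


def log_return(open_price, close_price):
    """Daily log return: log(close / open)."""
    return math.log(close_price / open_price)


# A price that doubles and a price that halves (hypothetical values).
doubling = log_return(100.0, 200.0)  # ratio 2 to 1
halving = log_return(200.0, 100.0)   # ratio 1 to 2

# Simple percentage returns are not symmetric (+100% versus -50%),
# but the log returns have the same magnitude and opposite signs.
print(doubling, halving)
```

Both values have magnitude log(2), which is what "the same distance to zero" means here.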

The forecasts are used for “day trading”, but the trading strategy is out of your control: it is automatic and depends on the value of the forecasts. The trading strategy is as follows:

•    For each cryptocurrency and each day, if the forecast of the return is positive, the investor buys a number of tokens (t) at the opening price. The amount (t) is determined by the price in dollars: the investor always spends a fixed amount of US dollars each day for each currency. For example, they always buy s dollars' worth of tokens, getting t = (s/open_price) tokens. Assume that dollars and tokens are infinitely divisible.

•    At the end of the day, the investor sells the t tokens bought in the morning at the closing price, getting (t x close_price) dollars back. So they end the day with more dollars than they started with if the actual returns (not the forecasts!) are positive.

•    If the forecast of the return is negative, the investor always sells a fixed amount of s dollars' worth of tokens at the opening price: they “sell” t tokens (t = s/open_price). At the end of the day, they buy the t tokens back at the closing price.

•    We assume that the investor already has enough dollars and tokens of each crypto to do the trading at any given point.

•    A simple interpretation: assuming that the forecasts are ‘accurate’ (they predict the sign of the returns well), they make s x absolute_value( exp( abs(actual_returns))) each day for each asset (we undo the logarithm transform). The precise definition can be found in the reference notebook `evaluate_submission`.
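The buy/sell rules in the bullets above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `daily_profit`, the `trade` flag (standing in for the classifier decision) and the treatment of a forecast of exactly zero as "no trade" are hypothetical. The `evaluate_submission` definition in the reference notebook remains authoritative and may differ in detail.

```python
import math


def daily_profit(forecast, actual_return, s=100.0, trade=True):
    """Dollar profit for one asset on one day under the described strategy.

    forecast:      forecast log return (only its sign is used)
    actual_return: realised log return, log(close / open)
    s:             fixed dollar amount traded each day
    trade:         hypothetical flag for the classifier decision
    """
    if not trade or forecast == 0:
        # ASSUMPTION: the text does not say what happens when the
        # forecast is exactly zero; here the investor does nothing.
        return 0.0
    ratio = math.exp(actual_return)  # close / open (undo the logarithm)
    if forecast > 0:
        # Buy t = s / open tokens, sell them for t * close = s * ratio.
        return s * (ratio - 1.0)
    # Sell t = s / open tokens for s dollars, buy them back for s * ratio.
    return s * (1.0 - ratio)


# The price rises by 10% during the day (hypothetical numbers).
up_day = math.log(1.10)
right_sign = daily_profit(0.01, up_day)   # about +10 dollars
wrong_sign = daily_profit(-0.01, up_day)  # about -10 dollars
```

Only the sign of the forecast enters the calculation, so a forecast with the right sign earns the same amount regardless of its magnitude.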

You will provide forecasts for the returns of each asset for the next 5 days.

You can use any of the forecast models seen in the unit (from the naïve to the ARIMA).

The classifier

The classifier makes binary predictions that control the investor: they can activate or deactivate the investor. For each asset and each day, the classifier outputs one of the two classes ‘TRADE’ or ‘NOT TRADE’. If the predicted class is ‘TRADE’, the investor goes on with their trading strategy that day for that asset: they look at the forecast and trade based on the sign of the forecast return. If the predicted class is ‘NOT TRADE’, the investor does nothing.

The classifier can use values of past returns and other variables included in the data (the past open and close, high and low, volumes ...), even from other assets. It can also include the forecasts for the given day (for example, to detect if they go crazy). You can think of the classifier as acting on top of the forecasts, considering more variables and even more complex interactions.

You can use any classifier model that you like; you are not limited to logistic regression and nearest neighbours. A key part is how you define the target variable.
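To make the idea of a target variable concrete, here is one possible definition, offered purely as an illustration and not as the expected answer: label a day ‘TRADE’ when the forecast had the same sign as the realised return. The function name `make_target` and the sample numbers are hypothetical, and other definitions (for example, ones based on the size of the loss) are equally valid.

```python
import numpy as np


def make_target(forecasts, actual_returns):
    """Label each day 'TRADE' if the forecast sign matched the realised sign.

    ASSUMPTION: this is just one candidate definition of the target.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    actual_returns = np.asarray(actual_returns, dtype=float)
    # Days with a zero forecast are labelled 'NOT TRADE' in this sketch.
    agree = (np.sign(forecasts) == np.sign(actual_returns)) & (forecasts != 0)
    return np.where(agree, "TRADE", "NOT TRADE")


# Hypothetical historical forecasts and realised log returns.
labels = make_target([0.01, -0.02, 0.03], [0.02, 0.01, -0.01])
print(labels)
```

The labels must be built from historical forecasts that were made without access to the realised returns, otherwise the classifier is trained on leaked information.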

The data

The information comes from the Binance Cryptocurrency Exchange, as reported by

https://www.CryptoDataDownload.com

Prices come from the daily datasets of the crypto/USDT pairs. For example, BTC/USDT for the daily Bitcoin prices in US dollars.

You can calculate the daily returns from the variables in the data as log(close/open).

We will use the values from that webpage to obtain the test set and measure the performance for the 5 days after the submission.

The files for the historical prices can be found at the following URLs (already in Python). Be mindful and do not download them more than necessary (once per day). Also be mindful of timezone changes.

crypto_urls = ["https://www.cryptodatadownload.com/cdd/Binance_BTCUSDT_d.csv",

"https://www.cryptodatadownload.com/cdd/Binance_ETHUSDT_d.csv",   "https://www.cryptodatadownload.com/cdd/Binance_LTCUSDT_d.csv",    "https://www.cryptodatadownload.com/cdd/Binance_BNBUSDT_d.csv",  "https://www.cryptodatadownload.com/cdd/Binance_XRPUSDT_d.csv",   "https://www.cryptodatadownload.com/cdd/Binance_EOSUSDT_d.csv",   "https://www.cryptodatadownload.com/cdd/Binance_TRXUSDT_d.csv",   "https://www.cryptodatadownload.com/cdd/Binance_NEOUSDT_d.csv",  "https://www.cryptodatadownload.com/cdd/Binance_ETCUSDT_d.csv",   "https://www.cryptodatadownload.com/cdd/Binance_XLMUSDT_d.csv",  "https://www.cryptodatadownload.com/cdd/Binance_DASHUSDT_d.csv", "https://www.cryptodatadownload.com/cdd/Binance_XMRUSDT_d.csv"]

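One way to respect the once-per-day download limit is to cache each file locally and compute the returns from the cached copy. The sketch below is an illustration under assumptions that should be checked against the real files: it assumes the CSV starts with a one-line banner before the header (hence `skiprows=1`) and that the columns include `date`, `open` and `close` once lowercased. The helper names `cached_download` and `load_prices`, the `data` folder and the sample rows are hypothetical.

```python
import io
from datetime import date, datetime
from pathlib import Path
from urllib.request import urlretrieve

import numpy as np
import pandas as pd


def cached_download(url, cache_dir="data"):
    """Download `url` at most once per calendar day; return the local path."""
    folder = Path(cache_dir)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / url.rsplit("/", 1)[-1]
    if target.exists():
        fetched_on = datetime.fromtimestamp(target.stat().st_mtime).date()
        if fetched_on == date.today():
            return target  # already fetched today, reuse the local copy
    urlretrieve(url, str(target))
    return target


def load_prices(source, skiprows=1):
    """Read one daily price file and add the log return column.

    ASSUMPTION: one banner line precedes the header, and the columns
    include date, open and close (case-insensitive).
    """
    df = pd.read_csv(source, skiprows=skiprows)
    df.columns = [name.strip().lower() for name in df.columns]
    df["date"] = pd.to_datetime(df["date"])
    df = df.sort_values("date").reset_index(drop=True)  # oldest day first
    df["returns"] = np.log(df["close"] / df["open"])
    return df


# Offline demonstration on made-up rows shaped like the assumed layout.
sample = (
    "banner line\n"
    "unix,date,symbol,open,high,low,close\n"
    "2,2022-01-02,BTCUSDT,100,120,90,110\n"
    "1,2022-01-01,BTCUSDT,50,110,40,100\n"
)
demo = load_prices(io.StringIO(sample))
```

With network access, `load_prices(cached_download(crypto_urls[0]))` would apply the same steps to the first file in the list. The sort matters because the rows may not be stored oldest-first, and the dates should be checked for timezone changes before modelling.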
What you need to submit

•    A csv file containing the forecasts for the returns for the next 5 days.

o The file has the columns: date, symbol, returns, action.

o ‘action’ means the prediction of the classifier, either ‘TRADE’ or ‘NOT TRADE’.

o Symbol is the id of the pair, in the same format as it appears in the data from cryptodatadownload.com (the symbol column in the crypto pairs files)

o The filename must be ‘STUDENTID_ASG2_preds_QBUS2820.csv’. That is, your student id, an underscore, then ASG2_preds_QBUS2820.csv.

•    A notebook that creates the forecasts and documents the decisions.

o Divide it into sections and document clearly what you are doing using markdown cells before the code of each section. You can use one (or more) cells for the methodological discussion; this should be separated from the cells that clarify the technical (programming) parts.

o The notebook must have an ‘Analytics’ section at the end where you will report the expected value of the average absolute error of the returns. You will also report the expected value of the amount of dollars that you end up with, assuming a value of ‘s=100’, together with the 95% prediction intervals for both quantities (returns and final dollars).

o The filename of the notebook should be ‘STUDENTID_ASG2_nbook_QBUS2820.ipynb’.


o No pdf document is needed. Make sure that the notebook is as clear as possible; Jupyter notebooks were created specifically for these kinds of data analysis tasks, and you can find many examples that interweave code and analysis.
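The csv file described above could be assembled along these lines. This is a sketch only: the helper names `build_submission` and `submission_filename`, the ISO date format, the symbol string `BTCUSDT`, the student id and all forecast values are assumptions for illustration. The example notebook on Canvas defines the valid format, and the symbol must match the symbol column of the downloaded files.

```python
from datetime import date, timedelta

import pandas as pd


def build_submission(forecasts, actions, submission_date, horizon=5):
    """Build the table with the columns date, symbol, returns, action.

    forecasts: dict mapping symbol -> list of `horizon` forecast returns
    actions:   dict mapping symbol -> list of 'TRADE' / 'NOT TRADE'
    The first forecast date is the day after `submission_date`.
    """
    rows = []
    for symbol, values in forecasts.items():
        for step in range(horizon):
            day = submission_date + timedelta(days=step + 1)
            rows.append({
                "date": day.isoformat(),  # ASSUMPTION: ISO date format
                "symbol": symbol,
                "returns": values[step],
                "action": actions[symbol][step],
            })
    return pd.DataFrame(rows, columns=["date", "symbol", "returns", "action"])


def submission_filename(student_id):
    """File name required for the forecasts file."""
    return f"{student_id}_ASG2_preds_QBUS2820.csv"


# Hypothetical forecasts for a single pair, submitted on 13 July 2022.
submission = build_submission(
    forecasts={"BTCUSDT": [0.01, -0.02, 0.00, 0.01, 0.03]},
    actions={"BTCUSDT": ["TRADE", "TRADE", "NOT TRADE", "TRADE", "TRADE"]},
    submission_date=date(2022, 7, 13),
)
```

Writing the file is then a single call such as `submission.to_csv(submission_filename("123456789"), index=False)`, where the student id is a placeholder.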

Evaluation

Correctness of the application of the methodology:

1.      Forecast model selection: Comparing models and potentially: variable transformations, data cleaning, model validation.

2.      Classification model selection: Comparing models, discussion of a definition of a target variable for the classifier, model validation.

3.     Estimation of the performance: The procedure that is used for estimating the actual performance. This is also related to model validation for the forecasts and classifier.

Actual performance: 15% of the mark depends on the performance beating a ‘naive’ model. This is a model that predicts that the returns for the next 5 days are the same as the last observed return, and whose classifier always outputs ‘TRADE’. You can beat the naive model in either mean absolute error for the forecasts of the returns OR in the final amount of dollars. Either of those measures will grant you the 15% of the marks.
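For reference, the naive benchmark and the mean absolute error can be sketched as below. The function names and the numbers are hypothetical, and the official comparison is whatever the teaching team's evaluation code computes.

```python
import numpy as np


def naive_forecast(past_returns, horizon=5):
    """Repeat the last observed return for each of the next `horizon` days."""
    last = np.asarray(past_returns, dtype=float)[-1]
    return np.repeat(last, horizon)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between realised and forecast returns."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


# Hypothetical history, ordered oldest to newest.
history = [0.01, -0.02, 0.03]
baseline = naive_forecast(history)

# The naive classifier outputs 'TRADE' on every day.
baseline_actions = ["TRADE"] * len(baseline)

# Made-up realised returns, used only to show the error calculation.
realised = [0.02, 0.04, 0.03, 0.01, 0.05]
baseline_mae = mean_absolute_error(realised, baseline)
```

Running your own model through the same two functions on a held-out period gives a like-for-like estimate of whether it beats the benchmark.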

EXTRA POINTS: The top 10% of the submissions will receive extra points that will be added to the total marks for the unit. 10 points out of a total of 100 will be granted. A student cannot score more than 100 points in the unit, but getting extra points might compensate for errors in the assignments or final exam.

CLARIFICATION ON FORECAST DATES: The forecasts in the submitted csv file should start the day after the submission. If you submit on 13-July-2022, the first date in the submitted file should be 14-July-2022. Failure to do so invalidates the 15% from the performance vs. the naive model and the possibility of getting extra points. If you get an extension for the assignment, use the date that you submit, not the original deadline.

DISCLAIMER

This assignment should not be taken as a real financial exercise. Actual ‘day trading’ is not usually done like this: we assume no transaction costs, we assume infinite liquidity, cryptos are traded 24h, etc.

The purpose of choosing a topic around crypto and investing is to:

1.    Showcase a real time exercise, instead of a ‘canned’ dataset to analyze. No peeking at the test set!

2.    Understand the uncertainty around forecasts AND the importance of a good estimation of the future performance of a methodology. This is a highly ‘random’ situation and being overconfident can lead to catastrophic results.

3.    Understand that the forecasts are rarely the ‘end product’; they are used as part of a decision-making process. You can use information about what the forecasts are needed for to adjust your methodology (in this case, the classifier and how we consider the classes, maybe the use of tailored error functions for model selection...)

4.    Hopefully it is a bit fun!

If you have problems around the validity of the assignment, or have comments or suggestions, please send an email to the coordinator or post in the forums.