QBUS6840

Predictive Analytics

Semester 1, 2021

Homework (20%)


1 Rationale

This assignment is designed to help students develop basic predictive analytics skills on real applied problems. The skills include model building, analysis and data visualisation, covering an understanding of the theory, practice with raw data and programming in Python.

    If you spot any typos or mistakes in this assignment, please report immediately to the coordinator [email protected]


2 Questions

The dataset Price_History_ASX200, available on Canvas, contains the daily close price of the S&P/ASX 200 (XJO) index from Feb 2007 to Mar 2021. The S&P/ASX 200 index is Australia’s leading share market index and contains the top 200 ASX listed companies by float-adjusted market capitalisation. It accounts for 88% (December 2020) of Australia’s equity market.

    The historical data are contained in the Close Price column of the Price_History_ASX200 dataset. Extract only the data from 20/03/2012 to 19/03/2021 and denote this time series as {y_t, t = 1, ..., T}.

(a) Write a Python script to load the data and produce their time series plot. Include the plot and your comments (if any) together with the Python code in your submission.
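The loading and date-filtering step could look roughly like the sketch below. It assumes the file is a CSV with a date column named "Date" in day/month/year format; only the Close Price column name comes from the description above, and the few rows shown are invented placeholders, not ASX data.

```python
import io
import pandas as pd

# Hypothetical stand-in for the Canvas file. The "Date" column name, the CSV
# format and every value below are assumptions made for illustration only.
raw = io.StringIO(
    "Date,Close Price\n"
    "19/03/2012,4275.0\n"
    "20/03/2012,4280.5\n"
    "21/03/2012,4254.3\n"
    "19/03/2021,6708.2\n"
    "22/03/2021,6752.5\n"
)

df = pd.read_csv(raw)
# Parse day/month/year strings explicitly so 03/04 is never read as March 4.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
prices = df.set_index("Date")["Close Price"].sort_index()

# Label-based slicing on a sorted DatetimeIndex includes both end dates.
y = prices.loc["2012-03-20":"2021-03-19"]
print(y)
```

With matplotlib installed, `y.plot()` draws the time series plot of the extracted window.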

(b) Smooth the time series using symmetric centred moving averages: CMA-5, CMA-20, CMA-50 and CMA-100. Plot the smoothed time series together with the original time series. Include the plot and your comments (if any) together with the Python code in your submission.
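One way to compute a centred moving average is sketched below. The treatment of even orders is an assumption: the sketch uses the common 2 x k-MA convention (k + 1 weights, with half weight on the two end points), which the question does not prescribe. The function name is hypothetical and the input is a toy series.

```python
import numpy as np

def centred_moving_average(y, k):
    """Symmetric centred moving average of order k; NaN where the window does not fit."""
    y = np.asarray(y, dtype=float)
    if k % 2 == 1:
        # Odd order: k equal weights centred on the current observation.
        weights = np.full(k, 1.0 / k)
    else:
        # Even order (assumed 2 x k-MA): k + 1 weights, half weight at both ends.
        weights = np.full(k + 1, 1.0 / k)
        weights[[0, -1]] = 1.0 / (2 * k)
    half = len(weights) // 2
    out = np.full(len(y), np.nan)
    if len(y) >= len(weights):
        # The weights are symmetric, so convolution equals a weighted average.
        out[half:len(y) - half] = np.convolve(y, weights, mode="valid")
    return out

y = np.arange(1.0, 11.0)  # toy linear series, not the ASX data
cma5 = centred_moving_average(y, 5)
cma4 = centred_moving_average(y, 4)
print(cma5)
print(cma4)
```

A symmetric average leaves a linear series unchanged at interior points, which gives a quick sanity check before applying the function to the price series.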

(c) Write a Python script to produce one-step-ahead forecasts for the last 100 observations using the naive forecasting method with drift.

• Plot these forecasts together with the actual values. Include the plot and your comments (if any) together with the Python code in your submission.

• Report the scale-dependent measure Root Mean Squared Error (RMSE) and the scale-independent measure Mean Absolute Percentage Error (MAPE), both computed from the errors between the forecasts and the ground truth prices.
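A minimal sketch of the drift forecasts and the two error measures follows. It assumes the drift is re-estimated at every step from all observations available so far (an expanding window); the question does not say whether the drift should instead be fixed from the training period. Function names and the toy series are illustrative.

```python
import numpy as np

def drift_one_step(y, n_test):
    """One-step-ahead drift forecasts for the last n_test observations."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    if n - n_test < 2:
        raise ValueError("need at least two observations before the test period")
    forecasts = np.empty(n_test)
    for j, i in enumerate(range(n - n_test, n)):
        # History is y[0], ..., y[i-1]; drift is its average one-step change.
        drift = (y[i - 1] - y[0]) / (i - 1)
        forecasts[j] = y[i - 1] + drift
    return forecasts

def rmse(actual, forecast):
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))

def mape(actual, forecast):
    # In percent; undefined if any actual value is exactly zero.
    return float(100 * np.mean(np.abs((actual - forecast) / actual)))

y = np.array([1.0, 2.0, 4.0, 7.0])  # toy series, not the ASX data
n_test = 2
forecasts = drift_one_step(y, n_test)
actual = y[-n_test:]
print(forecasts, round(rmse(actual, forecasts), 3), round(mape(actual, forecasts), 3))
```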

(d) Given the stock prices {y_t, t = 1, ..., T}, the stock returns are defined as

Write a Python script to compute the stock returns and produce their time series plot. Comment on this plot in conjunction with the plot of the prices {y_t, t = 1, ..., T}. Include the plot and your comments (if any) together with the Python code in your submission. Calculate the descriptive statistics for the time series r_t: max, min, sample mean, sample variance, kurtosis and skewness. Comment on these values.
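The descriptive statistics can be gathered with pandas, as in the sketch below. The return formula is an assumption here: the sketch uses log returns, log(y_t) - log(y_{t-1}), purely for illustration, so substitute the definition given for part (d). The prices are invented toy values.

```python
import numpy as np
import pandas as pd

# Toy prices for illustration only, not the ASX data.
y = pd.Series([100.0, 110.0, 99.0, 108.9, 98.01, 107.811])

# ASSUMPTION: log returns. Replace this line with the assignment's definition.
r = np.log(y).diff().dropna()

stats = pd.Series({
    "max": r.max(),
    "min": r.min(),
    "mean": r.mean(),
    "variance": r.var(),   # sample variance (divides by n - 1)
    "skewness": r.skew(),  # bias-corrected sample skewness
    "kurtosis": r.kurt(),  # excess kurtosis: a normal distribution gives 0
})
print(stats.round(3))
```

Note that pandas reports excess kurtosis, so a value near 0, not 3, corresponds to normal-like tails.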

(e) For the return dataset {r_t}:

• Use the last 100 observations as testing data and the earlier observations as training data. Use the training data to estimate the parameters of the SES method: the weight α and the initial level. You may set the initial level to the first observation or to the average of the first few observations.

• Based on these estimates of α and the initial level, compute one-step-ahead forecasts on the test data. Compute the Mean Absolute Percentage Error (MAPE) and plot the forecasts. Please also include your Python code in your submission.
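The two steps above can be sketched as follows. This is one possible approach under stated assumptions: the initial level is fixed at the first training observation, α is chosen by a simple grid search that minimises the in-sample sum of squared one-step errors, and the series is simulated noise standing in for the real returns. The function names are hypothetical.

```python
import numpy as np

def ses_one_step(y, alpha, level0):
    """forecasts[t] is the SES forecast of y[t] made with data up to t - 1."""
    forecasts = np.empty(len(y))
    level = level0
    for t, obs in enumerate(y):
        forecasts[t] = level
        # Update the level only after y[t] has been observed.
        level = alpha * obs + (1 - alpha) * level
    return forecasts

def fit_alpha(train, level0, grid=None):
    """Grid search for the alpha with the smallest training SSE (an assumed criterion)."""
    if grid is None:
        grid = np.linspace(0.01, 0.99, 99)
    sse = [np.sum((train - ses_one_step(train, a, level0)) ** 2) for a in grid]
    return float(grid[int(np.argmin(sse))])

rng = np.random.default_rng(0)
r = rng.normal(0.0, 0.01, size=500)  # simulated placeholder, not real returns

n_test = 100
train, test = r[:-n_test], r[-n_test:]
level0 = train[0]                    # assumed choice of initial level
alpha = fit_alpha(train, level0)

# Run the recursion over the whole series with alpha held fixed, so each test
# forecast uses every observation before it, then keep the last n_test values.
test_forecasts = ses_one_step(r, alpha, level0)[-n_test:]
mape = 100 * np.mean(np.abs((test - test_forecasts) / test))
print(alpha, mape)
```

Because MAPE divides by the actual values, it can become very large when those values are close to zero, which is worth keeping in mind when interpreting the result.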

(f) For the squared return dataset {x_t = r_t^2}:

• Use the last 100 observations as testing data and the earlier observations as training data. Use the training data to estimate the parameters of the SES method: the weight α and the initial level. You may set the initial level to the first observation or to the average of the first few observations.

• Based on these estimates of α and the initial level, compute one-step-ahead forecasts on the test data. Compute the Mean Absolute Percentage Error (MAPE) and plot the forecasts. Please also include your Python code in your submission.

(g) Comment on the predictability of {r_t} and {x_t}. You are encouraged to use domain knowledge here in addition to the data analysis findings.


3 Instructions and Marking criteria

For parts (e) and (f), the more accurate your forecasts (i.e., the smaller the MAPE), the more marks you will receive.

Assignment Report: The assignment report should be presented as a technical report that:

• details ALL required steps,

• provides sufficient explanation and interpretation of any results you obtain. Output without reasonable justifications will not receive full marks,

• clearly and appropriately presents any relevant tables, graphs and screen dumps from programs if any. You may insert small sections of your code into the report for better interpretation when necessary. Find the most structured and creative way to present your work, summarise the implementation procedures, support your results/findings and prove the originality of your work,

• reports decimal numbers to three decimal places,

• properly cites all references, if any.

    You are encouraged to incorporate domain knowledge (here, knowledge of finance and stock markets) into your report. This will strengthen your report and make it more interesting. Assessment of your written presentation skills is part of this assignment. Markers will allocate up to 10% of the mark for presentation.

    Important notes:

• Required submissions: ONE written report, in PDF (preferred) or MS Word format, and ONE Python source code file (Jupyter Notebook .ipynb or .py). Please follow the submission instructions announced on Canvas.

• The late penalty for the assignment is 5% of the assigned mark per day, starting after 4:00pm on the due date. The closing date is the last date on which an assessment will be accepted for marking. See Canvas for the due date and closing date of this assignment.

• As per the anonymous marking policy, include only your Student ID in the report and DO NOT include your name.

• The name of the report and code file must follow the format

    <your SID>_QBUS6840_Assignment1_2021S1.

• The report should be NO more than 15 pages, including text, figures, tables and small sections of inserted code, but excluding the appendix.

• The University of Sydney takes plagiarism very seriously. Please be warned that plagiarism between individuals is always obvious to the markers and can be easily detected by Turnitin.

    Key rules:

• Carefully read the requirements for each part of the assignment.

• Please follow any further instructions announced on Canvas, particularly for submissions.

• You must use Python for the assignment. To avoid any potential issues with your code, please use the latest Anaconda version for your Python programs.

• Reproducibility is fundamental in data analysis, so you are required to submit a code file that generates your results. Not submitting your code, or submitting code that is not runnable, will lead to a loss of 50% of the assignment marks.

• Referencing: Harvard Referencing System. Please find the details at: http://libguides.library.usyd.edu.au/c.php?g=508212&p=3476130