ACCFIN5246 Data Science & Machine Learning in Finance
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
Data Science & Machine Learning in Finance (ACCFIN5246)
Individual Assignment – Spring 2023
Instruction
— This is an individual assessment.
— Answer all questions listed in Section 4: Q.1, Q.2, Q.3, Q.4 and Q.5.
— Submission to be made electronically via the course Moodle page.
— This submission includes a written report including brief accounts on the logical steps taken,
numerical answers, diagrams, and comments (each part in Section 4 specifies how answers are required to be presented).
— Submit only the main report (no additional spreadsheet, nor software routine, etc.).
— Clearly number each part in your report and structure the answers to follow the same order
in Section 4: Q.1, Q.2, Q.3, Q.4 and Q.5.
— The project outline presented acorss four sections to develop the implementations, estimation
and arrangement of the results.
— Each part, within the assignment’s overall grade, carries a weight described below:
Weight
10%
15%
20%
40%
15%
— Results should be reported in a clear format. Avoid reporting numbers in the ‘scientific
format’ e.g. 7 .2031e-06. All reported numbers should be rounded to two decimal points. For example, report 0.00 in place of 7 .2031e-06.
— The project includes an Appendix (X1) which provides additional data and their descriptions. — All data, required for the project, is accessible via the course Moodle page.
Section 1: Finance Models
Consider the standard and two extended capital asset pricing models characterised below in equa- tions (1)-(3), denoted as MoDEL(A), MoDEL(B) and MoDEL(C), rspectively:
MoDEL A: rt _ rf,t = αw(A) + βw(A) .(rM,t _ rf,t ) + et(A) (1)
MoDEL B: rt _ rf,t = αw(B) + βw(B) .(rM,t _ rf,t ) + ψi,w .北i + et(B) (2)
MoDEL C: rt _ rf,t = αw(C) + βw(C) .(rM,t _ rf,t ) + θj,w .北j + et(C) (3)
Each model aims at interrelating the real excess equity log-return as the outcome variable, on a given asset rt _ rf,t where rf,t is the risk-free log-return, to the right-hand-side explanatory vari- ables, where:
[1.1] Time (daily frequency) is denoted by t = 1, . . . , T and the window index (consecutive days)
is denoted by w = 1, . . . , W .
[1.2] Each x-covariate denotes an additional financial, market-level or macroeconomic variable,
described in Appendix (X1).
[1.3] x-covariates are indexed by i and j in Model B and Model C, respectively, where i = 1, . . . , K
and K describes the number of available covariates in Appendix (X1), whereas j e 室 (w) refers to a sub-selection of all available covariates. The selected covariates included in set 室 (w) is carried out based on the learning method described in Section (3). Note that the selection outcome in 室 (w) is permitted to vary based on the rolling window.
[1.4] Specification error terms are denoted by {∈t(A), ∈t(B), ∈t(C)} across Models A-C, respectively. [1.5] Model constants and CAPM parameters are described by {αw(A), αw(B), αw(C)} and {βw(A), βw(B), βw(C)},
respectively, in Model A, Model B and Model C. The additional x-covariates are parame- terised according to ψi,w and θj,w , in Model B and Model C, respectively.
The object of interest in each model is the time-varying coefficients described in [1.5]. Each model provides a suggested specification, ultimately aiming at uncovering the true but unobserved α and β .
Section 2: Data
The focus range covers 2000/01/03-2022/06/03, on a daily basis in Sheet (A) and separately on a monthly basis in Sheet (B). The dataset including all variables are provided on the course Moo- dle page, however you are required to carry out the relevant transformations to convert prices to returns:
[2.1] Observed Amazon Inc. daily stock price (variable AMZN) acquired from the Wharton Research
platform (WRDS) is provided towards the construction of (rt ) (nominal) log-returns.
[2.2] Observed US daily equity market return (variable SP500) described by the S&P500 composite
market index return is provided towards the construction of (rM,t ).
[2.3] The nominal daily risk-free log rate (variable rf), associated with the US treasuries is pro- vided such that excess log-returns rt _ rf,t and rM,t _ rf,t are constructed in real terms.
[2.4] Appendix (X1) provides the description of 14 additional x-covariates (the total variable count
is 17 including [2.1], [2.2], [2.3]).
[2.5] All prices in Sheet (A) have to be transformed into returns. All covariances in Sheet (B) to
be used without any transformations.
[2.6] The eventual dataset including both Sheets (A) and (B) should be in a daily frequency.
To this end, combine both sheets such that monthly frequency data in Sheet (B) are kept constant and repeated identically within each month versus varying other daily observations in Sheet (A).1
[2.7] When there is a missing or non-numeric entry, apply the necessary cleaning. The description
sheet provides brief information about the variables. No further information about the variables should be needed, however the variable names are identical to those appearing on WRDS and Fred St. Louis (for the purpose of this assignment you are expected to use the dataset but no further variable research is required).
Section 3: Implementations and Estimation Methodology
Implement a code to estimate all three models in equations (1)-(3), on a rolling window basis where w = 25. This indicates that windows include 25 consecutive (working) days as appear in the dataset and rolling forward by one day, amounts to a new window such that each two neighbouring windows overlap for 24 days. For each model, carry out the estimation as described below and store the specified results for w = 1, . . . , W :
[3.1] To estimate the parameter values described earlier for Model A, use the standard OLS method
where only rt _ rf,t and rM,t _ rf,t are included. The estimation results produce {w(A) , w(A)}.
where rt _ rf,t and rM,t _ rf,t are included, in addition to αll of the x-covariates included in
the Appendix (X1). The estimation results produce {w(B) , w(B) , ψ←w }.
produce {w(C) , w(C) , w , w }.
Once stored the three sets of estimation outpus: {w(A) , w(A)}, {w(B) , w(B) , ψ←w }, and {w(C) , w(C) , w , w },
[3.4] Evaluate the following specifications and store the values for { , 1(A)}, { , 1(B)} and { , 1(C)}:
(4)
(5)
(6)
[3.5] Implement a forecasting framework to predict the one-step ahead value of the outcome vari-
ables in equations (4)-(6). To carry out the prediction, use 14-consecutive window values
earlier obtained: {w(A) , w(A)}, {w(B) , w(B)}, and {w(C) , w(C)} within each model, to predict the next-
[3.6] Model A uses a minalistic approach without including any x-covariates, whereas Model C
applies a learning-based method to select a sub-set of x-covariates. Consider an alternative
w(A) , w(A)} as input data in addition
where the additional part incorporates the x-covariates, ν D is the specification error term, the selection set is determined based on the Lasso approach and l denotes the index for selected x-covariates. This specification, first, is based on a minimalistic setup of Model A, leading
to {w(A) , w(A)} while, second, also informs the secondary regression in (7) through including
are already the outcomes of the rolling windows, you will need to set up the new dataset
such that each pair w(A) and w(A) is considered together with the last value of the x-covariates.
row of dataset required for the estimation of expression (7). Similarly, window 2 uses the
original data from days 2 to 29, inclusive, leading to {w(A)22 , w(A)22 }. Then the x-covariates at
the x-covariate data, within each window w .
Section 4: Results and the Final Report
Your report should provide the numerical results and comments requested below. Please organise your report following the same structure arranged below by including the answers under parts [Q.1]- [Q.5].
[Q.1] Briefly explain the trade-offs associated between the model variance versus bias-squared to
inform model selection. (Concise and less than 200 words) (Mark: 10%)
[Q.2] Report the estimated values for {V, V 1(A)}, {V, V 1(B)} and {V, V 1(C)} and their associated p-
values. (Mark: 15%)
[Q.3] Report the FMSE values associated with part [3.5]. Based on these values, provide com-
ments which of the expressions (4)-(6) is considered most reliable to predict their outcome variables. (Concise and less than 200 words) (Mark: 20%)
[Q.4] Estimate the specification expressed in (7) based on the Lasso method. Report the estimated
values for { , 1(D)} and their p-values. Explain which of the expressions (6) or (7) is
and estimated values. (Concise and less than 350 words) (Mark: 40%)
[Q.5] Construct a diagram for w where the horizontal axis shows w = 1, . . . , W and the vertical
2023-03-30