
ECON2209 Assessment

Project

2023

At the start of an R session for this course, remember to type library(fpp3) in the RStudio Console. This will load (most of) the R packages you will need, including some data sets.

Details:

 Total value: 25 marks.

Submission is due on Friday of Week 9 (14 April), 4pm.

A submission link is on the Moodle site under Assessments.

• Submit your answer document in PDF format. Your file name should follow this naming convention:

CP_your first name_zID_your last name_ECON2209.pdf

For example: CP_John_z1234567_Smith_ECON2209.pdf

• You get one opportunity to submit your file. Make sure that you submit the file that you intend to submit.

• Your submitted answers should include the R code that you used.

Format: No longer than 20 pages, including code, figures, tables and any appendices. Do not include a separate title page. At least 11 point font should be used, with adequate margins for comments. Any extra pages will not be marked.

• This project requires you to analyse time series data. The series will differ between students.

• The project is set out in three Parts, each with multiple sub-parts. This is mainly to guide you in your analysis of your data series. It is strongly recommended that you follow the given sequence in your analysis and in presenting your results.

• Unless approval for an extension is given on medical grounds (supported by a medical certificate submitted through the Special Consideration process), there will be an immediate late penalty of 5% from 4:01pm on 14 April, followed by additional penalties of 5% per calendar day or part thereof. Submissions will not be accepted more than 5 days (120 hours) after the original deadline.

Marking for this Project: Marks are not awarded by Part, but by overall achievement against the following criteria:

(a) Suitability of methods. 10 marks:

• 0 marks: Little or no attempt.

• 2 marks: Inappropriate methods used or methods inappropriately implemented.

• 5 marks: An attempt has been made to answer the question using methods that are appropriate and appropriately implemented.

• 7 marks: A reasonable attempt at the questions that generally follow the provided solutions.

• 8.5 marks: Systematic analysis.

• 10 marks: More depth of analysis than asked for.

(b) Interpretation of the results, arguments used and conclusions drawn. 10 marks

• 0 marks: Little or no attempt.

• 2 marks: Little attempt to discuss the results, or a poor understanding of the results found.

• 5 marks: An attempt has been made to understand and explain all the results.

• 7 marks: Systematic and sensible discussion of all results.

• 8.5 marks: Discussion of the results seems correct and insightful.

• 10 marks: Insightful discussion beyond what might reasonably be expected, possibly drawing on external references and other research.

(c) Presentation: Appropriate style of graphs, tables, reporting and clarity of writing. 5 marks

• 0 marks: Little or no attempt.

• 1 mark: Difficult to follow what has been done. Small fonts make graphs and tables hard to read. Lack of clear writing.

• 2 marks: Presentation of results falls short of the standard in the provided solutions for tutorial exercises and problem sets.

• 3 marks: Presentation of results consistent with the standard in the provided solutions for tutorial exercises and problem sets.

• 4 marks: More polished presentation.

• 5 marks: Professional style report. Tables can still be in R output format - reformatting not required.

Maximum marks:  25

Note that criteria (b) and (c) together comprise 60% of the overall mark for the project.

Select the data series that you will analyse

Forecasting official statistics is very common in business and government. In this project you will use data from the Australian Bureau of Statistics (ABS). Specifically, you will use data on components of Gross Domestic Product measured by the expenditure approach, taken from the Australian National Accounts: ABS Catalogue 5206.0, Table 2. Expenditure on Gross Domestic Product (GDP), Chain volume measures. The data series you will use will be in the form of a quantity index (“chain volume measure”), with the values expressed in 2020-21 fiscal-year dollars.

We can download the Excel spreadsheet from the ABS website, or we can use the R package readabs to read in the data, as follows.

library(readabs)

expdata <- read_abs("5206.0", tables = "2", check_local = FALSE) %>%
  mutate(Quarter = yearquarter(date)) %>%
  as_tsibble(
    index = Quarter,
    key = c(series_id)
  )

Keep only the volume series, dropping a few data series in the full data set that we are not interested in modelling (e.g. the “Statistical Discrepancy” and aggregate GDP) and things that are very tricky to forecast (e.g. changes in inventories).

expdata_vol <- expdata %>%
  filter(series_type == "Original") %>%
  filter(!(`series_id` %in% c("A2302522F", "A2302459A", "A2302515J",
                              "A2302516K", "A2302517L", "A2302518R", "A2302491A")))

You must use the following method for selecting your data series.

Use the seven digits of your UNSW student ID to get the data series that you will analyse in this project, as in the following example for the case when your student ID is z1234567:

set.seed(1234567)

myseries <- expdata_vol %>%
  filter(`series_id` == sample(expdata_vol$`series_id`, 1), year(Quarter) >= 1990)

Note that while sample() takes a random sample, using the same seed through set.seed() will result in the same series being selected each time you run the code on the same computer.

The ABS spreadsheet includes the official seasonally adjusted series. This can be extracted for your data series as follows:

myseries_sa <- expdata %>%
  filter(series_type == "Seasonally Adjusted") %>%
  filter(series == myseries$series[1], year(Quarter) >= 1990)

Make a note of the IDs of your series, in case you run into computer problems and need to retrieve the series manually:

myseries$series_id[1]

myseries_sa$series_id[1]
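If you ever do need to rebuild the series manually from those IDs, something along the following lines should work. The IDs shown here are placeholders only; substitute the values printed by the two commands above.

# Rebuild the series from recorded IDs (placeholder IDs -- replace with your own).
myseries <- expdata_vol %>%
  filter(series_id == "A0000000X", year(Quarter) >= 1990)

myseries_sa <- expdata %>%
  filter(series_type == "Seasonally Adjusted",
         series_id == "A0000000Y", year(Quarter) >= 1990)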

Part 1: Data Exploration and Transformation

Plot your volume and seasonally adjusted volume data together using the following code:

myseries %>%
  autoplot(value) +
  autolayer(myseries_sa, .vars = value, colour = "red") +
  labs(y = "2020-21 $ millions",
       title = myseries$series[1],
       subtitle = "Original (black) and Seasonally Adjusted (red)")

a. Based on the plot, discuss characteristics of each series.

b. What Box-Cox transformation, if any, would you select for your (non-seasonally adjusted) data? Explain.
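One possible starting point for part b., offered as a sketch rather than the required method, is to estimate a Box-Cox parameter with the Guerrero feature and then inspect the transformed series before deciding whether any transformation is worthwhile.

# Estimate a candidate Box-Cox lambda using the Guerrero method.
lambda <- myseries %>%
  features(value, features = guerrero) %>%
  pull(lambda_guerrero)
lambda

# Inspect the transformed series before committing to a transformation.
myseries %>% autoplot(box_cox(value, lambda))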

Part 2: Time Series Decomposition

a. Consider the last twenty years of your untransformed data. Use an STL decomposition and produce a standard decomposition plot showing the trend-cycle, seasonal and remainder components. Discuss what you find from the decomposition plot.

b. Then plot your seasonally adjusted data from the STL decomposition together with the official seasonally adjusted data for the last twenty years. What observations can you make about the respective series?
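As a sketch of how parts a. and b. might begin (assuming "the last twenty years" is taken as the last twenty calendar years of observations), an STL decomposition with default settings could be fitted and its seasonally adjusted component overlaid on the official ABS series:

# Restrict to (approximately) the last twenty years of untransformed data.
myseries_20 <- myseries %>%
  filter(year(Quarter) >= max(year(Quarter)) - 19)

dcmp <- myseries_20 %>%
  model(stl = STL(value))

# Standard decomposition plot: trend-cycle, seasonal and remainder components.
components(dcmp) %>% autoplot()

# STL seasonally adjusted series (black) against the official ABS series (red).
components(dcmp) %>%
  as_tsibble() %>%
  autoplot(season_adjust) +
  autolayer(myseries_sa %>% filter(year(Quarter) >= max(year(Quarter)) - 19),
            .vars = value, colour = "red")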

Part 3: ARIMA Modelling

For this Part, use your full (non-seasonally adjusted) data series, after a Box-Cox transformation if necessary.

a. Using visual inspection of plots, find the appropriate order of differencing needed to obtain stationary data. Explain your choices step by step. (Illustrative code sketches that may help with parts a. to e. are given after part g.)

b. Use statistical tests to check your choices in part a.

c. Select an appropriate ARIMA model. Explain your choice and report the results.

d. Create a training dataset (myseries_train) consisting of observations before 2020. Check that your data have been split appropriately by producing a plot of myseries_train and myseries in one figure.

e. Using the training data set, consider the following models:

• The ARIMA model you selected in part c.

• An STL decomposition, followed by an ARIMA model on the seasonally adjusted data; that is, an STL-ARIMA model.

• An ETS model chosen automatically.

Using the test data set, plot the forecasts from all three models on the same figure along with the actual data from 2005 onwards. Include the prediction intervals and discuss the relative performance of the models based on the figure, and on the RMSE and MAPE.

f. Propose and implement your own choice of alternative model. Discuss your choice and its performance relative to the best model from part e.

g. Now create a new training dataset (myseries_trnew) consisting of observations before 2010 and repeat the analysis in parts d. to f., plotting the forecasts from all four models on the same figure along with the actual data from 2005 onwards. Include the prediction intervals. What observations do you have about the sensitivity of the accuracy measures and forecasts to the amount of training data used?
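The sketches below are offered only as possible starting points, assuming a Box-Cox transformation with the lambda estimated in the Part 1 sketch (drop box_cox() if you concluded that no transformation is needed). For parts a. to c., unit-root features can back up the visual choice of differencing, and ARIMA() can then search for a model on the transformed data:

# Number of seasonal differences suggested by the seasonal strength measure.
myseries %>% features(box_cox(value, lambda), unitroot_nsdiffs)

# If one seasonal difference is indicated (lag 4 for quarterly data),
# check how many first differences are then needed.
myseries %>% features(difference(box_cox(value, lambda), 4), unitroot_ndiffs)

# KPSS test on the differenced series as a stationarity check.
myseries %>% features(difference(difference(box_cox(value, lambda), 4)), unitroot_kpss)

# Automatic ARIMA search on the transformed data; report() shows the chosen orders.
fit <- myseries %>%
  model(arima = ARIMA(box_cox(value, lambda)))
report(fit)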
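For parts d. and e., a sketch under the same assumptions: split the data at 2020, fit the three candidate models on the training set (here the STL-ARIMA model is specified with decomposition_model(), which by default forecasts the seasonal component with SNAIVE), then forecast over the test period and compare accuracy.

myseries_train <- myseries %>% filter(year(Quarter) < 2020)
myseries_test  <- myseries %>% filter(year(Quarter) >= 2020)

# Check the split: training data (blue) overlaid on the full series.
myseries %>%
  autoplot(value) +
  autolayer(myseries_train, .vars = value, colour = "blue")

fits <- myseries_train %>%
  model(
    arima     = ARIMA(box_cox(value, lambda)),
    stl_arima = decomposition_model(STL(box_cox(value, lambda)),
                                    ARIMA(season_adjust)),
    # One choice is to let ETS work on the original scale, since multiplicative
    # components can capture changing variance; a transformed ETS is also possible.
    ets       = ETS(value)
  )

# Forecasts over the test period, plotted with prediction intervals
# against the actual data from 2005 onwards.
fc <- fits %>% forecast(new_data = myseries_test)
fc %>% autoplot(myseries %>% filter(year(Quarter) >= 2005))

# Test-set accuracy, including RMSE and MAPE.
fc %>% accuracy(myseries) %>% select(.model, RMSE, MAPE)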