
STAT603 (Forecasting) – Semester 2 2023

Assignment 2

Outline: The purpose of this assignment is to assess your analytical and computing skills on the material covered up to Week 12. It requires a substantial amount of work. DO NOT leave it until the last few days.

Total: 100 marks (five questions). This assignment is worth 25% of your final grade.

Due: Tuesday 24th of October 2023, 11am.

Submission:

Submit your assignment as a single PDF file. You can use the R Markdown file provided or your preferred PDF math/text editor. The preferred option is R Markdown (knit it to PDF).

Submit your assignment through CANVAS as a soft-copy file, including a signed SECMS–Assignment cover sheet on the first page (otherwise your assignment won’t be marked). For this, use the submission link available under ‘Assignments’.

The filename must include 1) your lastname, 2) your firstname, and 3) your student id. For instance, if John White submits his assignment, this must be a file with extension .pdf and named “White John 123456789”.

Report/Assignment: Your assignment must be self-contained and self-explanatory. All R code, output, scientific references, and any other resources required to complete your assignment must be embedded in the document and adequately cited.

Page Limit: The maximum number of pages is 20, including graphs, appendices, and any relevant R code.

Data: In Question 1, Question 2 and Question 3 you will use the monthly steroids SALES (anti–inflammatory drug, also known as H03 drugs) in Australia between July 1991 and June 2008.

Filename:   H03_drug_Sales_Australia.csv.

In Question 4, you’ll use the global_economy and aus_livestock datasets (from the fpp3 R package). NO data is required for Question 5.

Software: Every computing task in this assignment must be carried out in R.

Plagiarism: If plagiarism is detected in your assignment, your case will be referred to the appropriate AUT office.

Lateness penalties: Late assignments without an approved extension (or SCA) will be subject to a deduction of one grade (e.g., from C+ to C) out of your total mark for each 24-hr period, or part thereof, for up to a maximum of 3 DAYS. Assignments over three days late will not be marked and you will receive a DNC (Did Not Complete) for this assessment.

Exceptional Circumstances: If your performance and/or your ability to complete this assignment by the due date is seriously affected by exceptional circumstances beyond your control (e.g., injury or illness), you may apply for special consideration, WITH supporting evidence. To apply for special consideration, you must complete the special consideration form via Canvas (STAT603 Home Page).

Tasks/Questions:

Question 1 – ETS (20 marks) HINT: Read Sections 8.1 – 8.7 of the online book.

Step 0 – Scale the data (e.g., divide by 100). From now on, you’ll work with the scaled series (0 marks).
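As a minimal sketch of Step 0 (not a model answer), the scaling could look like the following. The column names Month and Sales and the three values are made-up stand-ins, because the layout of the CSV file is not shown in this handout; in practice the data would be read from H03_drug_Sales_Australia.csv first.

```r
library(fpp3)

# Made-up stand-in for the CSV contents; `Month` and `Sales` are assumed
# column names, not taken from the actual file.
raw <- tibble(
  Month = seq(as.Date("1991-07-01"), by = "month", length.out = 3),
  Sales = c(450, 380, 410)
)

# Convert to a monthly tsibble and divide the series by 100.
scaled <- raw %>%
  mutate(Month = yearmonth(Month), Sales = Sales / 100) %>%
  as_tsibble(index = Month)

scaled
```

Storing the series as a tsibble with a yearmonth index lets the fpp3 modelling functions recognise the data as monthly.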

(a) Plot the series and discuss the main features, including stationarity (2 marks).

(b) Forecast the next two years using (1) simple exponential smoothing, (2) Holt’s linear trend, and (3) Holt’s damped trend.

Plot the series and the forecasts. Based solely on this plot, discuss the adequacy of these methods for forecasting this series. Explain your answer (2 marks – forecasts + 3 marks – discussion; TOTAL = 5 marks).

(c) Repeat Part (b) with Holt-Winters’ seasonal methods. Discuss whether additive or multiplicative seasonality is necessary. Explain your answer (2 marks – forecasts + 3 marks – discussion; TOTAL = 5 marks).

(d) Compare the mean squared error (MSE) and the mean absolute error (MAE) of the one-step-ahead, four-step-ahead and six-step-ahead forecasts from the methods discussed in (b)–(c) above. Report your results neatly and clearly. You can use a table.

Which method has the highest accuracy? Does this selection depend on the number of pre-specified steps ahead? Explain your answer (2 marks – results clearly presented + 3 marks – discussion; TOTAL = 5 marks).
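For reference, with forecast errors e = y - yhat, the MSE is the mean of the squared errors and the MAE is the mean of the absolute errors. The base-R sketch below illustrates the two definitions on made-up numbers; the vectors actual and fc are hypothetical and unrelated to the H03 series.

```r
# Made-up observations and forecasts, for illustration only.
actual <- c(10, 12, 11, 13)
fc     <- c(9, 12, 13, 12)

errors <- actual - fc      # forecast errors: 1, 0, -2, 1

mse <- mean(errors^2)      # mean squared error  = (1 + 0 + 4 + 1) / 4 = 1.5
mae <- mean(abs(errors))   # mean absolute error = (1 + 0 + 2 + 1) / 4 = 1

c(MSE = mse, MAE = mae)
```

In fpp3, accuracy() reports the RMSE and the MAE by default; the MSE is the square of the RMSE.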

(e) Briefly discuss the potential mistake/error we may unintentionally introduce in the discussion when comparing the models in (b)–(c) using the MSE and MAE (3 marks).

Question 2 – Stationarity (20 marks) HINT: Read Sections 9.1, 9.2, and 9.5 of the online book.

(a) Plot the autocorrelation function (ACF) and the partial ACF.

(a.1) Briefly discuss the stationarity of the series based on the ACF. Does your answer here agree with your answer to Question 1 – (a)?

(a.2) Should the series be differenced in order to obtain a stationary series? Explain your answer.

(3 marks ACF/PACF + 5 marks discussion; TOTAL = 8 marks).

(b) Find an appropriate Box-Cox transformation and order of differencing to obtain stationary data. Note: Justify your answer in every case, even if no Box–Cox transformation is required. (2 marks – working + 5 marks – discussion/rationale (a) + 5 marks – discussion/rationale (b); TOTAL = 12 marks).
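For reference, the Box-Cox transformation of a strictly positive series y with parameter lambda is log(y) when lambda = 0 and (y^lambda - 1) / lambda otherwise. The base-R sketch below only illustrates this definition; the function name box_cox_sketch and the input values are hypothetical. The fpp3 packages provide box_cox() for the transformation and the guerrero feature as one way of choosing lambda.

```r
# Box-Cox transform for strictly positive y (illustrative helper, not an
# fpp3 function): lambda = 0 gives the natural log.
box_cox_sketch <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

y <- c(1, 4, 9)            # made-up values
box_cox_sketch(y, 0.5)     # (sqrt(y) - 1) / 0.5 = 0, 2, 4
box_cox_sketch(y, 1)       # y - 1: a shift only, the shape is unchanged
box_cox_sketch(y, 0)       # log(y)
```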

Question 3 – (30 marks) Seasonal & non–seasonal ARIMA modelling. HINT: Read Sections 9.1, 9.5 and 9.7 of the online book.

(a) By studying the appropriate graphs of the series in R, propose an appropriate ARIMA(p, d, q) or ARIMA(p, d, q)(P, D, Q) structure to model the series. Explain your answer.

Plots/Figures can be included as part of your answer. (3 marks working + 5 marks rationale; TOTAL = 8 marks).

(b) Should a constant be included in the model? Justify your answer (2 marks).

HINT: Read the sub-section ‘Understanding constants in R’ in Section 9.7. Note that, to fit ARIMA models with and without a constant (respectively), we use:

> fit1 <- DATA %>% model("m1" = ARIMA(Y ~ 1 + ...))
> fit1 <- DATA %>% model("m1" = ARIMA(Y ~ 0 + ...))

(c) Fit the ARIMA model proposed in 3(a) using R functions and examine the residuals. Is the proposed model satisfactory? Explain your answer.

(3 marks – working (code) + 5 marks – explanation; TOTAL = 8 marks).

(d) Now, let ARIMA() choose an ARIMA model for this data. Does ARIMA() return the same model as the one you chose in 3(a)? If not, which model do you think suits best? (Explain your answer)

(3 marks – working and code + 5 marks – rationale; TOTAL = 8 marks).

(e) Which method do you think is best for forecasting this series, ETS or ARIMA (compare your Q3 and Q1 results)? Justify your answer (4 marks – reasons/rationale).

Question 4 – (10 marks) Seasonality and the function accuracy()

(a) Fit a Holt’s linear trend model (with no damping parameter), a Holt-Winters additive model and a Holt-Winters multiplicative model to the New Zealand consumer price index (CPI) from the data set global_economy. Then, compare their in-sample accuracy with the function accuracy().

The output will return NaN for the Holt-Winters models. Why is this happening? Write a short paragraph (2–3 sentences) discussing this question (2 marks code + 3 marks discussion; TOTAL = 5 marks).

Code to extract the data (you will need to write code to estimate the models and to check the in-sample accuracy measures with accuracy()):

> library("fpp3")
> mydata <- global_economy %>%
    filter(Country == "New Zealand")

(b) Now, repeat this analysis with the number of pigs slaughtered in Victoria, available in the dataset aus_livestock. Did you observe any warnings (or NaN) from Holt–Winters? Why did you get no errors, as opposed to (a)? Briefly explain your answer, comparing it to 4(a).

Code to extract the data. You will need to write code to fit the required models and to check the in-sample accuracy measures with accuracy().

> myseries <- aus_livestock %>%
    filter(Animal == "Pigs", State == "Victoria")
> myseries %>% autoplot(Count)

(2 marks code + 3 marks discussion; TOTAL = 5 marks).

Question 5 – (20 marks) Select the correct answer and explain as requested (2 marks correct answer + 3 marks rationale = 5 marks/question; TOTAL = 20 marks).

(a) In general, prediction intervals from the ARIMA models increase as the forecast horizon increases.

HINT: Read Section 9.8 of the online book.

TRUE

FALSE

Explain your answer.

(b) The AICc cannot be used to compare between ARIMA and ETS models.

HINT: Read Section 9.10 of the online book.

TRUE

FALSE

Explain your answer.

(c) Time series cross-validation can be used to compare between ARIMA and ETS models.

HINT: Read Section 9.10 of the online book, especially the subsection ‘Comparing ETS() and ARIMA() on non-seasonal data’.

TRUE

FALSE

Explain your answer.

(d) Read Section 9.10, subsection ‘Comparing ETS() and ARIMA() on seasonal data’. This subsection compares seasonal ARIMA and ETS models applied to quarterly cement production data.

After a deep analysis, the ETS model was selected for forecasting based on its forecasting performance from the test set.

TRUE

FALSE

Explain your answer.