
QBUS6840

Semester 1, 2022

Mid-Semester Examination Practice Questions with Solutions

Q1

The file beer.txt contains two named columns, ‘Time’ (month) and ‘Sales’. Read the following Python code:

[1]   import pandas as pd
[2]   import matplotlib.pyplot as plt
[3]   import numpy as np
[4]   beer = pd.read_csv('beer.txt')
[5]   sales = beer['Sales']
[6]   sales_v = sales.values
[7]   plt.figure()
[7]   plt.plot(sales)
[8]   plt.title('Beer Sales')
[9]   plt.xlabel('Month')
[10]  plt.ylabel('Sales')
[11]  myT = sales_v.rolling(12, center=True).mean()
[12]  T = sales.rolling(2, center = True).mean()
[13]  T1 = T.rolling(12, center = True).mean();
[14]  What_is_this = beer['Sales'].ewm(alpha = 0.05, adjust=True).mean()
[15]  plt.figure()
[16]  plt.plot(sales)
[17]  plt.plot(T1)
[18]  plt.plot(What_is_this)
[19]  plt.title('Beer Sales')
[20]  plt.xlabel('Month')
[21]  plt.ylabel('Sales')

Please answer the following questions:

(i)   Explain the meaning of statement [4] in the program.  What is the data type for variable beer?

This statement calls pandas’ read_csv to read the data in a csv or txt file into the Python environment for further processing.

The data are stored in the variable beer, whose type is DataFrame. All the information in beer.txt is now contained in the DataFrame variable beer.

(ii)   What is the data type of the variable sales in statement [5]?  What is the difference between sales and sales_v from statement [6]?

From the description of the beer.txt file, we know the DataFrame variable beer has two columns. In line [5], Python takes the column identified by the column name ‘Sales’ from the DataFrame beer and stores it in the new variable sales, whose type is (pandas’) Series.

Statement [6] instructs Python to pick up all the sales values from the Series sales and store only the values in sales_v, excluding the Series name ‘Sales’, the index information, etc. In fact, sales_v is a NumPy array (numpy.ndarray).
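The three types can be checked directly. The following is a minimal sketch that assumes a small comma-separated stand-in for beer.txt (the three data rows are made up for illustration, not taken from the real file):

```python
import io
import numpy as np
import pandas as pd

# Hypothetical stand-in for beer.txt: comma-separated with a header row
raw = io.StringIO("Time,Sales\n1,15.0\n2,16.5\n3,14.2\n")

beer = pd.read_csv(raw)      # DataFrame holding every column of the file
sales = beer['Sales']        # a single column, returned as a pandas Series
sales_v = sales.values       # the bare values only, as a NumPy array

print(type(beer).__name__, type(sales).__name__, type(sales_v).__name__)
print(sales.name, list(sales.index))   # a Series keeps its name and index
```

The array sales_v has neither a name nor an index, which is exactly the information that statement [6] drops.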

(iii)   Do you have any concerns about this program?  At which line, if any, would you expect an error message when running this program? Why?

There is a small bug in this program. Execution will stop at line [11]. In this line, Python looks for a rolling method on the NumPy array variable sales_v, which does not exist. Python raises an AttributeError reporting that a 'numpy.ndarray' object has no attribute 'rolling'.

(iv)   How to handle the issue you identified in (iii)?

There are two ways to fix the issue. The simplest is to remove line [11] from the program, since myT is not used anywhere else. Alternatively, first convert the array to a pandas Series and then call its rolling function:

myT = pd.Series(sales_v, dtype='float64').rolling(12, center=True).mean()
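Both the error and the fix can be reproduced on a small example. This sketch uses 24 made-up values in place of the real Sales column:

```python
import numpy as np
import pandas as pd

sales_v = np.arange(1.0, 25.0)   # hypothetical values standing in for the Sales column

# Line [11] as written: a NumPy array has no rolling method
try:
    sales_v.rolling(12, center=True).mean()
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__
    print(error_name, exc)

# The fix: wrap the array in a Series first, then smooth
myT = pd.Series(sales_v, dtype='float64').rolling(12, center=True).mean()
print(int(myT.notna().sum()), "smoothed values out of", len(myT))
```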

(v)   Write out the mathematical formula to replicate T from sales in statement [12].

$$T_t = \frac{Y_{t-1} + Y_t}{2}$$

where $Y_t$ is the sales value at time $t$.

(vi)   Explain the meaning of T1 in statement [13]. How many values have we lost at each end of T1? Explain the reason for this.

T1 is a variable containing the smoothed time series obtained by applying the centred MA-12 (CMA-12) to the given sales time series. Theoretically we would expect 6 missing values at each end of the smoothed time series. However, due to Python’s implementation, there will be 7 missing values (NaN) at the beginning of T1 and 5 missing values at the end.

The reason we lose these values is that the CMA-12 needs six observations on either side of the current time, so it can only be applied from time t = 7 up to t = N-6, where N is the length of the given time series.
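The 7 and 5 split can be confirmed by counting the NaN values. This is a small sketch on a made-up 36-month series, not the beer data:

```python
import numpy as np
import pandas as pd

sales = pd.Series(np.arange(1.0, 37.0))   # hypothetical 36-month series

T = sales.rolling(2, center=True).mean()  # statement [12]
T1 = T.rolling(12, center=True).mean()    # statement [13]

is_nan = T1.isna().to_numpy()
lead = int(np.argmax(~is_nan))            # NaNs before the first valid value
trail = int(np.argmax(~is_nan[::-1]))     # NaNs after the last valid value
print(lead, trail)                        # 7 5
```

For an even window, pandas places the label just to the right of the window centre, which is why the missing values are not split 6 and 6.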

(vii)   Explain the meaning of statement [14].

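With statement [14] we ask Python to conduct simple exponential smoothing with alpha = 0.05. Because adjust=True, pandas computes each smoothed value as a weighted average of all past observations with weights proportional to (1 - alpha)^i, normalised to sum to one.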
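To see what the ewm call computes, we can compare it with a manual weighted average. The sketch below uses five made-up values; the names manual and recursive are ours, for illustration only:

```python
import numpy as np
import pandas as pd

alpha = 0.05
x = pd.Series([10.0, 12.0, 11.0, 13.0, 12.5])   # hypothetical sales values

smoothed = x.ewm(alpha=alpha, adjust=True).mean()

# Manual adjust=True: weights (1 - alpha)^i on x_{t-i}, normalised to sum to 1
manual = []
for t in range(len(x)):
    w = (1 - alpha) ** np.arange(t + 1)      # weights for lags 0..t
    past = x.to_numpy()[t::-1]               # x_t, x_{t-1}, ..., x_0
    manual.append(np.sum(w * past) / np.sum(w))

# adjust=False gives the recursion l_t = alpha*x_t + (1 - alpha)*l_{t-1}, l_0 = x_0
recursive = x.ewm(alpha=alpha, adjust=False).mean()

print(np.allclose(smoothed.to_numpy(), manual))
```

The two settings differ mainly in the first few periods and converge as more observations accumulate.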

(viii)   Roughly sketch the figure given by statements [14]-[21].  You may assume a monthly seasonal pattern for the beer sales.

This sketch is left for you to draw.

Q2

Easy to answer using the lecture notes

Q3 Which forecasting method forms a forecast by weighting the most recent data (in time) more highly than less recent data and how?

The simple exponential smoothing method forms its forecast by weighting the most recent data more highly than less recent data.   The forecast is calculated according to the following formula

$$\hat{Y}_{t+1} = \alpha Y_t + \alpha(1-\alpha) Y_{t-1} + \cdots + \alpha(1-\alpha)^{t-1} Y_1$$

Because usually $0 < \alpha < 1$, we have, for example, $\alpha > \alpha(1-\alpha) > \alpha(1-\alpha)^2 > \cdots > \alpha(1-\alpha)^{t-1}$.
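A quick numerical check of the decaying weights, using a made-up alpha and five made-up observations:

```python
import numpy as np

alpha = 0.3                                   # hypothetical smoothing parameter
y = np.array([5.0, 7.0, 6.0, 8.0, 9.0])       # hypothetical Y_1, ..., Y_t
t = len(y)

# Weights on Y_t, Y_{t-1}, ..., Y_1 in that order
weights = alpha * (1 - alpha) ** np.arange(t)

# Forecast from the formula above: reverse y so the newest value comes first
forecast = np.sum(weights * y[::-1])

print(weights)
print(forecast)
```

The weights fall geometrically, so the most recent observation always counts the most. They sum to slightly less than one; the remaining weight (1 - alpha)^t belongs to the initial level, which the formula above leaves out.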

Q4 Describe/List the main differences between qualitative and quantitative forecasting methods.

Qualitative forecasting is based on judgmental forecasts. It is more subjective and depends on opinion. Forecasting using judgement is very common in practice. There are many cases where judgmental forecasting is the only option, such as when there is a complete lack of historical data.

Quantitative forecasting is data-based and relies on statistical approaches to make forecasts. Where it is feasible, the results are in general more objective. The success of this type of approach relies heavily on the quality of the historical data and on appropriate modelling methods/algorithms.

Q5 Which measure of forecast accuracy should be used in the following situations? Give their definition in formulas.

(i)        A small number of large forecast errors may be allowed.

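If we do not mind a small number of large forecast errors, we shall use the MAD measure, which is the average distance between the actual values and the forecasts. The MAD is defined as

$$\text{MAD} = \frac{1}{h} \sum_{t=N+1}^{N+h} |e_t|$$

where $e_t = Y_t - \hat{Y}_t$ is the forecast error and the sum runs over the $h$ forecast periods that follow time $N$.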

(ii)       Errors need to be measured in percentage terms.

In this case, we shall use the Mean Absolute Percentage Error (MAPE). It measures errors against the actual data in percentage terms. As the error is relative (normalized), it is unit independent.  MAPE is defined as

$$\text{MAPE} = \frac{1}{h} \sum_{t=N+1}^{N+h} \left| \frac{e_t}{Y_t} \right| \times 100$$

MAPE cannot be used if $Y_t = 0$ for any $t$.

(iii)      Whether a model’s forecasts are biased is more important than the typical size of errors.

In this case, we are concerned with whether the following condition is satisfied

$$\sum_{t=N+1}^{N+h} e_t \approx 0$$

If this condition is satisfied, the forecast is unbiased. However, even if this condition is satisfied, the magnitude of the errors could be very large.
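The three measures in (i)-(iii) are easy to compute side by side. The actual and forecast values below are made up for illustration:

```python
import numpy as np

actual = np.array([100.0, 110.0, 120.0, 130.0])     # hypothetical actual values
forecast = np.array([98.0, 113.0, 119.0, 128.0])    # hypothetical forecasts
errors = actual - forecast                          # e_t = Y_t - forecast

mad = np.mean(np.abs(errors))                       # average absolute error
mape = np.mean(np.abs(errors / actual)) * 100       # requires every actual != 0
bias = np.sum(errors)                               # near 0 means unbiased

print(mad, round(mape, 3), bias)
```

Positive and negative errors cancel in the bias sum but not in MAD or MAPE, which is why a forecast can look unbiased while its errors are still large.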

Q6 Consider the following time series plot showing annual labour force data in Australia.


(i)  Describe the main features or components apparent in the data


There is an obvious linear upward trend. There is no obvious cycle component. Although the graph is cluttered, we can still see annual seasonality present in the data.

The seasonal magnitude does not change proportionally over time.

There is also a sharp drop in the time range from about 155 to 165 (these numbers are only estimates; if this question is in the exam, I may only ask you to identify the pattern at a rough location), after which the series resumes the main trend.


(ii) List three quantitative forecast models that might be tried for this data, with brief motivation for each choice.

Given the patterns identified in (i), the following methods can be tried:

1)  Additive decomposition model: This method is capable of picking up all the components identified in (i). In this method, the trend can be estimated by linear regression with spike dummy variables at time points 155 – 165.

2)  Simple exponential smoothing: As there is visible seasonality, we should use a large alpha value so that the smoothed series still reveals the seasonal component. However, this method can only provide a constant forecast. Similarly, Holt’s linear (trend corrected exponential) smoothing can be applied, in which case the forecast is linear.

3)  Holt-Winters Seasonal Additive Model: This model will be more accurate, as the seasonality, trend component and level component can all be time-dependent.

Q7 Describe the classical multiplicative decomposition method by showing each major step.

I simply copied the procedure from the teaching slide. You may need to add more details to show your full understanding.

1: Smooth the data to remove seasonality, leading to the initial trend-cycle estimate.

2: De-trend the original series (for the multiplicative model, divide the series by the trend-cycle estimate).

3: Estimate the seasonal indexes by averaging, and normalize them so that the sum of the seasonal indices is M, the number of seasons in one period.

4: If forecasting, fit a trend model such as linear regression to the seasonally adjusted time series

5: Estimate the cycle-error

6: Smooth the cycle-error to estimate the cycle

7: Estimate the errors by removing the cycle, seasonal and trend components.
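Steps 1 to 4 can be sketched in a few lines. This is only an illustration under stated assumptions: the monthly series is synthetic (a linear trend times a sine-shaped seasonal pattern, no noise), the names w, idx and adjusted are ours, and the cycle and error steps 5 to 7 are left out.

```python
import numpy as np
import pandas as pd

M = 12                                       # seasonal period (monthly data)
t = np.arange(1, 61)                         # five hypothetical years
true_seasonal = 1 + 0.2 * np.sin(2 * np.pi * (t - 1) / M)
y = pd.Series((50 + 0.5 * t) * true_seasonal)

# Step 1: centred MA-12, written as one 13-term weighted average (a 2x12 MA)
w = np.r_[1 / 24, np.repeat(1 / 12, M - 1), 1 / 24]
trend = y.rolling(M + 1, center=True).apply(lambda v: np.dot(v, w), raw=True)

# Step 2: de-trend by division (multiplicative model)
detrended = y / trend

# Step 3: average by season, then normalise so the indices sum to M
season = (t - 1) % M
idx = detrended.groupby(season).mean()       # NaN end values are skipped
idx = idx * M / idx.sum()
seasonal = idx.to_numpy()[season]

# Step 4: fit a linear trend to the seasonally adjusted series
adjusted = y / seasonal
slope, intercept = np.polyfit(t, adjusted, 1)

print(idx.round(3).tolist())
print(round(slope, 3), round(intercept, 3))
```

On this synthetic series the estimated indices should land close to the true seasonal pattern, and the fitted slope close to 0.5, because the centred MA-12 removes a pure 12-month pattern almost exactly.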

Q8

For students to answer

Q9

(i)   Find the weights, showing all working, for the following centred smoothers MA-6

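As 6 is an even number, we first compute the two half-time MA-6 values

$$T_{t-0.5} = \frac{1}{6}\left(Y_{t-3} + Y_{t-2} + Y_{t-1} + Y_t + Y_{t+1} + Y_{t+2}\right)$$

$$T_{t+0.5} = \frac{1}{6}\left(Y_{t-2} + Y_{t-1} + Y_t + Y_{t+1} + Y_{t+2} + Y_{t+3}\right)$$

Then do a normal MA-2 on the half-time smoothed series:

$$\hat{T}_t = \frac{1}{2}\left(T_{t-0.5} + T_{t+0.5}\right) = \frac{1}{12} Y_{t-3} + \frac{1}{6} Y_{t-2} + \frac{1}{6} Y_{t-1} + \frac{1}{6} Y_t + \frac{1}{6} Y_{t+1} + \frac{1}{6} Y_{t+2} + \frac{1}{12} Y_{t+3}$$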

(ii)   Indicate how many missing observations there will be at the start and end of each smoothed series.

We lose 3 observations on each side. This is because the final smoothing formula involves $Y_{t-3}$ and $Y_{t+3}$.

(iii)   Identify the smoother in (i) as a WMA-k. What is k?

The centred MA-6 uses 7 observations to get one smoothed value at time t, and the sum of all coefficients {1/12, 1/6, 1/6, 1/6, 1/6, 1/6, 1/12} is 1.  Hence CMA-6 is a WMA-k where k = 7.
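The weights can be checked by convolving an MA-2 with an MA-6, which is what applying one smoother after the other amounts to. A small sketch:

```python
import numpy as np
from fractions import Fraction

ma6 = np.repeat(1 / 6, 6)          # equal weights of an MA-6
ma2 = np.repeat(1 / 2, 2)          # equal weights of an MA-2

# Applying one moving average after the other convolves their weights
weights = np.convolve(ma2, ma6)

print(len(weights))
print([str(Fraction(w).limit_denominator(100)) for w in weights])
```

The result has 7 entries that sum to 1, matching the WMA-7 identified above.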

(iv)   Devise a symmetric smoother that is a WMA-5 so that the weight at time t is 1.5 times the weight at times t-1 and t+1, and double the weight at times t-2 and t+2.

This question is slightly harder:

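A symmetric WMA5 looks like

$$\hat{T}_t = w_2 Y_{t-2} + w_1 Y_{t-1} + w_0 Y_t + w_1 Y_{t+1} + w_2 Y_{t+2}$$

From the given conditions we have

$$w_0 = 1.5\, w_1, \qquad w_0 = 2\, w_2$$

Or

$$w_1 = \frac{2}{3} w_0, \qquad w_2 = \frac{1}{2} w_0$$

As it is a WMA5, the weights must sum to one, hence

$$w_0 + 2 w_1 + 2 w_2 = w_0 + \frac{4}{3} w_0 + w_0 = \frac{10}{3} w_0 = 1$$

This gives

$$w_0 = 0.3, \qquad w_1 = 0.2, \qquad w_2 = 0.15$$

The final symmetric WMA5 smoother is

$$\hat{T}_t = 0.15\, Y_{t-2} + 0.2\, Y_{t-1} + 0.3\, Y_t + 0.2\, Y_{t+1} + 0.15\, Y_{t+2}$$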
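The weights implied by the two conditions can be checked with exact fractions. In this sketch every weight is first written relative to the centre weight and then rescaled to sum to one; the variable names are ours:

```python
from fractions import Fraction

# Express every weight relative to the centre weight w0
rel_w0 = Fraction(1)
rel_w1 = rel_w0 / Fraction(3, 2)     # condition: w0 = 1.5 * w1
rel_w2 = rel_w0 / 2                  # condition: w0 = 2 * w2

# A weighted moving average needs weights that sum to 1
total = rel_w0 + 2 * rel_w1 + 2 * rel_w2
w0, w1, w2 = rel_w0 / total, rel_w1 / total, rel_w2 / total

weights = [w2, w1, w0, w1, w2]       # order: t-2, t-1, t, t+1, t+2
print([str(w) for w in weights])     # ['3/20', '1/5', '3/10', '1/5', '3/20']
```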

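Q10 Holt’s linear additive model (the trend corrected exponential smoothing model) is given by the equations:

$$l_t = \alpha Y_t + (1 - \alpha)(l_{t-1} + b_{t-1})$$

$$b_t = \gamma (l_t - l_{t-1}) + (1 - \gamma) b_{t-1}$$

$$Y_{t+1} = l_t + b_t + \varepsilon_{t+1}; \qquad \varepsilon_{t+1} \sim N(0, \sigma^2)$$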


(i)        Put this model in error correction form.

You may refer to slides of Lecture 5 for the derivation. Here is a quick solution:

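First, the third equation in the above model shows that $Y_{t+1}$ is already in the error form.  At time t, the equation becomes

$$Y_t = l_{t-1} + b_{t-1} + \varepsilon_t \qquad (*)$$

We will need this new equation.  Let us re-write the first equation in the following way:

$$l_t = \alpha Y_t + (1 - \alpha)(l_{t-1} + b_{t-1})$$

$$= l_{t-1} + b_{t-1} + \alpha (Y_t - l_{t-1} - b_{t-1})$$

$$= l_{t-1} + b_{t-1} + \alpha \varepsilon_t$$

In the last step, we have used equation (*) above.  This has put the level in the error form.

For the convenience of the next step, we further re-write the level’s error form as

$$l_t - l_{t-1} - b_{t-1} = \alpha \varepsilon_t \qquad (**)$$

Now let us re-write the second equation in the model:

$$b_t = \gamma (l_t - l_{t-1}) + (1 - \gamma) b_{t-1}$$

$$= \gamma (l_t - l_{t-1}) + b_{t-1} - \gamma b_{t-1}$$

$$= b_{t-1} + \gamma (l_t - l_{t-1} - b_{t-1})$$

$$= b_{t-1} + \alpha \gamma \varepsilon_t \qquad \text{(using (**))}$$

This puts all three variables in the error form.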

(ii)       Derive the 1, 2 and 3 step-ahead forecasts from this model. Explain briefly why this model is called the ‘local-linear’ forecasting model.

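In (i), we have built the three error forms. That is

$$l_t = l_{t-1} + b_{t-1} + \alpha \varepsilon_t$$

$$b_t = b_{t-1} + \alpha \gamma \varepsilon_t$$

$$Y_{t+1} = l_t + b_t + \varepsilon_{t+1}$$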

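The general way to derive the forecasts is to express $Y$ in terms of $\varepsilon$ at different time points, and then take the expectation. For example, from the last equation we have

$$\hat{Y}_{t+1} = E(Y_{t+1} \mid Y_{1:t}) = E(l_t + b_t + \varepsilon_{t+1} \mid Y_{1:t}) = l_t + b_t + E(\varepsilon_{t+1} \mid Y_{1:t})$$

According to the model assumption, $E(\varepsilon_{t+1} \mid Y_{1:t}) = E(\varepsilon_{t+1}) = 0$. Hence

$$\hat{Y}_{t+1} = E(Y_{t+1} \mid Y_{1:t}) = l_t + b_t$$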
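As a numerical check that the error correction form from (i) reproduces the original recursion, the sketch below runs both versions on the same made-up data. The parameter values, the initial level and trend, and the function names are all hypothetical choices for illustration:

```python
import numpy as np

alpha, gamma = 0.4, 0.3                                # hypothetical smoothing parameters
y = np.array([10.0, 12.0, 13.5, 15.0, 17.2, 18.1])     # hypothetical observations
l0, b0 = 10.0, 1.0                                     # hypothetical initial level and trend

def holt_standard(y, l, b):
    """Original level and trend recursion."""
    for obs in y:
        l_new = alpha * obs + (1 - alpha) * (l + b)
        b = gamma * (l_new - l) + (1 - gamma) * b
        l = l_new
    return l, b

def holt_error_correction(y, l, b):
    """Same model, updated through the one-step forecast error."""
    for obs in y:
        e = obs - (l + b)                              # error of the forecast l + b
        l, b = l + b + alpha * e, b + alpha * gamma * e
    return l, b

l_std, b_std = holt_standard(y, l0, b0)
l_ec, b_ec = holt_error_correction(y, l0, b0)
forecast_1 = l_ec + b_ec                               # one-step-ahead forecast

print(l_std, b_std)
print(l_ec, b_ec, forecast_1)
```

Both functions return the same final level and trend, and the one-step-ahead forecast is simply their sum.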