QBUS6840 Semester 1, 2022 Mid-Semester Examination Practice Questions with Solutions
Q1
The file beer.txt contains two named columns, ‘Time’ (month) and ‘Sales’. Read the following Python code:
[1] import pandas as pd
[2] import matplotlib.pyplot as plt
[3] import numpy as np
[4] beer = pd.read_csv('beer.txt')
[5] sales = beer['Sales']
[6] sales_v = sales.values
[7] plt.figure()
[7] plt.plot(sales)
[8] plt.title('Beer Sales')
[9] plt.xlabel('Month')
[10] plt.ylabel('Sales')
[11] myT = sales_v.rolling(12, center=True).mean()
[12] T = sales.rolling(2, center = True).mean()
[13] T1 = T.rolling(12, center = True).mean();
[14] What_is_this = beer['Sales'].ewm(alpha = 0.05, adjust=True).mean()
[15] plt.figure()
[16] plt.plot(sales)
[17] plt.plot(T1)
[18] plt.plot(What_is_this)
[19] plt.title('Beer Sales')
[20] plt.xlabel('Month')
[21] plt.ylabel('Sales')
Please answer the following questions:
(i) Explain the meaning of statement [4] in the program. What is the data type for variable beer?
Statement [4] calls pandas’ read_csv function to read the data in beer.txt (a CSV-formatted text file) into the Python environment for further processing.
The data are stored in the variable beer, whose type is DataFrame (pandas.DataFrame). All the information in beer.txt is now held in this DataFrame.
(ii) What is the data type of the variable sales in statement [5]? What is the difference
between sales and sales_v from statement [6].
From the description of beer.txt, we know that the DataFrame beer has two columns. Statement [5] selects the column named ‘Sales’ from beer and stores it in the new variable sales, whose type is a pandas Series.
Statement [6] extracts only the sales values from the Series sales and stores them in sales_v, dropping the Series name ‘Sales’ and the index. The type of sales_v is a NumPy array (numpy.ndarray).
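The three types in (i) and (ii) can be checked directly. The sketch below is only an illustration: since beer.txt is not available here, it reads a few made-up rows from memory, assuming the file is comma-separated with a header line.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for beer.txt: comma-separated, with 'Time' and 'Sales' columns.
beer_text = "Time,Sales\n1,93.2\n2,96.0\n3,95.2\n4,77.1\n"

beer = pd.read_csv(io.StringIO(beer_text))  # same call as statement [4], reading from memory
sales = beer['Sales']                       # statement [5]
sales_v = sales.values                      # statement [6]

print(type(beer).__name__)     # DataFrame
print(type(sales).__name__)    # Series
print(type(sales_v).__name__)  # ndarray
print(sales.name, list(sales.index))  # the Series keeps its name and index; the array does not
```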
(iii) Do you have any concerns about this program? At which line would you expect an error
message when running this program or not? Why?
There is a small bug in this program: execution stops at statement [11]. There, Python looks for a rolling method on the NumPy array sales_v, but NumPy arrays have no such method (rolling belongs to pandas Series and DataFrame objects). Python raises an AttributeError reporting that the array has no attribute 'rolling'.
(iv) How to handle the issue you identified in (iii)?
There are two ways to fix the issue. The simplest is to remove statement [11] from the program, since myT is not used afterwards. Alternatively, first convert the array to a pandas Series and then call its rolling method:
myT = pd.Series(sales_v, dtype='float64').rolling(12, center=True).mean()
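A minimal sketch of the failure and the fix, using made-up values in place of the beer sales (the array contents are an assumption, not the real data):

```python
import numpy as np
import pandas as pd

sales_v = np.arange(1.0, 25.0)  # 24 hypothetical values standing in for the beer sales

try:
    sales_v.rolling(12, center=True).mean()  # statement [11] as written
    error_name = None
except AttributeError as err:
    error_name = type(err).__name__
    print(error_name, err)

# The fix from (iv): wrap the array in a Series before calling rolling.
myT = pd.Series(sales_v, dtype='float64').rolling(12, center=True).mean()
n_valid = int(myT.notna().sum())
print(n_valid)  # number of smoothed (non-NaN) values
```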
(v) Write out the mathematical formula to replicate T from sales in statement [12].
$$T_t = \frac{Y_{t-1} + Y_t}{2}$$
where $Y_t$ denotes the sales value at time $t$.
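The formula in (v) can be compared with what pandas returns for a window of 2 with center=True. This is a small check on made-up numbers; the name T_formula is introduced only for this illustration.

```python
import numpy as np
import pandas as pd

sales = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0, 18.0])  # hypothetical data

T = sales.rolling(2, center=True).mean()   # statement [12]
T_formula = (sales.shift(1) + sales) / 2   # average of the previous and current values

print(pd.DataFrame({'T': T, 'formula': T_formula}))
same = np.allclose(T, T_formula, equal_nan=True)
print(same)
```

The first entry of T is NaN because there is no previous observation to average with.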
(vi) Explain the meaning of T1 in statement [13]. How many values do we lose at each end of T1? Explain the reason for this.
T1 holds the smoothed series obtained by applying the centred MA-12 (CMA-12) to the sales series: statement [12] takes an MA-2 of sales, and statement [13] takes an MA-12 of the result. Theoretically we would expect 6 missing values at each end of the smoothed series. However, because of how pandas aligns even-length centred windows, there are 7 missing values (NaN) at the beginning of T1 and 5 at the end.
We lose these values because the CMA-12 at time t needs the observations from t-6 to t+6, so it can only be computed from t = 7 up to t = N-6, where N is the length of the given time series.
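The count of missing values can be confirmed on any series. The sketch below uses a made-up series of length 48 (the length and values are assumptions) and counts the NaNs at each end of T1.

```python
import numpy as np
import pandas as pd

N = 48
sales = pd.Series(np.arange(N, dtype='float64'))  # hypothetical series

T = sales.rolling(2, center=True).mean()   # statement [12]
T1 = T.rolling(12, center=True).mean()     # statement [13]

valid = T1.notna().to_numpy()
n_start = int(np.argmax(valid))            # NaNs before the first valid value
n_end = int(np.argmax(valid[::-1]))        # NaNs after the last valid value
print(n_start, n_end)
```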
(vii) Explain the meaning of statement [14].
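Statement [14] asks Python to apply simple exponential smoothing with alpha = 0.05 to the sales series. Because adjust=True, pandas computes each smoothed value as a weighted average of all observations so far, with exponentially decaying weights normalised to sum to one.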
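To see what the adjust option changes, the sketch below compares pandas' ewm with a hand-written recursion on made-up data. The recursion is initialised at the first observation, which is one common choice and an assumption here.

```python
import numpy as np
import pandas as pd

alpha = 0.05
y = pd.Series([20.0, 22.0, 19.0, 25.0, 24.0, 28.0])  # hypothetical data

# Recursive simple exponential smoothing, initialised at the first observation.
level = [y.iloc[0]]
for value in y.iloc[1:]:
    level.append(alpha * value + (1 - alpha) * level[-1])
recursive = pd.Series(level)

# adjust=True: weighted average with weights (1 - alpha)**i, normalised to sum to one.
weights = (1 - alpha) ** np.arange(len(y))
manual_last = np.sum(weights * y.to_numpy()[::-1]) / weights.sum()

adjusted = y.ewm(alpha=alpha, adjust=True).mean()     # as in statement [14]
unadjusted = y.ewm(alpha=alpha, adjust=False).mean()  # matches the recursion above

print(adjusted.iloc[-1], manual_last)
print(unadjusted.iloc[-1], recursive.iloc[-1])
```

For a long series the two versions converge, because the normalising sum of weights approaches 1/alpha.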
(viii) Roughly sketch the figure given by statements [14]-[20]? You may assume a monthly
seasonal plot for the beer sales.
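This sketch is left to you. The figure shows the original sales series together with the CMA-12 trend T1 and the exponentially smoothed series What_is_this.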
Q2
Easy to answer using the lecture notes
Q3 Which forecasting method forms a forecast by weighting the most recent data (in time) more highly than less recent data and how?
Simple exponential smoothing forms its forecast by weighting the most recent data more highly than less recent data. The forecast is calculated according to the formula
$$\hat{Y}_{t+1} = \alpha Y_t + \alpha(1-\alpha)Y_{t-1} + \cdots + \alpha(1-\alpha)^{t-1}Y_1$$
Because usually $0 < \alpha < 1$, the weights decrease as the data get older: $\alpha > \alpha(1-\alpha) > \alpha(1-\alpha)^2 > \cdots > \alpha(1-\alpha)^{t-1}$.
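The decay of the weights is easy to tabulate. In this sketch the values of alpha and t are arbitrary choices made for illustration.

```python
import numpy as np

alpha = 0.3   # hypothetical smoothing parameter
t = 10        # hypothetical number of observations

# Weight on the observation j steps back is alpha * (1 - alpha)**j, j = 0, ..., t-1.
weights = alpha * (1 - alpha) ** np.arange(t)

print(np.round(weights, 4))
print(weights.sum())  # equals 1 - (1 - alpha)**t, which approaches 1 as t grows
```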
Q4 Describe/List the main differences between qualitative and quantitative forecasting methods.
Qualitative forecasting is based on judgement. It is more subjective and depends on opinion. Forecasting using judgement is very common in practice, and there are many cases where it is the only option, such as when there is a complete lack of historical data.
Quantitative forecasting is data-based and relies on statistical approaches to make forecasts. When it is feasible, the results are in general more objective. The success of this type of approach relies heavily on the quality of the historical data and on appropriate modelling methods and algorithms.
Q5 Which measure of forecast accuracy should be used in the following situations? Give their definition in formulas.
(i) A small number of large forecast errors may be allowed.
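If a few large forecast errors can be tolerated, we should use the mean absolute deviation (MAD), which is the average absolute distance between the actual values and the forecasts. Because the errors are not squared, a few large errors are not penalised heavily. The MAD is defined as
$$\mathrm{MAD} = \frac{1}{h}\sum_{t=T+1}^{T+h} |Y_t - \hat{Y}_t|$$
where $Y_t$ is the actual value, $\hat{Y}_t$ is the forecast, and the forecasts cover the $h$ periods after time $T$.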
(ii) Errors need to be measured in percentage terms.
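In this case, we should use the Mean Absolute Percentage Error (MAPE). It measures errors against the actual data in percentage terms. Because the error is relative (normalised), it is unit independent. MAPE is defined as
$$\mathrm{MAPE} = \frac{1}{h}\sum_{t=T+1}^{T+h} \left|\frac{Y_t - \hat{Y}_t}{Y_t}\right| \times 100$$
MAPE cannot be used if $Y_t = 0$ for any $t$ in the evaluation period.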
(iii) Whether a model’s forecasts are biased is more important than the typical size of errors.
In this case, we are concerned with whether the following condition is satisfied:
$$\sum_{t=T+1}^{T+h} (Y_t - \hat{Y}_t) \approx 0$$
If this condition is satisfied, the forecast is unbiased. However, even if this condition is satisfied, the magnitude of the errors could be very large.
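The three measures in Q5 can be sketched as small functions. The function names and the data below are made up for illustration; the example is chosen so that the errors cancel (no bias) while the MAD is still large.

```python
import numpy as np

def mad(actual, forecast):
    """Mean absolute deviation between actual values and forecasts."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean(np.abs(actual - forecast))

def mape(actual, forecast):
    """Mean absolute percentage error; undefined if any actual value is zero."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    if np.any(actual == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return np.mean(np.abs((actual - forecast) / actual)) * 100

def mean_error(actual, forecast):
    """Average signed error; a value near zero suggests an unbiased forecast."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean(actual - forecast)

actual = [100.0, 110.0, 120.0, 130.0]    # hypothetical actual values
forecast = [90.0, 120.0, 110.0, 140.0]   # hypothetical forecasts

mad_value = mad(actual, forecast)
mape_value = mape(actual, forecast)
me_value = mean_error(actual, forecast)
print(mad_value, mape_value, me_value)
```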
Q6 Consider the following time series plot showing annual labour force data in Australia.
(i) Describe the main features or components apparent in the data
There is an obvious upward trend that is roughly linear. There is no obvious cycle component. Although the graph is cluttered, we can still see annual seasonality in the data.
The seasonal magnitude does not change in proportion to the level over time.
There is also a sharp drop around times 155 to 165 (these numbers are only estimates; in an exam, you may only be asked to identify the pattern at a rough location), after which the series resumes its main trend.
(ii) List three quantitative forecast models that might be tried for this data, with brief motivation for each choice.
Given the patterns identified in (i), the following methods can be tried:
1) Additive decomposition model: This method can pick up all the components identified in (i). The trend can be estimated by linear regression, with spike dummy variables for time points 155 to 165 to capture the drop.
2) Simple exponential smoothing: Because there is visible seasonality, we should use a large alpha so that the smoothed series follows the seasonal component. However, this method can only produce a constant forecast. Holt’s linear (trend corrected exponential) smoothing can be applied similarly; its forecast is linear.
3) Holt-Winters additive seasonal model: This model should be more accurate because the level, trend and seasonal components are all allowed to change over time.
Q7 Describe the classical multiplicative decomposition method by showing each major step.
The procedure below is copied from the teaching slides. You may need to add more details to show your full understanding.
1: Smooth the data to remove seasonality, leading to the initial trend-cycle estimate
2: De-trend the original series
3: Estimate the seasonal indices by averaging, and normalise them so that the sum of the seasonal indices is M (the number of seasons per cycle)
4: If forecasting, fit a trend model such as linear regression to the seasonally adjusted time series
5: Estimate the cycle-error
6: Smooth the cycle-error to estimate the cycle
7: Estimate the errors by removing the cycle, seasonal and trend components
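Steps 1 to 4 above can be sketched on a synthetic series. Everything in this example is an assumption made for illustration: a noise-free monthly series with M = 12, a linear trend and a sinusoidal seasonal pattern. It is a sketch of the idea, not the procedure from the slides.

```python
import numpy as np

M = 12                                   # seasons per cycle (monthly data assumed)
t = np.arange(96)                        # 8 hypothetical years
true_seasonal = 1 + 0.2 * np.sin(2 * np.pi * np.arange(M) / M)
y = (100 + 0.5 * t) * true_seasonal[t % M]   # multiplicative trend x seasonal

# Step 1: centred MA-12 as the initial trend-cycle estimate
# (half weights on the two end points, full weights in between).
w = np.r_[0.5, np.ones(M - 1), 0.5] / M
trend = np.full(len(y), np.nan)
trend[M // 2: len(y) - M // 2] = np.convolve(y, w, mode='valid')

# Step 2: de-trend by division (multiplicative model).
detrended = y / trend

# Step 3: average the de-trended values month by month, then normalise to sum to M.
seasonal = np.array([np.nanmean(detrended[m::M]) for m in range(M)])
seasonal *= M / seasonal.sum()

# Step 4: fit a linear trend to the seasonally adjusted series.
adjusted = y / seasonal[t % M]
slope, intercept = np.polyfit(t, adjusted, 1)

print(np.round(seasonal, 3))
print(slope, intercept)
```

The remaining steps divide the series by the fitted trend and the seasonal indices to obtain the cycle-error, which is then smoothed to separate the cycle from the errors.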
Q8
For students to answer
Q9
(i) Find the weights, showing all working, for the following centred smoothers MA-6
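As 6 is an even number, we first compute two half-time MA-6 values:
$$\hat{T}_{t-0.5} = \frac{1}{6}\left(Y_{t-3} + Y_{t-2} + Y_{t-1} + Y_t + Y_{t+1} + Y_{t+2}\right), \qquad \hat{T}_{t+0.5} = \frac{1}{6}\left(Y_{t-2} + Y_{t-1} + Y_t + Y_{t+1} + Y_{t+2} + Y_{t+3}\right)$$
Then we apply a normal MA-2 to the half-time smoothed series:
$$\hat{T}_t = \frac{1}{2}\left(\hat{T}_{t-0.5} + \hat{T}_{t+0.5}\right) = \frac{1}{12}Y_{t-3} + \frac{1}{6}\left(Y_{t-2} + Y_{t-1} + Y_t + Y_{t+1} + Y_{t+2}\right) + \frac{1}{12}Y_{t+3}$$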
(ii) Indicate how many missing observations there will be at the start and end of each
smoothed series
On each side we lose 3 observations, because the final smoothing formula involves $Y_{t-3}$ and $Y_{t+3}$.
(iii) Identify the smoother in (i) as a WMA-k. What is k?
Centred MA-6 uses 7 observations to produce one smoothed value at time t, and the coefficients {1/12, 1/6, 1/6, 1/6, 1/6, 1/6, 1/12} sum to 1. Hence CMA-6 is a WMA-k with k = 7.
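The CMA-6 weights can also be obtained numerically: applying an MA-2 to an MA-6 is the same as convolving the two sets of weights. A minimal sketch:

```python
import numpy as np

ma6 = np.ones(6) / 6   # weights of a simple MA-6
ma2 = np.ones(2) / 2   # weights of a simple MA-2

cma6 = np.convolve(ma6, ma2)   # combined weights of MA-2 applied to MA-6

print(cma6)        # 7 weights: 1/12 at the two ends, 1/6 in between
print(len(cma6))   # k for the equivalent WMA-k
print(cma6.sum())
```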
(iv) Devise a symmetric smoother that is an WMA 5 so that the weight on time t is 1.5 times
the weight at times t- 1 and t+1; and double the weight at times t-2, t+2
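This question is slightly harder.
A symmetric WMA-5 looks like
$$\hat{T}_t = w_2 Y_{t-2} + w_1 Y_{t-1} + w_0 Y_t + w_1 Y_{t+1} + w_2 Y_{t+2}$$
From the given conditions we have
$$w_0 = 1.5\,w_1, \qquad w_0 = 2\,w_2$$
or
$$w_1 = \tfrac{2}{3}w_0, \qquad w_2 = \tfrac{1}{2}w_0$$
As it is a WMA-5, the weights must sum to 1, hence
$$w_0 + 2w_1 + 2w_2 = w_0 + \tfrac{4}{3}w_0 + w_0 = \tfrac{10}{3}w_0 = 1$$
This gives
$$w_0 = 0.3, \qquad w_1 = 0.2, \qquad w_2 = 0.15$$
The final symmetric WMA-5 smoother is
$$\hat{T}_t = 0.15\,Y_{t-2} + 0.2\,Y_{t-1} + 0.3\,Y_t + 0.2\,Y_{t+1} + 0.15\,Y_{t+2}$$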
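The weights for (iv) can be checked with exact fractions. The sketch expresses each weight relative to the weight at time t (1/1.5 = 2/3 at t-1 and t+1, 1/2 at t-2 and t+2) and then rescales so that the weights sum to 1; the variable names are introduced only for this illustration.

```python
from fractions import Fraction

# Weights relative to the centre weight, for times t-2, t-1, t, t+1, t+2.
ratios = [Fraction(1, 2), Fraction(2, 3), Fraction(1), Fraction(2, 3), Fraction(1, 2)]

total = sum(ratios)                    # sum of the unnormalised weights
weights = [r / total for r in ratios]  # rescale so the weights sum to 1

print(total)
print([str(w) for w in weights])
print([float(w) for w in weights])
```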
Q10 The Holt’s linear additive model/trend corrected exponential model is given by the equations:
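$$l_t = \alpha Y_t + (1-\alpha)(l_{t-1} + b_{t-1})$$
$$b_t = \gamma(l_t - l_{t-1}) + (1-\gamma)b_{t-1}$$
$$Y_{t+1} = l_t + b_t + \varepsilon_{t+1}; \qquad \varepsilon_{t+1} \sim N(0, \sigma^2)$$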
(i) Put this model in error correction form.
You may refer to slides of Lecture 5 for the derivation. Here is a quick solution:
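First, the third equation in the above model already expresses $Y_{t+1}$ in error form. At time t, the equation becomes
$$Y_t = l_{t-1} + b_{t-1} + \varepsilon_t \qquad (*)$$
We will need this equation. Let us rewrite the first equation as follows:
$$\begin{aligned}
l_t &= \alpha Y_t + (1-\alpha)(l_{t-1} + b_{t-1}) \\
&= l_{t-1} + b_{t-1} + \alpha(Y_t - l_{t-1} - b_{t-1}) \\
&= l_{t-1} + b_{t-1} + \alpha\varepsilon_t
\end{aligned}$$
In the last step, we have used equation (*) above. This puts the level in error form.
For convenience in the next step, we further rewrite the level’s error form as
$$l_t - l_{t-1} - b_{t-1} = \alpha\varepsilon_t \qquad (**)$$
Now let us rewrite the second equation in the model:
$$\begin{aligned}
b_t &= \gamma(l_t - l_{t-1}) + (1-\gamma)b_{t-1} \\
&= \gamma(l_t - l_{t-1}) + b_{t-1} - \gamma b_{t-1} \\
&= b_{t-1} + \gamma(l_t - l_{t-1} - b_{t-1}) \\
&= b_{t-1} + \alpha\gamma\varepsilon_t \qquad \text{by } (**)
\end{aligned}$$
This puts all three variables in error form.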
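The equivalence of the two forms can be checked numerically. The sketch below runs the original recursions and the error correction recursions side by side, writing l for the level, b for the trend and e for the one-step error. The data, the parameter values and the initial level and trend are all made up for illustration.

```python
import numpy as np

alpha, gamma = 0.4, 0.3                            # hypothetical smoothing parameters
y = np.array([12.0, 14.5, 15.0, 18.2, 19.1, 22.4, 23.0])  # hypothetical data
l0, b0 = 11.0, 1.5                                 # hypothetical initial level and trend

def holt_original(y, alpha, gamma, l0, b0):
    """Level and trend from the original smoothing equations."""
    l, b, path = l0, b0, []
    for obs in y:
        l_new = alpha * obs + (1 - alpha) * (l + b)
        b_new = gamma * (l_new - l) + (1 - gamma) * b
        l, b = l_new, b_new
        path.append((l, b))
    return np.array(path)

def holt_error_correction(y, alpha, gamma, l0, b0):
    """Level and trend from the error correction form."""
    l, b, path = l0, b0, []
    for obs in y:
        e = obs - (l + b)                               # one-step error, as in (*)
        l, b = l + b + alpha * e, b + alpha * gamma * e  # right side uses the old l and b
        path.append((l, b))
    return np.array(path)

original = holt_original(y, alpha, gamma, l0, b0)
error_form = holt_error_correction(y, alpha, gamma, l0, b0)
print(np.allclose(original, error_form))
```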
(ii) Derive the 1, 2 and 3 step-ahead forecasts from this model. Explain briefly why this
model is called the ‘local-linear’ forecasting model.
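In (i), we derived the three error forms:
$$l_t = l_{t-1} + b_{t-1} + \alpha\varepsilon_t$$
$$b_t = b_{t-1} + \alpha\gamma\varepsilon_t$$
$$Y_{t+1} = l_t + b_t + \varepsilon_{t+1}$$
The general way to derive forecasts is to express $Y$ in terms of $l_t$, $b_t$ and the errors $\varepsilon$ at different time points, and then take the conditional expectation. For example, from the last equation we have
$$\hat{Y}_{t+1} = E(Y_{t+1} \mid Y_{1:t}) = E(l_t + b_t + \varepsilon_{t+1} \mid Y_{1:t}) = l_t + b_t + E(\varepsilon_{t+1} \mid Y_{1:t})$$
According to the model assumption, $E(\varepsilon_{t+1} \mid Y_{1:t}) = E(\varepsilon_{t+1}) = 0$. Hence
$$\hat{Y}_{t+1} = E(Y_{t+1} \mid Y_{1:t}) = l_t + b_t$$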
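The same approach extends to the 2 and 3 step-ahead forecasts. The following is a sketch of that derivation, writing $l_t$ for the level, $b_t$ for the trend and $\varepsilon_t$ for the error, and assuming the future errors have zero mean given $Y_{1:t}$. Each step substitutes the error forms from (i) until only $l_t$, $b_t$ and future errors remain.

```latex
\begin{aligned}
Y_{t+2} &= l_{t+1} + b_{t+1} + \varepsilon_{t+2} \\
        &= (l_t + b_t + \alpha\varepsilon_{t+1}) + (b_t + \alpha\gamma\varepsilon_{t+1}) + \varepsilon_{t+2} \\
        &= l_t + 2b_t + \alpha(1+\gamma)\varepsilon_{t+1} + \varepsilon_{t+2}, \\
\hat{Y}_{t+2} &= E(Y_{t+2} \mid Y_{1:t}) = l_t + 2b_t, \\[6pt]
Y_{t+3} &= l_{t+2} + b_{t+2} + \varepsilon_{t+3} \\
        &= l_{t+1} + 2b_{t+1} + \alpha(1+\gamma)\varepsilon_{t+2} + \varepsilon_{t+3} \\
        &= l_t + 3b_t + \alpha(1+2\gamma)\varepsilon_{t+1} + \alpha(1+\gamma)\varepsilon_{t+2} + \varepsilon_{t+3}, \\
\hat{Y}_{t+3} &= E(Y_{t+3} \mid Y_{1:t}) = l_t + 3b_t.
\end{aligned}
```

Under these assumptions the h step-ahead forecasts lie on a straight line in h with intercept $l_t$ and slope $b_t$. Because $l_t$ and $b_t$ are updated at every time point, the line is only local to the forecast origin, which suggests why the model is called ‘local-linear’.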
2022-05-16