STATS 326

SEMESTER 1, 2022

STATISTICS

Applied Time Series Analysis

Mid-Semester Test

1  Run the following code in R.

# Use your student ID as the seed

set.seed(2022)

sample(letters[1:5], 2, replace = FALSE)

Use the output from the above R code to select the sub-questions you need to answer from the list below. For example, if the code returns "d" and "c", you should answer sub-questions "d" and "c" from the list below.

Note: Replace the seed used in the above R code with your student ID to select the sub-questions that you need to answer in this question. No marks will be given if you do not answer the sub-questions allocated to you by your seed.
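
As a hedged illustration of the selection step, the sketch below stores a made-up ID in a variable and uses it as the seed. The name student_id and the value 123456789 are assumptions for illustration only. The letters drawn depend on the seed and on the R version, so no output is shown.

```r
# Hypothetical student ID, used only for illustration; substitute your own.
student_id <- 123456789

# Seed the random number generator so the draw is reproducible.
set.seed(student_id)

# Draw two distinct sub-question labels from "a" to "e".
selected <- sample(letters[1:5], 2, replace = FALSE)
selected
```

Re-running the three statements with the same seed returns the same two letters, which is why the seed must be your own ID.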

a  Explain what each line of the following R code does and why you have obtained two different outputs by adding dyears(1) and years(1).

date <- ymd("2020-01-15")

date + dyears(1)

date + years(1)

b  Explain what each line of the following R code does and how you can construct out1 from out3.

out1 <- ymd_hms("2022-06-12 11:30:15", tz = "Pacific/Auckland")

out2 <- as_date(out1)

out3 <- as_datetime(out2)

out1 == out3

c  Suppose z is a tsibble containing data from 2000-01-01 12 AM to 2002-12-31 11 PM. The first few rows of z are shown below.

z %>% slice_head(n = 5)

## # A tsibble: 5 x 2 [1h] <UTC>

##  Time               Values

##   <dttm>              <dbl>

## 1 2000-01-01 00:00:00  1.02

## 2 2000-01-01 01:00:00  0.198

## 3 2000-01-01 02:00:00  0.910

## 4 2000-01-01 03:00:00  1.66

## 5 2000-01-01 04:00:00 -0.249

Explain what the following R code does and the changes you may expect in the output.

z %>% force_tz(Time, tzone = "Pacific/Auckland")
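
To experiment with the code in sub-question c, a tsibble with the same hourly index as z can be simulated. The sketch below is only a stand-in: the name z_sim, the seed, and the normally distributed values are assumptions and are not the data behind the rows printed above.

```r
library(lubridate)
library(tsibble)

# Hourly timestamps in UTC from 2000-01-01 12 AM to 2002-12-31 11 PM.
time_index <- seq(
  ymd_hms("2000-01-01 00:00:00", tz = "UTC"),
  ymd_hms("2002-12-31 23:00:00", tz = "UTC"),
  by = "1 hour"
)

# Simulated values (assumption): standard normal noise with an arbitrary seed.
set.seed(1)
z_sim <- tsibble(
  Time = time_index,
  Values = rnorm(length(time_index)),
  index = Time
)

z_sim
```

The three years include one leap year (2000), so the index has 1096 days of 24 hourly observations each.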

d  Suppose enroll contains student enrollment and staff recruitment details from 2000–2020 for three departments. The Student's Gender is recorded as a binary variable. The first few rows of the data frame are given below.

enroll %>% head(n = 5)

##   Year Department Student's Gender Enrolled Staff

## 1 2000 Statistics           Female     1958    32

## 2 2000 Statistics             Male     1964    36

## 3 2001 Statistics           Female     1985    27

## 4 2001 Statistics             Male     1822    35

## 5 2002 Statistics           Female     1792    37

Explain what the following R code does and what additional information you can observe in the output.

enroll %>% as_tsibble(key = c(Department, `Student's Gender`),

index = Year)

e  Suppose temp contains half-hourly data for three years. The first few rows are shown below.

temp %>% slice_head(n = 5)

## # A tsibble: 5 x 2 [30m] <UTC>

##  Time               Temperature

##   <dttm>                   <dbl>

## 1 2000-01-01 00:00:00        15.6

## 2 2000-01-01 00:30:00        18.5

## 3 2000-01-01 01:00:00       23.3

## 4 2000-01-01 01:30:00       20.4

## 5 2000-01-01 02:00:00       22.0

Explain what the following R code does.

temp %>%

mutate(x = as_date(floor_date(Time, "bimonth"))) %>%

index_by(x) %>%

summarise(y = mean(Temperature))

[Total: 15 marks]

2  Figure 1 shows two graphs of monthly turnover (in millions of AUD) from food retailing in Tasmania from April 1982 to December 2018.

Figure 1: Two plots produced for turnover from food retailing in Tasmania

a  Describe what is plotted in each panel of Figure 1 and the features you can observe for this time series. [5 marks]

b  The turnover from food retailing is decomposed into its components using the following R code.

stl_dcmp <- turnover %>%

model(STL(log(Turnover) ~ season(window = 11)))

i  Write down an equation to describe the form of the decomposition performed and explain why the above setting has been used. [9 marks]

ii  Comment on what is plotted in all four panels of Figure 2 and the behaviour of each component over time.

[Figure 2 plot omitted. Title: Decomposition; subtitle: STL(log(Turnover) ~ season(window = 11)); x-axis: Month, 1980 Jan to 2020 Jan]

Figure 2: Decomposition of Turnover from food retailing in Tasmania

[6 marks]

c  The estimates of the decomposed components for the last 18 months are given below.

stl_dcmp %>% components() %>%

select(-State, -Industry, -.model) %>% slice_tail(n = 18)

## # A tsibble: 18 x 6 [1M]

##      Month `log(Turnover)` trend season_year remainder

##      <mth>          <dbl> <dbl>      <dbl>    <dbl>

##  1 2017 Jul           5.37  5.42    -0.0421   -0.00780

##  2 2017 Aug           5.38  5.43    -0.0385   -0.00951

##  3 2017 Sep           5.38  5.43    -0.0501   -0.00386

##  4 2017 Oct           5.45  5.44    0.00486   0.00718

##  5 2017 Nov           5.48  5.45    0.0170    0.0193

##  6 2017 Dec           5.64  5.45    0.163    0.0181

##  7 2018 Jan           5.51  5.46    0.0579   -0.00730

##  8 2018 Feb           5.41  5.47    -0.0332   -0.0194

##  9 2018 Mar           5.53  5.47    0.0487    0.00506

## 10 2018 Apr           5.44  5.48    -0.0218   -0.0170

## 11 2018 May           5.46  5.49    -0.0290    0.00680

## 12 2018 Jun           5.41  5.49    -0.0776   -0.00407

## 13 2018 Jul           5.47  5.50    -0.0421    0.00993

## 14 2018 Aug           5.48  5.51    -0.0384    0.0165

## 15 2018 Sep           5.48  5.51    -0.0498    0.0159

## 16 2018 Oct           5.51  5.52    0.00491  -0.0108

## 17 2018 Nov           5.53  5.52    0.0172   -0.0116

## 18 2018 Dec           5.69  5.53    0.164    -0.00368

## # ... with 1 more variable: season_adjust <dbl>

A random walk with a drift model is fitted to the seasonally adjusted data. The details of the fitted model and forecasts calculated are given below.

stl_dcmp %>%

components() %>%

model(drift = RW(season_adjust ~ drift())) %>% report()

## Series: season_adjust

## Model: RW w/ drift

##

## Drift: 0.0049 (se: 0.0016)

## sigma^2: 0.0012

stl_dcmp %>%

components() %>%

model(drift = RW(season_adjust ~ drift())) %>%

select(-.model) %>%

forecast(h = 6)

## # A fable: 6 x 6 [1M]

## # Key:    State, Industry, .model [1]

##   State    Industry     .model    Month  season_adjust .mean

##   <chr>    <chr>        <chr>     <mth>         <dist> <dbl>

## 1 Tasmania Food retail~ drift  2019 Jan N(5.5, 0.0012)  5.53

## 2 Tasmania Food retail~ drift  2019 Feb N(5.5, 0.0023)  5.54

## 3 Tasmania Food retail~ drift  2019 Mar N(5.5, 0.0035)  5.54

## 4 Tasmania Food retail~ drift  2019 Apr N(5.5, 0.0047)  5.55

## 5 Tasmania Food retail~ drift  2019 May N(5.6, 0.0059)  5.55

## 6 Tasmania Food retail~ drift  2019 Jun N(5.6, 0.0071)  5.56

Calculate the 1-step-ahead

i  forecast median for turnover. [3 marks]

ii  95% prediction interval for turnover. [5 marks]

Give your answers to 1 decimal place.

[Total: 28 marks]

3  Figure 3 shows the monthly total cost for anti-diabetic drugs in Australia from 1991 July–2008 June.

Figure 3: Monthly total cost for anti-diabetic drugs in Australia

The structure of the tsibble object used to plot Figure 3 is given below.

a10 %>% glimpse()

## Rows: 204

## Columns: 2

## $ Month <mth> 1991 Jul, 1991 Aug, 1991 Sep, 1991 Oct, 1991~

## $ Cost <dbl> 3.53, 3.18, 3.25, 3.61, 3.57, 4.31, 5.09, 2.~

a  Write R code to create a tsibble containing 68 rows of quarterly total cost for anti-diabetic drugs in Australia from 1991 Q3–2008 Q2. Name this new tsibble object a10q. [4 marks]

b  Suppose we have fitted a time series regression model to the information contained in a10q as given below.

fit <- a10q %>%

model(lm = TSLM(log(Cost) ~ trend() + season()))

fit %>% report()

## Series: Cost

## Model: TSLM

## Transformation: log(Cost)

##

## Residuals:

##    Min      1Q Median     3Q    Max

## -0.1045 -0.0202  0.0061  0.0280  0.0842

##

## Coefficients:

##                Estimate Std. Error t value Pr(>|t|)

## (Intercept)    2.347470   0.013830  169.74  < 2e-16 ***

## trend()        0.028065   0.000263  106.63  < 2e-16 ***

## season()year2 -0.093151   0.014590   -6.38  2.3e-08 ***

## season()year3 -0.007184   0.014597   -0.49     0.62

## season()year4  0.117485   0.014590    8.05  2.9e-11 ***

## ---

## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

##

## Residual standard error: 0.0425 on 63 degrees of freedom

## Multiple R-squared: 0.995,  Adjusted R-squared: 0.994

## F-statistic: 2.88e+03 on 4 and 63 DF, p-value: <2e-16

i  Write down the fitted regression model and interpret the trend coefficient. [7 marks]

ii  Write R code to perform the Ljung-Box test to assess the adequacy of the fitted model. [7 marks]

iii  State the null and alternative hypotheses for the Ljung-Box test and describe how you would use the output above to reach a conclusion. [4 marks]

iv  Calculate the 1-step-ahead forecast median for the total cost from this fitted model. Give your answer to 1 decimal place. [5 marks]

[Total: 27 marks]