Exercises

Time series data and patterns

Exercise 1

The pedestrian dataset contains hourly pedestrian counts from 2015-01-01 to 2016-12-31 at 4 sensors in the city of Melbourne.

The data is shown below:

# A tibble: 66,037 × 5
   Sensor         Date_Time           Date        Time Count
   <chr>          <dttm>              <date>     <int> <int>
 1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01     0  1630
 2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01     1   826
 3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01     2   567
 4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01     3   264
 5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01     4   139
 6 Birrarung Marr 2015-01-01 05:00:00 2015-01-01     5    77
 7 Birrarung Marr 2015-01-01 06:00:00 2015-01-01     6    44
 8 Birrarung Marr 2015-01-01 07:00:00 2015-01-01     7    56
 9 Birrarung Marr 2015-01-01 08:00:00 2015-01-01     8   113
10 Birrarung Marr 2015-01-01 09:00:00 2015-01-01     9   166
# ℹ 66,027 more rows

Your turn!

Identify the index variable, key variable(s), and measured variable(s) of this dataset.

Hint

The index variable contains the complete time information.
The key variable(s) identify each time series.
The measured variable(s) are what you want to explore/forecast.
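
To make the three roles concrete, here is a minimal sketch on a small made-up data frame. The `sales` data and its column names are hypothetical and are not part of the workshop data. It assumes the tsibble package is installed and uses its helpers `index_var()`, `key_vars()` and `measured_vars()` to report each role.

```r
library(tsibble)

# Hypothetical toy data: two shops, each observed on three consecutive days
sales <- data.frame(
  shop  = rep(c("A", "B"), each = 3),
  day   = rep(as.Date("2024-01-01") + 0:2, times = 2),
  units = c(5, 7, 6, 2, 3, 4)
)

# Declare which column holds time (index) and which identifies each series (key)
sales_ts <- as_tsibble(sales, index = day, key = shop)

index_var(sales_ts)      # name of the index column
key_vars(sales_ts)       # name(s) of the key column(s)
measured_vars(sales_ts)  # everything that is neither index nor key
```

The same three questions can be asked of any tsibble, including the pedestrian data above.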

`index` variable

`key` variable(s)

measured variable(s)

Exercise 2

The aus_accommodation dataset contains quarterly data on Australian tourist accommodation from short-term non-residential accommodation with 15 or more rooms, 1998 Q1 - 2016 Q2.

The units of the measured variables are as follows:

Takings are in millions of Australian dollars
Occupancy is a percentage of rooms occupied
CPI is an index with value 100 in 2012 Q1.

Your turn!

Complete the code to convert this dataset into a tsibble.

Exercise 3

Temporal granularity

The previous exercise produced a dataset with daily frequency, although the data is clearly quarterly! This happens because the index uses a daily granularity, which is inappropriate for this data.

Common temporal granularities can be created with these functions:

Granularity	Function
Annual	`as.integer()`
Quarterly	`yearquarter()`
Monthly	`yearmonth()`
Weekly	`yearweek()`
Daily	`as_date()`, `ymd()`
Sub-daily	`as_datetime()`
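
The effect of the granularity functions in the table can be seen on a small made-up example. This is a sketch under stated assumptions: the `toy` data frame is hypothetical, and the tsibble and dplyr packages are assumed to be installed. It stores four quarterly observations first with a `Date` index and then with a `yearquarter()` index, and compares the interval that tsibble infers.

```r
library(dplyr)
library(tsibble)

# Hypothetical toy data: one observation at the start of each quarter of 2020
toy <- data.frame(
  date  = as.Date(c("2020-01-01", "2020-04-01", "2020-07-01", "2020-10-01")),
  value = c(10, 12, 9, 14)
)

# With a Date index the interval is measured in days
daily <- as_tsibble(toy, index = date)
format(interval(daily))

# Converting the index with yearquarter() gives a quarterly interval
quarterly <- toy |>
  mutate(quarter = yearquarter(date)) |>
  as_tsibble(index = quarter)
format(interval(quarterly))
```

Printing either tsibble also shows the interval in square brackets in the header line.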

Your turn!

Use the appropriate granularity for the aus_accommodation dataset, and verify that the frequency is now quarterly.

Exercise 4

The tourism dataset contains the quarterly overnight trips from 1998 Q1 to 2016 Q4 across Australia.

It is disaggregated by 3 key variables:

State: States and territories of Australia
Region: The tourism regions are formed through the aggregation of Statistical Local Areas (SLAs) which are defined by the various State and Territory tourism authorities according to their research and marketing needs
Purpose: Stopover purpose of visit: “Holiday”, “Visiting friends and relatives”, “Business”, “Other reason”.

Your turn!

Calculate the total quarterly overnight trips to Victoria from the tourism dataset.

Tidy tools

To achieve this we will use functions from dplyr.

Use filter() to keep only data where the State is "Victoria".
Use summarise() to calculate the total trips in Victoria, regardless of Region and Purpose.
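
As a sketch of how these two verbs combine on a tsibble, the example below uses a different state (Tasmania) so the exercise itself is left for you. It assumes the `tourism` data shipped with the tsibble package, which may differ slightly from the copy used in the workshop. On a tsibble, `summarise()` keeps the time index, so the result is still one row per quarter.

```r
library(dplyr)
library(tsibble)

# Total trips per quarter for one state, summed over every Region and Purpose
tas_total <- tourism |>
  filter(State == "Tasmania") |>
  summarise(Trips = sum(Trips))

# Adding group_by() before summarise() keeps one series per group
tas_by_purpose <- tourism |>
  filter(State == "Tasmania") |>
  group_by(Purpose) |>
  summarise(Trips = sum(Trips))

tas_total
tas_by_purpose
```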

Bonus task

If you finish early, try also using group_by() to find the quarterly trips to Victoria for each Purpose.

Exercise 5

Visualise and describe the temporal patterns of visitors to Victoria in the tourism dataset.

Your turn

Use autoplot() to produce a time plot of the visitors to Victoria, and describe the temporal patterns.

Overall trend

Seasonality

Cycles

Your turn

Use gg_season() to take a closer look at the shape of the seasonal pattern.

At what time of year is the seasonal peak and trough?

Seasonal peak

The seasonal maximum is:

Seasonal trough

The seasonal minimum is:

Your turn

Use gg_subseries() to see if the seasonal shape changes over time.

Changing seasonality

Is the seasonal pattern changing over time?

Your turn

Use ACF() |> autoplot() to look at the autocorrelations.

Can you identify the trend and seasonality in this plot?
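
For reference, a minimal sketch of the ACF workflow on a different series (total trips to Tasmania, taken from the `tourism` data in the tsibble package). It assumes dplyr, tsibble, feasts and ggplot2 are installed. The choice of `lag_max = 8` is only for illustration.

```r
library(dplyr)
library(tsibble)
library(feasts)
library(ggplot2)

tas_total <- tourism |>
  filter(State == "Tasmania") |>
  summarise(Trips = sum(Trips))

# ACF() returns one row per lag with the estimated autocorrelation
acf_tbl <- tas_total |> ACF(Trips, lag_max = 8)
acf_tbl

# autoplot() draws the correlogram from that table
autoplot(acf_tbl)
```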

Bonus task

If you have extra time, repeat the above time series exploration for each Purpose of travel to Victoria. Do the patterns vary by purpose of travel?

Modelling and forecasting

Exercise 6

Earlier we used visualisation to identify temporal patterns with visitors to Victoria in the tourism dataset. Now we’ll use this to specify, estimate, and forecast the data!

Your turn!

Specify all simple forecasting models for the total number of tourists arriving in Victoria.

Estimate them with model() and produce forecasts for the next 5 years with forecast().

Plot the forecasts, and visually evaluate their suitability.
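
The specify, estimate, forecast and plot steps can be sketched as follows. The example assumes that "simple forecasting models" means the mean, naive, seasonal naive and drift methods from the fable package, and it uses Tasmania rather than Victoria so the exercise remains open. The data is the `tourism` tsibble from the tsibble package, and the model names on the left of each `=` are arbitrary labels.

```r
library(dplyr)
library(tsibble)
library(fable)
library(ggplot2)

tas_total <- tourism |>
  filter(State == "Tasmania") |>
  summarise(Trips = sum(Trips))

# Specify and estimate four benchmark models in one call to model()
fit <- tas_total |>
  model(
    mean   = MEAN(Trips),
    naive  = NAIVE(Trips),
    snaive = SNAIVE(Trips),
    drift  = RW(Trips ~ drift())
  )

# Forecast 5 years ahead (20 quarters per model)
fc <- fit |> forecast(h = "5 years")

# Plot point forecasts against the history; level = NULL hides the intervals
autoplot(fc, tas_total, level = NULL)
```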

Which simple forecasting model is most appropriate for this data?

Bonus task

If you finish early, try also producing forecasts of the quarterly trips to Victoria for each Purpose.

Exercise 7

Your turn!

Produce forecasts for total Takings of Australian tourist accommodation over the next 5 years from aus_accommodation using linear regression.
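
In fable, linear regression on a time series is typically specified with `TSLM()`. The sketch below assumes the exercise intends a regression on a linear trend and seasonal dummy variables, and it demonstrates the syntax on a different series (total trips to Tasmania from the `tourism` data in the tsibble package).

```r
library(dplyr)
library(tsibble)
library(fable)

tas_total <- tourism |>
  filter(State == "Tasmania") |>
  summarise(Trips = sum(Trips))

# trend() adds a linear time trend, season() adds seasonal dummy variables
fit <- tas_total |>
  model(lm = TSLM(Trips ~ trend() + season()))

report(fit)  # coefficient table for the fitted regression

fc <- fit |> forecast(h = "5 years")
fc
```

Both `trend()` and `season()` are known for any future date, so `forecast()` can build their future values on its own.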

Bonus task

Try producing forecasts from the regression model which also uses Occupancy as a regressor.

Why doesn’t this work?

Exercise 8

An ETS model can capture a wide range of time series patterns, and most usefully it can adapt to changes in these patterns over time.

Additive or multiplicative?

Additive components have constant variance, while multiplicative components have variation proportionate to the level/scale of the data.
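
A small simulation illustrates the difference. The numbers below are made up purely for illustration and use base R only. Both series share the same rising level, but the seasonal swing is a fixed amount in the additive series and a fixed percentage of the level in the multiplicative one.

```r
# Simulated quarterly data: 10 years with a steadily rising level
t     <- 1:40
level <- 100 + 5 * t
shape <- rep(c(-10, 5, 15, -10), times = 10)   # repeating seasonal shape

additive       <- level + shape                # swing is a fixed amount
multiplicative <- level * (1 + shape / 100)    # swing is a percentage of the level

# Size of the seasonal swing around the level within a set of periods
swing <- function(y, idx) diff(range(y[idx] - level[idx]))

first_year <- 1:4
last_year  <- 37:40

# Additive: the swing is the same in the first and last year
c(swing(additive, first_year), swing(additive, last_year))

# Multiplicative: the swing grows as the level grows
c(swing(multiplicative, first_year), swing(multiplicative, last_year))
```

In a time plot, the multiplicative series fans out as it rises while the additive series keeps a band of constant width.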

Your turn!

Is the seasonality of total Australian accommodation takings from aus_accommodation_total additive or multiplicative?

Estimate an ETS model for the data. Does the automatic ETS model match the patterns you see in a time plot?

Identify the nature of each of the ETS components.

Error

Trend

Seasonality

Exercise 9

An ARIMA model captures many time series patterns using autocorrelations. It requires data with a constant variance, so transformations might be necessary.

Transforming the data

We can use log() to transform multiplicative patterns into additive ones. This is done inside the model specification, applied to the response variable. For example: ARIMA(log(y)).

Patterns are not always exactly multiplicative. In those cases we often use a power transformation via box_cox(y, lambda). More information: https://otexts.com/fpp3/transformations.html
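
A minimal sketch of both transformations inside a model specification is shown below, using total trips to Tasmania from the `tourism` data in the tsibble package rather than the workshop's accommodation data. The value `lambda = 0.3` is an arbitrary illustrative choice, not a recommendation. Automatic ARIMA selection in fable also relies on the feasts package being installed.

```r
library(dplyr)
library(tsibble)
library(fable)

tas_total <- tourism |>
  filter(State == "Tasmania") |>
  summarise(Trips = sum(Trips))

# The transformation is written around the response inside the model formula
fit <- tas_total |>
  model(
    arima_log    = ARIMA(log(Trips)),
    arima_boxcox = ARIMA(box_cox(Trips, lambda = 0.3))  # lambda is illustrative
  )

# Forecasts are back-transformed to the original scale automatically
fc <- fit |> forecast(h = "2 years")
fc
```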

Your turn!

Identify whether a transformation is necessary for the aus_accommodation_total Takings.

Then estimate an automatically selected ARIMA model for this data.

Compare the ARIMA forecasts with those from the automatic ETS model. How do they differ?

The ARIMA and ETS forecasts are…

Accuracy evaluation

Exercise 10

Which model is better, ETS or ARIMA? It depends on the data!

Your turn!

Let’s see which model works best for forecasting total Takings of Australian tourist accommodation.

Estimate ETS and ARIMA models for total Takings of Australian tourist accommodation (aus_accommodation_total), then use accuracy() to find their in-sample accuracy.
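
Calling `accuracy()` on a fitted model table reports in-sample (training) accuracy. The sketch below shows the pattern on a different series, total trips to Tasmania from the `tourism` data in the tsibble package. The model labels `ets` and `arima` are arbitrary.

```r
library(dplyr)
library(tsibble)
library(fable)

tas_total <- tourism |>
  filter(State == "Tasmania") |>
  summarise(Trips = sum(Trips))

fit <- tas_total |>
  model(
    ets   = ETS(Trips),
    arima = ARIMA(Trips)
  )

# One row per model; the .type column marks these as training-set measures
in_sample <- accuracy(fit)
in_sample
```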

Which model is most accurate?

Between ARIMA and ETS, which model is most accurate for this data?

Exercise 11

In-sample forecast accuracy is unrealistic - the model has seen the future!

Produce out-of-sample forecasts to evaluate which model is the most accurate.

Your turn!

Evaluate ETS and ARIMA forecast accuracy for total Takings of Australian tourist accommodation (aus_accommodation_total).

Withhold 4 years of data for forecast evaluation,
Estimate ETS and ARIMA models on the filtered data,
Produce forecasts for the 4 years of withheld data,
Evaluate forecast accuracy using accuracy().
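
The four steps above can be sketched as follows, again on total trips to Tasmania (from the `tourism` data in the tsibble package) rather than the workshop's accommodation data. Subtracting 16 from a quarterly index moves back 16 quarters, which is how the last 4 years are withheld here.

```r
library(dplyr)
library(tsibble)
library(fable)

tas_total <- tourism |>
  filter(State == "Tasmania") |>
  summarise(Trips = sum(Trips))

# 1. Withhold the last 4 years (16 quarters) as a test set
train <- tas_total |>
  filter(Quarter <= max(Quarter) - 16)

# 2. Estimate the models on the training data only
fit <- train |>
  model(
    ets   = ETS(Trips),
    arima = ARIMA(Trips)
  )

# 3. Forecast the withheld period
fc <- fit |> forecast(h = "4 years")

# 4. Compare the forecasts with the full data, which contains the actual values
out_of_sample <- accuracy(fc, tas_total)
out_of_sample
```

Passing the complete dataset to `accuracy()` lets it look up the observed values for the forecast period and, for scaled measures such as MASE, the training data as well.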

Which model is more accurate for forecasting?

Does this differ from the in-sample model accuracy?

Between ARIMA and ETS, which model is most accurate for this data based on the test set accuracy?

Exercise 12

Accuracy measured on a small test set is highly sensitive to just a few observations.

Let’s use cross-validation to get a more reliable estimate of forecast accuracy.

Your turn!

Calculate cross-validated forecast accuracy for total Takings of Australian tourist accommodation (aus_accommodation_total). Use an initial fold size of 10 years, and increment the length of data by 4 years in each fold.

How do these results differ from the forecast accuracy calculated earlier?

Hint: Use stretch_tsibble() after filter() to create cross-validation folds of the data.
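
A sketch of the cross-validation pattern is given below. It deliberately uses a different series (total trips to Tasmania from the `tourism` data in the tsibble package) and different fold settings from the exercise: `.init = 44` and `.step = 8` are illustrative values in quarters (11 years and 2 years). The `filter()` step drops the final year so that every fold has actual values to compare its 1-year forecasts against.

```r
library(dplyr)
library(tsibble)
library(fable)

tas_total <- tourism |>
  filter(State == "Tasmania") |>
  summarise(Trips = sum(Trips))

# Build expanding training folds; each fold is identified by a new .id key
cv_folds <- tas_total |>
  filter(Quarter <= max(Quarter) - 4) |>
  stretch_tsibble(.init = 44, .step = 8)

# Fit every model on every fold, then forecast 1 year ahead from each fold
fc <- cv_folds |>
  model(
    ets   = ETS(Trips),
    arima = ARIMA(Trips)
  ) |>
  forecast(h = "1 year")

# Accuracy is averaged over all folds, giving one row per model
cv_acc <- accuracy(fc, tas_total)
cv_acc
```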

Between ARIMA and ETS, which model is most accurate for this data based on the cross-validated accuracy?

Exercise 13

Accuracy measures provide a useful comparison between models and an indication of forecasting performance.

Residual diagnostics reveal opportunities to improve your model and indicate whether it is statistically appropriate.

Follow along

Let’s check the model assumptions for our ETS and ARIMA models on total Australian accommodation Takings.
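
One common way to check these assumptions is to extract the innovation residuals with `augment()` and test them for remaining autocorrelation with a Ljung-Box test. The sketch below is an assumption about how such a check might look, not the workshop's own code. It uses total trips to Tasmania from the `tourism` data in the tsibble package, and `lag = 8` (two years of quarters) is an illustrative choice.

```r
library(dplyr)
library(tsibble)
library(fable)
library(feasts)

tas_total <- tourism |>
  filter(State == "Tasmania") |>
  summarise(Trips = sum(Trips))

fit <- tas_total |>
  model(
    ets   = ETS(Trips),
    arima = ARIMA(Trips)
  )

# augment() returns fitted values and residuals for every model
resids <- augment(fit)

# Ljung-Box test on the innovation residuals of each model;
# a small p-value suggests autocorrelation the model has not captured
lb <- resids |>
  features(.innov, ljung_box, lag = 8)
lb
```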
