# A tibble: 66,037 × 5
   Sensor         Date_Time           Date        Time Count
   <chr>          <dttm>              <date>     <int> <int>
 1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01     0  1630
 2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01     1   826
 3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01     2   567
 4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01     3   264
 5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01     4   139
 6 Birrarung Marr 2015-01-01 05:00:00 2015-01-01     5    77
 7 Birrarung Marr 2015-01-01 06:00:00 2015-01-01     6    44
 8 Birrarung Marr 2015-01-01 07:00:00 2015-01-01     7    56
 9 Birrarung Marr 2015-01-01 08:00:00 2015-01-01     8   113
10 Birrarung Marr 2015-01-01 09:00:00 2015-01-01     9   166
# ℹ 66,027 more rows
Exercises
Time series data and patterns
Exercise 1
The pedestrian dataset contains hourly pedestrian counts from 2015-01-01 to 2016-12-31 at 4 sensors in the city of Melbourne.
The data is shown below:
Identify the index variable, key variable(s), and measured variable(s) of this dataset.
- The index variable contains the complete time information
- The key variable(s) identify each time series
- The measured variable(s) are what you want to explore/forecast.
index variable
key variable(s)
measured variable(s)
Exercise 2
The aus_accommodation dataset contains quarterly data on Australian tourist accommodation from short-term non-residential accommodation with 15 or more rooms, 1998 Q1 - 2016 Q2.
The units of the measured variables are as follows:
- Takings are in millions of Australian dollars
- Occupancy is a percentage of rooms occupied
- CPI is an index with value 100 in 2012 Q1.
Complete the code to convert this dataset into a tsibble.
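As a minimal sketch of the conversion, the example below uses a small made-up table rather than the workshop's data; the object names toy and toy_ts and the column values are assumptions for illustration only. as_tsibble() takes the time column as the index and the identifying column(s) as the key.

```r
library(tsibble)

# Hypothetical toy data: two series (State A and B) observed on three dates
toy <- tibble::tibble(
  Date    = rep(as.Date(c("2015-01-01", "2015-04-01", "2015-07-01")), times = 2),
  State   = rep(c("A", "B"), each = 3),
  Takings = c(10, 12, 11, 20, 23, 22)
)

# index = the time variable, key = the variable(s) identifying each series
toy_ts <- as_tsibble(toy, index = Date, key = State)
toy_ts
```

Because Date is a plain date here, the resulting tsibble is treated as daily data.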
Exercise 3
The previous exercise produced a dataset with daily frequency - although clearly the data is quarterly! This is because we are using a daily granularity which is inappropriate for this data.
Common temporal granularities can be created with these functions:
Granularity | Function |
---|---|
Annual | as.integer() |
Quarterly | yearquarter() |
Monthly | yearmonth() |
Weekly | yearweek() |
Daily | as_date(), ymd() |
Sub-daily | as_datetime() |
Use the appropriate granularity for the aus_accommodation dataset, and verify that the frequency is now quarterly.
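A hedged illustration of changing the granularity, again on made-up data (the names toy and toy_q are hypothetical): converting the date column with yearquarter() before calling as_tsibble() lets tsibble recognise the quarterly interval.

```r
library(tsibble)
library(dplyr)

# Hypothetical toy data: one observation at the start of each quarter
toy <- tibble::tibble(
  Date    = as.Date(c("2015-01-01", "2015-04-01", "2015-07-01", "2015-10-01")),
  Takings = c(10, 12, 11, 14)
)

toy_q <- toy |>
  mutate(Quarter = yearquarter(Date)) |>  # quarterly granularity
  select(-Date) |>
  as_tsibble(index = Quarter)

interval(toy_q)  # reports the interval tsibble detected for the index
```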
Exercise 4
The tourism dataset contains the quarterly overnight trips from 1998 Q1 to 2016 Q4 across Australia.
It is disaggregated by 3 key variables:
- State: States and territories of Australia
- Region: The tourism regions are formed through the aggregation of Statistical Local Areas (SLAs) which are defined by the various State and Territory tourism authorities according to their research and marketing needs
- Purpose: Stopover purpose of visit: “Holiday”, “Visiting friends and relatives”, “Business”, “Other reason”.
Calculate the total quarterly tourists visiting Victoria from the tourism dataset.
To achieve this we will use functions from dplyr.
- Use filter() to keep only data where the State is "Victoria".
- Use summarise() to calculate the total trips in Victoria, regardless of Region and Purpose.
If you finish early, try also using group_by() to find the quarterly trips to Victoria for each Purpose.
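One possible sketch of these two steps, assuming the copy of tourism shipped with the tsibble package (with a Trips column and full state names); the workshop's version of the data may differ, and vic_total is a name chosen here for illustration.

```r
library(tsibble)  # assumed source of the tourism dataset
library(dplyr)

vic_total <- tourism |>
  filter(State == "Victoria") |>   # keep only Victorian series
  summarise(Trips = sum(Trips))    # sum over Region and Purpose for each quarter

vic_total
```

On a tsibble, summarise() keeps the index, so the result is one total per quarter rather than a single number.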
Exercise 5
Visualise and describe the temporal patterns of visitors to Victoria in the tourism dataset.
Use autoplot() to produce a time plot of the visitors to Victoria, and describe the temporal patterns.
Overall trend
Seasonality
Cycles
Use gg_season() to take a closer look at the shape of the seasonal pattern.
At what time of year is the seasonal peak and trough?
Seasonal peak
The seasonal maximum is:
Seasonal trough
The seasonal minimum is:
Use gg_subseries() to see if the seasonal shape changes over time.
Changing seasonality
Is the seasonal pattern changing over time?
Use ACF() |> autoplot() to look at the autocorrelations.
Can you identify the trend and seasonality in this plot?
If you have extra time, repeat the above time series exploration for each Purpose of travel to Victoria. Do the patterns vary by purpose of travel?
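The four plots used in this exercise can be sketched as follows, assuming the fpp3 meta-package is installed and that vic_total (a hypothetical name) holds total Victorian trips built from the packaged tourism data. Depending on the installed versions, gg_season() and gg_subseries() are provided by feasts or by ggtime.

```r
library(fpp3)  # attaches tsibble, dplyr, feasts, fable and ggplot2

# Assumption: total quarterly trips to Victoria from the packaged tourism data
vic_total <- tourism |>
  filter(State == "Victoria") |>
  summarise(Trips = sum(Trips))

vic_total |> autoplot(Trips)       # time plot: trend, seasonality, cycles
vic_total |> gg_season(Trips)      # one line per year, to compare seasonal shape
vic_total |> gg_subseries(Trips)   # one panel per quarter, to see changes over time

vic_acf <- vic_total |> ACF(Trips) # autocorrelations at successive lags
vic_acf |> autoplot()
```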
Modelling and forecasting
Exercise 6
Earlier we used visualisation to identify temporal patterns in visitors to Victoria in the tourism dataset. Now we’ll use these patterns to specify and estimate models, and to forecast the data!
Specify all simple forecasting models for the total number of tourists arriving in Victoria.
Estimate them with model() and produce forecasts for the next 5 years with forecast().
Plot the forecasts, and visually evaluate their suitability.
Which simple forecasting model is most appropriate for this data?
If you finish early, try also producing forecasts of the quarterly trips to Victoria for each Purpose.
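A sketch of the specify, estimate, forecast workflow, assuming that "simple forecasting models" means the four benchmark methods in fable (mean, naive, seasonal naive and drift) and that vic_total is built from the packaged tourism data; the model labels are arbitrary.

```r
library(fpp3)

# Assumption: total quarterly trips to Victoria from the packaged tourism data
vic_total <- tourism |>
  filter(State == "Victoria") |>
  summarise(Trips = sum(Trips))

fit <- vic_total |>
  model(
    mean   = MEAN(Trips),           # average of all observations
    naive  = NAIVE(Trips),          # last observed value
    snaive = SNAIVE(Trips),         # value from the same quarter last year
    drift  = RW(Trips ~ drift())    # last value plus average historical change
  )

fc <- fit |> forecast(h = "5 years")
fc |> autoplot(vic_total, level = NULL)  # point forecasts only, for readability
```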
Exercise 7
Produce forecasts for total Takings of Australian tourist accommodation over the next 5 years from aus_accommodation using linear regression.
Try producing forecasts from the regression model which also uses Occupancy as a regressor.
Why doesn’t this work?
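For the first part of this exercise, one plausible specification is a regression on a linear trend and seasonal dummies via TSLM(). The sketch below assumes the copy of aus_accommodation shipped with fpp3 (indexed by Date, keyed by State) and builds the national total itself; the choice of trend() + season() is an assumption, not the only reasonable regression.

```r
library(fpp3)

# Assumption: national total built from fpp3's aus_accommodation
aus_accommodation_total <- aus_accommodation |>
  summarise(Takings = sum(Takings))

fit_lm <- aus_accommodation_total |>
  model(lm = TSLM(Takings ~ trend() + season()))

fc_lm <- fit_lm |> forecast(h = "5 years")
fc_lm |> autoplot(aus_accommodation_total)
```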
Exercise 8
An ETS model can capture a wide range of time series patterns, and most usefully it can adapt to changes in these patterns over time.
Additive components have constant variance, while multiplicative components have variation proportionate to the level/scale of the data.
Is the seasonality of total Australian accommodation takings from aus_accommodation_total additive or multiplicative?
Estimate an ETS model for the data. Does the automatic ETS model match the patterns you see in a time plot?
Identify the nature of each of the ETS components.
Error
Trend
Seasonality
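A minimal sketch of fitting an automatically selected ETS model, assuming aus_accommodation_total can be rebuilt from fpp3's aus_accommodation as shown. Calling ETS() with only the response lets fable choose the error, trend and seasonal components.

```r
library(fpp3)

# Assumption: national total built from fpp3's aus_accommodation
aus_accommodation_total <- aus_accommodation |>
  summarise(Takings = sum(Takings))

fit_ets <- aus_accommodation_total |>
  model(ets = ETS(Takings))

report(fit_ets)                    # prints the selected ETS(error, trend, season) form
components(fit_ets) |> autoplot()  # plots the estimated level, slope and seasonal terms
```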
Exercise 9
An ARIMA model captures many time series patterns using autocorrelations. It requires data with a constant variance, so transformations might be necessary.
We can use log() to transform multiplicative patterns into additive ones. This is done inside the model specification, applied to the response variable. For example: ARIMA(log(y)).
Multiplicative patterns aren’t always exactly multiplicative; in those cases we often use power transformations via box_cox(y, lambda). More information: https://otexts.com/fpp3/transformations.html
Identify if a transformation is necessary for the aus_accommodation_total Takings.
Then estimate an automatically selected ARIMA model for this data.
Compare the ARIMA forecasts with those from the automatic ETS model. How do they differ?
The ARIMA and ETS forecasts are…
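The sketch below shows how a transformation is written inside the model specification and how the two sets of forecasts can be plotted together. It assumes, purely for illustration, that a log transformation is used and that the forecast horizon is 5 years; neither is stated in the exercise.

```r
library(fpp3)

# Assumption: national total built from fpp3's aus_accommodation
aus_accommodation_total <- aus_accommodation |>
  summarise(Takings = sum(Takings))

fit <- aus_accommodation_total |>
  model(
    ets   = ETS(Takings),
    arima = ARIMA(log(Takings))  # transformation applied to the response
  )

# Forecasts are back-transformed to the original scale automatically
fc <- fit |> forecast(h = "5 years")
fc |> autoplot(aus_accommodation_total, level = 80)
```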
Accuracy evaluation
Exercise 10
Which model is better, ETS or ARIMA? It depends on the data!
Let’s see which model works best for forecasting total Takings of Australian tourist accommodation.
Estimate ETS and ARIMA models for total Takings of Australian tourist accommodation (aus_accommodation_total), then use accuracy() to find their in-sample accuracy.
Which model is most accurate?
Between ARIMA and ETS, which model is most accurate for this data?
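In-sample accuracy comes from calling accuracy() directly on the fitted models. This sketch assumes the same rebuilt aus_accommodation_total and, as an illustrative choice, a log transformation for the ARIMA model.

```r
library(fpp3)

# Assumption: national total built from fpp3's aus_accommodation
aus_accommodation_total <- aus_accommodation |>
  summarise(Takings = sum(Takings))

fit <- aus_accommodation_total |>
  model(ets = ETS(Takings), arima = ARIMA(log(Takings)))

# One row per model; .type is "Training" because the fitted values are evaluated
acc_train <- accuracy(fit)
acc_train |> select(.model, .type, RMSE, MAE, MAPE)
```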
Exercise 11
In-sample forecast accuracy is unrealistic - the model has seen the future!
Produce out-of-sample forecasts to evaluate which model is the most accurate.
Evaluate ETS and ARIMA forecast accuracy for total Takings of Australian tourist accommodation (aus_accommodation_total).
- Withhold 4 years of data for forecast evaluation,
- Estimate ETS and ARIMA models on the filtered data,
- Produce forecasts for the 4 years of withheld data,
- Evaluate forecast accuracy using accuracy().
Which model is more accurate for forecasting?
Does this differ from the in-sample model accuracy?
Between ARIMA and ETS, which model is most accurate for this data based on the test set accuracy?
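The four steps can be sketched as follows, assuming the rebuilt aus_accommodation_total and a log-transformed ARIMA as an illustrative choice. Since the data is quarterly, 4 years corresponds to the final 16 observations; the name train is hypothetical.

```r
library(fpp3)

# Assumption: national total built from fpp3's aus_accommodation
aus_accommodation_total <- aus_accommodation |>
  summarise(Takings = sum(Takings))

# Withhold the final 4 years (16 quarters) as a test set
train <- aus_accommodation_total |>
  slice(seq_len(n() - 16))

fc_test <- train |>
  model(ets = ETS(Takings), arima = ARIMA(log(Takings))) |>
  forecast(h = "4 years")

# Passing the full data lets accuracy() compare forecasts with the withheld values
acc_test <- accuracy(fc_test, aus_accommodation_total)
acc_test |> select(.model, .type, RMSE, MAE, MAPE)
```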
Exercise 12
Evaluating out-of-sample forecasts on a small test-set is highly sensitive to just a few observations.
Let’s use cross-validation to get a reliable estimate of forecast accuracy.
Calculate cross-validated forecast accuracy for total Takings of Australian tourist accommodation (aus_accommodation_total). Use an initial fold size of 10 years, and increment the length of data by 4 years in each fold.
How do these results differ from the forecast accuracy calculated earlier?
Hint: Use stretch_tsibble() after filter() to create cross-validation folds of the data.
Between ARIMA and ETS, which model is most accurate for this data based on the cross-validated accuracy?
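A hedged sketch of the cross-validation setup: with quarterly data, 10 years is 40 observations and 4 years is 16, which become the .init and .step arguments of stretch_tsibble(). The 4-year forecast horizon and the use of slice() in place of filter() to leave room for those forecasts are assumptions made here, not requirements of the exercise.

```r
library(fpp3)

# Assumption: national total built from fpp3's aus_accommodation
aus_accommodation_total <- aus_accommodation |>
  summarise(Takings = sum(Takings))

# Drop the last 16 quarters so every fold has 4 years of actuals to compare against,
# then build expanding folds of 40, 56, ... observations (identified by .id)
folds <- aus_accommodation_total |>
  slice(seq_len(n() - 16)) |>
  stretch_tsibble(.init = 40, .step = 16)

fc_cv <- folds |>
  model(ets = ETS(Takings), arima = ARIMA(log(Takings))) |>
  forecast(h = "4 years")

# Accuracy is averaged across folds, giving one row per model
acc_cv <- accuracy(fc_cv, aus_accommodation_total)
acc_cv |> select(.model, RMSE, MAE, MAPE)
```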
Exercise 13
Accuracy measures provide a useful comparison between models and an indication of forecasting performance.
Residual diagnostics reveal opportunities to improve your model, and indicate the statistical appropriateness of your model.
Let’s check the model assumptions for our ETS and ARIMA models on total Australian accommodation Takings.
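Two common checks are a visual inspection of the residuals and a Ljung-Box test on the innovation residuals. The sketch assumes the rebuilt aus_accommodation_total, a log-transformed ARIMA, and a test lag of 8 (twice the seasonal period); all three are illustrative choices. Depending on the installed versions, gg_tsresiduals() is provided by feasts or by ggtime.

```r
library(fpp3)

# Assumption: national total built from fpp3's aus_accommodation
aus_accommodation_total <- aus_accommodation |>
  summarise(Takings = sum(Takings))

fit <- aus_accommodation_total |>
  model(ets = ETS(Takings), arima = ARIMA(log(Takings)))

# Residual time plot, ACF and histogram, one model at a time
fit |> select(ets) |> gg_tsresiduals()
fit |> select(arima) |> gg_tsresiduals()

# Ljung-Box test: a small p-value suggests remaining autocorrelation
lb <- augment(fit) |>
  features(.innov, ljung_box, lag = 8)
lb
```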