Tidy time series & forecasting in R

Tidy time series analysis and forecasting

Time series data and patterns

10th June 2024 @ UseR! 2024

Mitchell O’Hara-Wild, Nectric

Useful links

social.mitchelloharawild.com

workshop.nectric.com.au/user2024/

mitchelloharawild/workshop-fable-user2024

🎯 Today’s goals

Learn about time series data.
Visualise common time series patterns.
Produce forecasts from statistical models.
Evaluate the forecasting accuracy.

🌻 Expectations

Follow the code of conduct.
Ask relevant questions at any time; Q&A during breaks.
Be kind and respectful.
Make mistakes and learn!


Mitchell O’Hara‑Wild

Monash + Nectric

@mitchelloharawild

Welcome, who am I?

🎓 PhD candidate at Monash University
📊 Data consulting and workshops at Nectric
📈 Specialised in time series analysis
📦 Develops R packages (fable, vitae, etc.)
🤖 DIYs IoT devices for home automation
🌱 Permaculturist (🐝, 🐣, 🍄, 🌞)

You!

UseR! attendee

🙋 Hi, who are you?

Hands up if…

🧑‍💻 You’ve used R
📊 You’ve analysed data
🫧 You’ve used tidyverse packages (dplyr, ggplot2…)
📈 You’ve worked with time series data before
🔮 You’ve produced a forecast before
🤩 You’ve used fable!

Tidy time series analysis

This workshop is about tidy time series in R.

We’ll be using these packages!

Install them all with install.packages("fpp3")

Design of the forecast package

Forecasting individual time series
Regular and infrequent observations
(monthly, quarterly or annually)
Point forecasts and intervals
Consistent with ts models

Design of the fable package

Forecasting many time series
Observations at any time
(sub-daily, irregular, monthly, etc.)
Forecast distributions
Consistent with the tidyverse

Tidy time series packages

# Data manipulation
library(dplyr)
# Plotting functions
library(ggplot2)
# Time and date manipulation
library(lubridate)
# Time series class
library(tsibble)
# Tidy time series data
library(tsibbledata)
# Time series graphics and statistics
library(feasts)
# Forecasting functions
library(fable)

# All of the above
library(fpp3)

Time series data

Four-yearly Olympic winning times
Annual Google profits
Quarterly Australian beer production
Monthly pharmaceutical subsidies
Weekly retail sales
Daily COVID-19 infections
Hourly electricity demand
Minutely blood glucose measurements
Time-stamped stock transaction data

Time series data

Comes in all shapes and sizes!

Like all data, we hope it’s tidy 🧹

All time series data contain…

The time of the observation (index)
One or more observations (measurements)

Some datasets also have identifying metadata:

Identifying variables for the series (key)

The tsibble data format

A tsibble is a tibble for time series!

The tourism dataset contains quarterly domestic overnight trips across Australia.

library(fpp3)
tourism

# A tsibble: 24,320 x 5 [1Q]
# Key:       Region, State, Purpose [304]
   Quarter Region   State           Purpose  Trips
     <qtr> <chr>    <chr>           <chr>    <dbl>
 1 1998 Q1 Adelaide South Australia Business  135.
 2 1998 Q2 Adelaide South Australia Business  110.
 3 1998 Q3 Adelaide South Australia Business  166.
 4 1998 Q4 Adelaide South Australia Business  127.
 5 1999 Q1 Adelaide South Australia Business  137.
 6 1999 Q2 Adelaide South Australia Business  200.
 7 1999 Q3 Adelaide South Australia Business  169.
 8 1999 Q4 Adelaide South Australia Business  134.
 9 2000 Q1 Adelaide South Australia Business  154.
10 2000 Q2 Adelaide South Australia Business  169.
# ℹ 24,310 more rows


Column types

In this dataset, the column types are:

index variable: Quarter
key variable(s): Region, State, and Purpose
measured variable(s): Trips
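A tsibble is still a tibble, and tsibble provides helpers to ask which role each column plays. A minimal sketch, assuming fpp3 is installed so that `tourism` is available:

```r
library(fpp3)

# A tsibble adds the "tbl_ts" class on top of the usual tibble classes
class(tourism)

# Query the role of each column
index_var(tourism)      # name of the index column
key_vars(tourism)       # names of the key columns
measured_vars(tourism)  # names of the measured columns
n_keys(tourism)         # number of distinct series (key combinations)
```

The results should agree with the tsibble header printed earlier: `[1Q]` for the index and `[304]` for the number of keys.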

pedestrian

# A tibble: 66,037 × 5
   Sensor         Date_Time           Date        Time Count
   <chr>          <dttm>              <date>     <int> <int>
 1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01     0  1630
 2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01     1   826
 3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01     2   567
 4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01     3   264
 5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01     4   139
 6 Birrarung Marr 2015-01-01 05:00:00 2015-01-01     5    77
 7 Birrarung Marr 2015-01-01 06:00:00 2015-01-01     6    44
 8 Birrarung Marr 2015-01-01 07:00:00 2015-01-01     7    56
 9 Birrarung Marr 2015-01-01 08:00:00 2015-01-01     8   113
10 Birrarung Marr 2015-01-01 09:00:00 2015-01-01     9   166
# ℹ 66,027 more rows

Exercise 1

In this dataset, which columns are:

index variable(s)?
key variable(s)?
measured variable(s)?

Our first tsibble

A tsibble is a time series tibble.

It is created with as_tsibble().

tourism |> 
  as_tsibble(
    key = c(Region, State, Purpose),
    index = Quarter
  )

# A tsibble: 24,320 x 5 [1Q]
# Key:       Region, State, Purpose [304]
   Quarter Region   State           Purpose  Trips
     <qtr> <chr>    <chr>           <chr>    <dbl>
 1 1998 Q1 Adelaide South Australia Business  135.
 2 1998 Q2 Adelaide South Australia Business  110.
 3 1998 Q3 Adelaide South Australia Business  166.
 4 1998 Q4 Adelaide South Australia Business  127.
 5 1999 Q1 Adelaide South Australia Business  137.
 6 1999 Q2 Adelaide South Australia Business  200.
 7 1999 Q3 Adelaide South Australia Business  169.
 8 1999 Q4 Adelaide South Australia Business  134.
 9 2000 Q1 Adelaide South Australia Business  154.
10 2000 Q2 Adelaide South Australia Business  169.
# ℹ 24,310 more rows


Exercise 2

Identify the index and key variables of the aus_accommodation dataset.

Then use as_tsibble() to convert it into a tsibble.

library(fpp3)
read.csv(
  "https://workshop.nectric.com.au/user2024/data/aus_accommodation.csv"
)

Representing time (the index)

Time is surprisingly tricky to represent!

frequency
granularity
time zones
calendars
leap years (and seconds?!)
holidays
civil/absolute time
time periods
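Two of these issues, time zones and leap years, can be seen directly with lubridate. A small sketch. The meeting time is hypothetical, chosen only for illustration:

```r
library(lubridate)

# Time zones: a hypothetical 12pm meeting in Melbourne, shown as the same instant in UTC
meeting <- ymd_hms("2024-06-10 12:00:00", tz = "Australia/Melbourne")
meeting_utc <- with_tz(meeting, tzone = "UTC")
meeting_utc

# Leap years: 2024 has a 29 February, so adding one day to 28 Feb does not give 1 March
leap_year(2024)
ymd("2024-02-28") + days(1)
```

`with_tz()` keeps the instant and changes only how it is displayed; in June, Melbourne is on standard time (UTC+10), so the UTC clock reads 2am.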

Representing time (the index)

Common time index variables can be created with these functions:

Granularity	Function
Annual	`start:end`
Quarterly	`yearquarter()`
Monthly	`yearmonth()`
Weekly	`yearweek()`
Daily	`as_date()`, `ymd()`
Sub-daily	`as_datetime()`
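As a quick illustration, here is each function from the table applied to the same day. This sketch assumes fpp3 is loaded, which attaches tsibble (`yearquarter()`, `yearmonth()`, `yearweek()`) and lubridate (`ymd()`, `as_datetime()`):

```r
library(fpp3)

d <- as.Date("2024-06-10")

yearquarter(d)                       # quarterly index
yearmonth(d)                         # monthly index
yearweek(d)                          # weekly index (ISO weeks)
ymd("2024-06-10")                    # daily index (a Date)
as_datetime("2024-06-10 09:30:00")   # sub-daily index (POSIXct, UTC by default)
1998:2017                            # annual index: plain integers
```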

Creating a tsibble

Tidy the data into a long format
Appropriately class the time variable
Convert to a tsibble with as_tsibble(), identifying the index and key variable(s).
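The three steps can be sketched on a tiny dataset. The regions and values below are made up for illustration, and the wide layout (one column per quarter) is an assumption about what untidy input might look like:

```r
library(fpp3)  # also attaches tibble, tidyr, dplyr and lubridate

# Hypothetical wide data with made-up values: one column per quarter
wide <- tibble(
  Region = c("North", "South"),
  `2024-01-01` = c(10, 20),
  `2024-04-01` = c(12, 25)
)

long <- wide |>
  # 1. Tidy into a long format: one row per region per quarter
  pivot_longer(-Region, names_to = "Quarter", values_to = "Trips") |>
  # 2. Class the time variable: parse the date text, then convert to quarters
  mutate(Quarter = yearquarter(ymd(Quarter)))

# 3. Convert to a tsibble, identifying the key and index
visits <- long |>
  as_tsibble(key = Region, index = Quarter)
visits
```

Because `Quarter` is a `yearquarter` rather than a date, the printed tsibble should report a quarterly `[1Q]` interval.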

Follow along

Let’s convert the tourism dataset into a tsibble.

library(fpp3)
read.csv(
  "https://workshop.nectric.com.au/user2024/data/tourism.csv"
)


Exercise 3

Redo the previous exercise, but this time use the appropriate time class for the index variable.

Check that the frequency shown in the tsibble header (e.g. [1Q]) matches the quarterly frequency of the measurements.

Manipulating time series data

Tidy time series data uses tidyverse tools!

Exercise 4

Try calculating the total number of trips to Victoria from the tourism dataset.

Hint: use filter() to keep only visitors to “Victoria”, then summarise() to calculate the total trips.

Manipulating time series data

However, there are some differences when working with time…

The time index is always preserved: summarise() is applied separately at each time point (an implicit group_by() on the index).
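For example, summarising the whole tourism tsibble still returns one row per quarter, with Trips totalled across all regions, states and purposes. A minimal sketch, assuming fpp3 is loaded:

```r
library(fpp3)

# No group_by() needed: the Quarter index is kept automatically,
# so this gives the total trips for each quarter
total_trips <- tourism |>
  summarise(Trips = sum(Trips))
total_trips
```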

Grouping across time

Sometimes you want to summarise over time.

Summarising over all time points no longer gives a time series, so first convert to a tibble with as_tibble().

To summarise into coarser time periods (e.g. quarters into years), re-index with index_by() before summarise().
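Both approaches are sketched below on the tourism data, assuming fpp3 is loaded. The first drops the time structure to total trips by purpose over all quarters; the second re-indexes from quarters to years using lubridate's `year()`:

```r
library(fpp3)

# Summarise over all of time: drop the tsibble structure first
trips_by_purpose <- tourism |>
  as_tibble() |>
  group_by(Purpose) |>
  summarise(Trips = sum(Trips))
trips_by_purpose

# Re-index from quarters to years, then summarise within each year
annual_trips <- tourism |>
  index_by(Year = year(Quarter)) |>
  summarise(Trips = sum(Trips))
annual_trips
```

The first result is an ordinary tibble with one row per purpose; the second is still a tsibble, now indexed by `Year`.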

Visualising time series

Time series visualisation helps us identify time series patterns.

These patterns include:

trends
seasonalities
cycles
covariates
events

Visualisation also helps identify anomalies and outliers.


Time plots

The most common time series graphic is the “time plot”, created with autoplot().

pbs_scripts <- PBS |>
  summarise(Scripts = sum(Scripts))
pbs_scripts |>
  autoplot(Scripts)


Seasonal plots

The gg_season() plots help identify peaks and troughs in seasonal patterns.

pbs_scripts |>
  gg_season(Scripts)


Seasonal sub-series plots

The gg_subseries() plots help identify changes in seasonal patterns.

pbs_scripts |>
  gg_subseries(Scripts)


ACF plots

The ACF() |> autoplot() plots show autocorrelations, helping to identify trends, seasons and cycles.

pbs_scripts |>
  ACF(Scripts) |> 
  autoplot()
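The plot is drawn from a table of autocorrelations, which you can inspect directly. A sketch, assuming fpp3 is loaded; `lag_max = 24` is an arbitrary choice here that covers two years of monthly lags:

```r
library(fpp3)

# Monthly total PBS scripts across all series
pbs_scripts <- PBS |>
  summarise(Scripts = sum(Scripts))

# One row per lag, with the autocorrelation in the acf column
pbs_acf <- pbs_scripts |>
  ACF(Scripts, lag_max = 24)
pbs_acf
```

For monthly data, look for spikes at lags 12 and 24 (seasonality) and slowly decaying values (trend).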


Visualising time series

Exercise 5

Visualise and describe the temporal patterns of visitors to Victoria in the tourism dataset.

Here’s some code to get you started:

vic_tourism <- tourism |> 
  filter(State == "Victoria") |> 
  summarise(Trips = sum(Trips))
vic_tourism |> 
  autoplot(Trips)

Seasonal or Cyclical?

These patterns are often confused. Can you tell them apart?

What’s the difference?

Seasonal patterns have…

a consistent and predictable shape,
a fixed period of repetition.

Cyclical patterns have…

an inconsistent shape and amplitude,
a variable period of repetition.


⏰ Time for a break!

Up next…

Simplifying patterns with transformations,
Specifying time series models,
Training models on data,
Forecasting the future!


Unsplash credits

Thanks to these Unsplash contributors for their photos

Sander Weeteling: Photo
Balint Mendlik: Photo
John Fowler: Taken near sunset at White Sands National Monumen…
David Pisnoy: Photo
Chris Lee: Behind the leaves.
Jon Tyson: Photo
Randy Fath: Photo
Nathan Dumlao: Photo