Tidy time series analysis and forecasting

Time series data and patterns

10th June 2024 @ UseR! 2024

Mitchell O’Hara-Wild, Nectric

🎯 Today’s goals

  1. Learn about time series data.
  2. Visualise common time series patterns.
  3. Produce forecasts from statistical models.
  4. Evaluate the forecasting accuracy.

🌻 Expectations

  1. Follow the code of conduct.
  2. Ask relevant questions any time, Q&A during breaks.
  3. Be kind and respectful.
  4. Make mistakes and learn!

https://workshop.nectric.com.au/user2024/

Mitchell O’Hara‑Wild

Monash + Nectric

@mitchelloharawild

Welcome, who am I?

  • 🎓 PhD candidate at Monash University
  • 📊 Data consulting and workshops at Nectric
  • 📈 Specialised in time series analysis
  • 📦 Develops R packages (fable, vitae, etc.)
  • 🤖 DIYs IoT devices for home automation
  • 🌱 Permaculturist (🐝, 🐣, 🍄, 🌞)

You!

UseR! attendee

🙋 Hi, who are you?

Hands up if…

  • 🧑‍💻 You’ve used R
  • 📊 You’ve analysed data
  • 🫧 Used tidyverse packages (dplyr, ggplot2…)
  • 📈 You’ve worked with time series data before
  • 🔮 You’ve produced a forecast before
  • 🤩 You’ve used fable!

Tidy time series analysis

This workshop is about tidy time series in R.

We’ll be using these packages!

Install them all with install.packages("fpp3")

Design of the forecast package

  • Forecasting individual time series
  • Regular and infrequent observations
    (monthly, quarterly or annually)
  • Point forecasts and intervals
  • Consistent with ts models

Design of the fable package

  • Forecasting many time series
  • Observations at any time
    (sub-daily, irregular, monthly, etc.)
  • Forecast distributions
  • Consistent with the tidyverse

Tidy time series packages

# Data manipulation
library(dplyr)
# Plotting functions
library(ggplot2)
# Time and date manipulation
library(lubridate)
# Time series class
library(tsibble)
# Tidy time series data
library(tsibbledata)
# Time series graphics and statistics
library(feasts)
# Forecasting functions
library(fable)

# All of the above
library(fpp3)

Time series data

  • Four-yearly Olympic winning times
  • Annual Google profits
  • Quarterly Australian beer production
  • Monthly pharmaceutical subsidies
  • Weekly retail sales
  • Daily COVID-19 infections
  • Hourly electricity demand
  • Minutely blood glucose measurements
  • Time-stamped stock transaction data

Time series data

Comes in all shapes and sizes!

Like all data, we hope it’s tidy 🧹

All time series data contain…

  • The time of the observation (index)
  • One or more observations (measurements)

Some datasets have identifying metadata,

  • Identifying variables for the series (key)

The tsibble data format

A tsibble is a tibble for time series!

The quarterly visitors to Australia are found in the tourism dataset.

library(fpp3)
tourism
# A tsibble: 24,320 x 5 [1Q]
# Key:       Region, State, Purpose [304]
   Quarter Region   State           Purpose  Trips
     <qtr> <chr>    <chr>           <chr>    <dbl>
 1 1998 Q1 Adelaide South Australia Business  135.
 2 1998 Q2 Adelaide South Australia Business  110.
 3 1998 Q3 Adelaide South Australia Business  166.
 4 1998 Q4 Adelaide South Australia Business  127.
 5 1999 Q1 Adelaide South Australia Business  137.
 6 1999 Q2 Adelaide South Australia Business  200.
 7 1999 Q3 Adelaide South Australia Business  169.
 8 1999 Q4 Adelaide South Australia Business  134.
 9 2000 Q1 Adelaide South Australia Business  154.
10 2000 Q2 Adelaide South Australia Business  169.
# ℹ 24,310 more rows

# A tibble: 24,320 × 5
   Quarter Region   State           Purpose  Trips
     <qtr> <chr>    <chr>           <chr>    <dbl>
 1 1998 Q1 Adelaide South Australia Business  135.
 2 1998 Q2 Adelaide South Australia Business  110.
 3 1998 Q3 Adelaide South Australia Business  166.
 4 1998 Q4 Adelaide South Australia Business  127.
 5 1999 Q1 Adelaide South Australia Business  137.
 6 1999 Q2 Adelaide South Australia Business  200.
 7 1999 Q3 Adelaide South Australia Business  169.
 8 1999 Q4 Adelaide South Australia Business  134.
 9 2000 Q1 Adelaide South Australia Business  154.
10 2000 Q2 Adelaide South Australia Business  169.
# ℹ 24,310 more rows

Column types

In this dataset, the column types are:

  • index variable: Quarter
  • key variable(s): Region, State, and Purpose
  • measured variable(s): Trips
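
As a small sketch (assuming the tsibble package, which provides the tourism dataset, is attached), these roles can also be queried directly from a tsibble with index_var(), key_vars() and measured_vars():

```r
library(tsibble)

# Which column is the time index?
idx <- index_var(tourism)

# Which columns identify each series?
keys <- key_vars(tourism)

# Which columns hold the measurements?
measures <- measured_vars(tourism)

idx
keys
measures
```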

pedestrian
# A tibble: 66,037 × 5
   Sensor         Date_Time           Date        Time Count
   <chr>          <dttm>              <date>     <int> <int>
 1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01     0  1630
 2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01     1   826
 3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01     2   567
 4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01     3   264
 5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01     4   139
 6 Birrarung Marr 2015-01-01 05:00:00 2015-01-01     5    77
 7 Birrarung Marr 2015-01-01 06:00:00 2015-01-01     6    44
 8 Birrarung Marr 2015-01-01 07:00:00 2015-01-01     7    56
 9 Birrarung Marr 2015-01-01 08:00:00 2015-01-01     8   113
10 Birrarung Marr 2015-01-01 09:00:00 2015-01-01     9   166
# ℹ 66,027 more rows

Exercise 1

In this dataset, which columns are:

  • index variable(s)?
  • key variable(s)?
  • measured variable(s)?

Our first tsibble

A tsibble is a time series tibble.

It is created with as_tsibble().

tourism |> 
  as_tsibble(
    key = c(Region, State, Purpose),
    index = Quarter
  )
# A tsibble: 24,320 x 5 [1Q]
# Key:       Region, State, Purpose [304]
   Quarter Region   State           Purpose  Trips
     <qtr> <chr>    <chr>           <chr>    <dbl>
 1 1998 Q1 Adelaide South Australia Business  135.
 2 1998 Q2 Adelaide South Australia Business  110.
 3 1998 Q3 Adelaide South Australia Business  166.
 4 1998 Q4 Adelaide South Australia Business  127.
 5 1999 Q1 Adelaide South Australia Business  137.
 6 1999 Q2 Adelaide South Australia Business  200.
 7 1999 Q3 Adelaide South Australia Business  169.
 8 1999 Q4 Adelaide South Australia Business  134.
 9 2000 Q1 Adelaide South Australia Business  154.
10 2000 Q2 Adelaide South Australia Business  169.
# ℹ 24,310 more rows

Our first tsibble

Exercise 2

Identify the index and key variables of the aus_accommodation dataset.

Then use as_tsibble() to convert it into a tsibble.

library(fpp3)
read.csv(
  "https://workshop.nectric.com.au/user2024/data/aus_accommodation.csv"
)

Representing time (the index)

Time is surprisingly tricky to represent!

  • frequency
  • granularity
  • time zones
  • calendars
  • leap years (and seconds?!)
  • holidays
  • civil/absolute time
  • time periods

Representing time (the index)

Common time index variables can be created with these functions:

Granularity   Function
Annual        start:end
Quarterly     yearquarter()
Monthly       yearmonth()
Weekly        yearweek()
Daily         as_date(), ymd()
Sub-daily     as_datetime()
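
A minimal sketch of these functions in use. The date below is made up purely for illustration, and the variable names are hypothetical:

```r
library(lubridate)
library(tsibble)

# An arbitrary example date (an assumption, not workshop data)
day <- as_date("2024-06-10")

# Coarser granularities containing that date
qtr <- yearquarter(day)
mth <- yearmonth(day)
wk <- yearweek(day)

# Sub-daily time stamps use date-times
stamp <- as_datetime("2024-06-10 09:30:00")

# Annual indices are plain integers
years <- 2015:2024

qtr
mth
wk
stamp
```

Each index class keeps only the information relevant to its granularity, so a yearquarter() value behaves like a quarter rather than like a particular day.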

Creating a tsibble

  1. Tidy the data into a long format
  2. Appropriately class the time variable
  3. Convert to tsibble with as_tsibble(), identifying the index and key variable(s).
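
The three steps above can be sketched on a small made-up dataset. The `wide` table, its city columns and the `sales` values are hypothetical and exist only for illustration:

```r
library(dplyr)
library(tidyr)
library(tsibble)

# Hypothetical wide data: one row per quarter, one column per city
wide <- tibble(
  quarter = c("2023 Q1", "2023 Q2", "2023 Q3", "2023 Q4"),
  sydney = c(10, 12, 11, 15),
  melbourne = c(8, 9, 7, 12)
)

sales <- wide |>
  # 1. Tidy the data into a long format
  pivot_longer(c(sydney, melbourne), names_to = "city", values_to = "sales") |>
  # 2. Appropriately class the time variable
  mutate(quarter = yearquarter(quarter)) |>
  # 3. Convert to a tsibble, identifying the index and key
  as_tsibble(key = city, index = quarter)

sales
```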

Follow along

Let’s convert the tourism dataset into a tsibble.

library(fpp3)
read.csv(
  "https://workshop.nectric.com.au/user2024/data/tourism.csv"
)

Creating a tsibble

  1. Tidy the data into a long format
  2. Appropriately class the time variable
  3. Convert to tsibble with as_tsibble(), identifying the index and key variable(s).

Exercise 3

Redo the previous exercise, but this time use the appropriate time class for the index variable.

You should see the frequency of the tsibble matches the quarterly frequency of the measurements.

Manipulating time series data

Tidy time series data uses tidyverse tools!

Exercise 4

Try calculating the total tourists visiting Victoria from the tourism dataset.

Hint: use filter() to keep only visitors to “Victoria”, then summarise() to calculate the total trips.

Manipulating time series data

However, there are some differences when working with time…

The time index is always preserved (implicit group_by()).
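
A small sketch of what this means in practice, using the tourism dataset from the tsibble package. Grouping by Purpose only still returns one row per Purpose per Quarter, because the index is kept automatically:

```r
library(dplyr)
library(tsibble)

# Total trips by purpose; Quarter is preserved without being named
purpose_trips <- tourism |>
  group_by(Purpose) |>
  summarise(Trips = sum(Trips))

purpose_trips
```

The result is still a tsibble, with Purpose as its key and Quarter as its index.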

Grouping across time

Sometimes you want to summarise over time.

Summarising over all of time no longer gives a time series, so first convert to a tibble with as_tibble().

To re-index by a new variable, use index_by().
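
Both approaches can be sketched with the tourism dataset. The object names `state_totals` and `annual_trips` are made up for this example:

```r
library(dplyr)
library(lubridate)
library(tsibble)

# Summarising over all of time: drop the time series structure first
state_totals <- tourism |>
  as_tibble() |>
  group_by(State) |>
  summarise(Trips = sum(Trips))

# Re-indexing: annual totals computed from the quarterly data
annual_trips <- tourism |>
  index_by(Year = year(Quarter)) |>
  summarise(Trips = sum(Trips))

state_totals
annual_trips
```

The first result is an ordinary tibble with one row per state, while the second is still a tsibble, now indexed by Year.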

Visualising time series

Time series visualisation helps us identify time series patterns.

These patterns include:

  • trends
  • seasonalities
  • cycles
  • covariates
  • events

They also help identify anomalies/outliers.

Time plots

The most common time series graphic is the “time plot”, created with autoplot().

pbs_scripts <- PBS |>
  summarise(Scripts = sum(Scripts))
pbs_scripts |>
  autoplot(Scripts)

Seasonal plots

The gg_season() plots help identify peaks and troughs in seasonal patterns.

pbs_scripts |>
  gg_season(Scripts)

Seasonal sub-series plots

The gg_subseries() plots help identify changes in seasonal patterns.

pbs_scripts |>
  gg_subseries(Scripts)

ACF plots

The ACF() |> autoplot() plots show autocorrelations, helping to identify trends, seasons and cycles.

pbs_scripts |>
  ACF(Scripts) |> 
  autoplot()

Visualising time series

Exercise 5

Visualise and describe the temporal patterns of visitors to Victoria in the tourism dataset.

Here’s some code to get you started:

vic_tourism <- tourism |> 
  filter(State == "Victoria") |> 
  summarise(Trips = sum(Trips))
vic_tourism |> 
  autoplot(Trips)

Seasonal or Cyclical?

These two patterns are commonly confused. Can you tell them apart?

What’s the difference?

Seasonal patterns have…

  • A consistent and predictable shape
  • A fixed period between repeats

Cyclical patterns have…

  • An inconsistent shape and amplitude
  • A variable period between repeats

Time for a break!

Up next…

  • Simplifying patterns with transformations,
  • Specifying time series models,
  • Training models on data,
  • Forecasting the future!

Unsplash credits

Thanks to these Unsplash contributors for their photos