Tidy time series analysis and forecasting
Accuracy evaluation
10th June 2024 @ UseR! 2024
Inspecting model errors
Accurate models have small errors.
Good models capture all patterns in the data.
Model errors contain patterns not captured by the model!
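Output like the table below comes from augment() applied to a fitted model object. A minimal sketch, assuming trips_total is a quarterly tsibble of total Australian tourist Trips (the dataset name is a placeholder):

library(fable)  # also attaches fabletools

# Fit the two models compared throughout this section (specifications assumed)
fit <- trips_total |>
  model(ETS(Trips), ARIMA(log(Trips)))

# Fitted values and residuals for every model and observation
augment(fit)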
# A tsibble: 160 x 6 [1Q]
# Key: .model [2]
.model Quarter Trips .fitted .resid .innov
<chr> <qtr> <dbl> <dbl> <dbl> <dbl>
1 ETS(Trips) 1998 Q1 6010. 5457. 554. 0.101
2 ETS(Trips) 1998 Q2 4795. 4824. -28.6 -0.00594
3 ETS(Trips) 1998 Q3 4317. 4370. -52.8 -0.0121
4 ETS(Trips) 1998 Q4 4675. 4841. -167. -0.0344
5 ETS(Trips) 1999 Q1 5304. 5599. -295. -0.0526
6 ETS(Trips) 1999 Q2 4562. 4545. 16.8 0.00369
7 ETS(Trips) 1999 Q3 3784. 4139. -356. -0.0859
8 ETS(Trips) 1999 Q4 4201. 4395. -194. -0.0441
9 ETS(Trips) 2000 Q1 5567. 5055. 512. 0.101
10 ETS(Trips) 2000 Q2 4502. 4468. 33.8 0.00757
# ℹ 150 more rows
Fitted values and residuals
.fitted: the model's fitted values (one-step-ahead predictions of the response).
.resid: response residuals, the difference between the observed and fitted values.
.innov: innovation residuals, the model's errors on the (possibly transformed) modelling scale.
Response residuals are often used to calculate accuracy ‘measures’.
Common accuracy measures can be calculated with accuracy().
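For example, applied directly to the fitted models (a sketch reusing the assumed fit object from above), accuracy() returns the in-sample measures:

# In-sample (training) accuracy measures for each model in the mable
accuracy(fit)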
These point forecast accuracy measures are:
ME: Mean error (indicates bias)
RMSE: Root mean squared error (forecast mean accuracy)
MAE: Mean absolute error (forecast median accuracy)
MPE/MAPE: Percentage errors (problematic; instead use…)
MASE: Mean absolute scaled error (scaled median accuracy)
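For reference, writing the forecast errors as \(e_t = y_t - \hat{y}_t\), these measures have the standard definitions:
\[
\begin{aligned}
\text{ME} &= \operatorname{mean}(e_t), \qquad
\text{RMSE} = \sqrt{\operatorname{mean}(e_t^2)}, \qquad
\text{MAE} = \operatorname{mean}(|e_t|),\\
\text{MPE} &= \operatorname{mean}(100\, e_t / y_t), \qquad
\text{MAPE} = \operatorname{mean}(100\, |e_t / y_t|), \qquad
\text{MASE} = \operatorname{mean}(|e_t| / Q),
\end{aligned}
\]
where \(Q\) is the in-sample MAE of a (seasonal) naive forecast, so MASE compares errors to a simple benchmark on a scale-free basis.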
Exercise 10
Evaluate ETS and ARIMA forecast accuracy for total Takings of Australian tourist accommodation (aus_accommodation_total).
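A minimal sketch for this exercise, assuming aus_accommodation_total is a quarterly tsibble with a Takings column (as described above):

# Fit ETS and ARIMA models to total accommodation takings
fit <- aus_accommodation_total |>
  model(ETS(Takings), ARIMA(Takings))

# In-sample accuracy measures for each model
accuracy(fit)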
Evaluating out-of-sample forecasts gives more realistic forecast accuracy results.
For this, we create a training dataset which withholds data for evaluating accuracy.
Then, we produce forecasts from the model that overlap with the withheld ‘test’ data.
Then we can again use accuracy() with our forecasts:
# A tibble: 2 × 10
.model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ARIMA(log(Trips)) Test 604. 756. 626. 9.47 9.88 2.53 2.31 0.438
2 ETS(Trips) Test 571. 725. 596. 8.94 9.43 2.41 2.22 0.432
Include the data
Unlike model accuracy, forecasts don’t know what the actual values are.
You need to pass in the full dataset to accuracy(), i.e. accuracy(<forecasts>, <full_data>).
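A sketch of the full out-of-sample workflow (the dataset name trips_total and the training cut-off year are assumptions):

library(dplyr)      # filter()
library(lubridate)  # year()

# Withhold the most recent two years of data for testing
fit <- trips_total |>
  filter(year(Quarter) <= 2015) |>
  model(ETS(Trips), ARIMA(log(Trips)))

# Forecast over the withheld 'test' period
fc <- fit |> forecast(h = "2 years")

# Pass the full dataset so accuracy() can match forecasts to the actual values
accuracy(fc, trips_total)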
Exercise 11
Evaluate ETS and ARIMA forecast accuracy for total Takings of Australian tourist accommodation (aus_accommodation_total) using accuracy().
Which model is more accurate for forecasting?
Does this differ from the in-sample model accuracy?
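One possible sketch (the index column Quarter and the cut-off year are assumptions; adjust to match the data):

# Withhold recent data as a test set
fit <- aus_accommodation_total |>
  filter(year(Quarter) <= 2013) |>
  model(ETS(Takings), ARIMA(Takings))

# Forecast the withheld period and evaluate against the full data
fit |>
  forecast(h = "2 years") |>
  accuracy(aus_accommodation_total)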
Small sample problems!
Test sets in time series can be problematic.
Here we’re judging the best model based on just the most recent 2 years of data.
To overcome this, we can use time-series cross-validation.
This creates many training sets, each producing forecasts from a different starting point.
The stretch_tsibble() function can be added after filter() to create time-series cross-validation folds of data.
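A sketch of the cross-validation workflow (the dataset name, cut-off year, and the .init/.step fold settings are assumptions):

# Keep data before the evaluation period, then create folds: the first fold
# uses the earliest 16 quarters, each later fold adds one more quarter
trips_cv <- trips_total |>
  filter(year(Quarter) <= 2015) |>
  stretch_tsibble(.init = 16, .step = 1)

# Fit the models to every fold, forecast ahead, and evaluate on the full data
trips_cv |>
  model(ETS(Trips), ARIMA(log(Trips))) |>
  forecast(h = "1 year") |>
  accuracy(trips_total)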
Once more, we use accuracy() with our forecasts:
# A tibble: 2 × 10
.model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ARIMA(log(Trips)) Test 219. 471. 370. 3.62 6.98 1.49 1.44 0.614
2 ETS(Trips) Test 191. 456. 351. 3.06 6.66 1.42 1.39 0.589
Exercise 12
Calculate cross-validated forecast accuracy for total Takings of Australian tourist accommodation (aus_accommodation_total).
How do these results differ from the forecast accuracy calculated earlier?
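A possible sketch for this exercise (fold settings and the index column name are assumptions):

# Cross-validated accuracy for total accommodation takings
aus_accommodation_total |>
  stretch_tsibble(.init = 12, .step = 4) |>
  model(ETS(Takings), ARIMA(Takings)) |>
  forecast(h = "2 years") |>
  accuracy(aus_accommodation_total)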
# A tsibble: 160 x 6 [1Q]
# Key: .model [2]
.model Quarter Trips .fitted .resid .innov
<chr> <qtr> <dbl> <dbl> <dbl> <dbl>
1 ETS(Trips) 1998 Q1 6010. 5457. 554. 0.101
2 ETS(Trips) 1998 Q2 4795. 4824. -28.6 -0.00594
3 ETS(Trips) 1998 Q3 4317. 4370. -52.8 -0.0121
4 ETS(Trips) 1998 Q4 4675. 4841. -167. -0.0344
5 ETS(Trips) 1999 Q1 5304. 5599. -295. -0.0526
6 ETS(Trips) 1999 Q2 4562. 4545. 16.8 0.00369
7 ETS(Trips) 1999 Q3 3784. 4139. -356. -0.0859
8 ETS(Trips) 1999 Q4 4201. 4395. -194. -0.0441
9 ETS(Trips) 2000 Q1 5567. 5055. 512. 0.101
10 ETS(Trips) 2000 Q2 4502. 4468. 33.8 0.00757
# ℹ 150 more rows
Recall the ‘innovation’ (model) residuals from augment(), .innov.
Innovation residuals
Innovation residuals contain patterns that weren’t captured by the model.
They aren’t so useful for summarising accuracy, since they may be on a transformed scale.
With .innov, we can use visualisation to look for patterns. Ideally we don’t find any, because this means the model has captured everything.
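One way to look for leftover patterns is an ACF plot of the innovation residuals (a sketch using the feasts package and the assumed fit object from earlier):

library(feasts)

# Significant autocorrelation spikes suggest patterns the model has missed
augment(fit) |>
  ACF(.innov) |>
  autoplot()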
Checking assumptions
It is good practice to check model assumptions.
All models we’ve seen today assume \(\varepsilon_t \overset{\mathrm{iid}}{\sim} N(0, \sigma^2)\).
To check this, we can use gg_tsresiduals().
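A minimal sketch (gg_tsresiduals() expects a mable containing a single model, so one model column is selected first; the column name is an assumption matching the default naming):

library(feasts)  # provides gg_tsresiduals()

# Time plot, ACF, and histogram of the innovation residuals for the ETS model
fit |>
  select(`ETS(Trips)`) |>
  gg_tsresiduals()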
Learn more about forecasting
Freely available online! https://otexts.com/fpp3/
I appreciate your feedback
Short feedback form: https://feedback.nectric.com.au/pZ26