This is a tutorial on how to do time series analyses in R using INLA and visualize the results using ggplot2. The code is based on the book by Alain Zuur and colleagues (Zuur et al. 2017).

Introduction

Time series analyses are useful for fitting models to data with temporal dependence, or temporal autocorrelation, where values close together in time are more similar than values far apart in time. For this tutorial I am using the greatLakes dataset from the DAAG package. It is a dataset of yearly averages of Great Lakes (Erie, Michigan/Huron, Ontario and St Clair) heights from 1918 to 2009. The heights are stored as a multivariate time series.

I won’t be spending any time on model selection in this tutorial, nor on model evaluation.

Packages

First, we load a few packages.
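A minimal sketch of the package setup, assuming the four packages this tutorial relies on: INLA for model fitting, ggplot2 for plotting, DAAG for the data, and reshape2 for `melt` (that `melt` comes from reshape2 is an assumption; other packages provide a function with the same name). INLA is not on CRAN, so the commented lines show the install command from the R-INLA repository.

```r
# INLA is not on CRAN; install it once from the R-INLA repository:
# install.packages("INLA",
#   repos = c(getOption("repos"),
#             INLA = "https://inla.r-inla-download.org/R/stable"),
#   dep = TRUE)

library(INLA)      # Bayesian model fitting
library(ggplot2)   # plotting
library(DAAG)      # greatLakes dataset
library(reshape2)  # melt(), assumed here for the wide-to-long step
```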

Setting up the data

Now, we can load in the data.
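As a sketch, the dataset can be loaded with `data()` and inspected before doing anything else. The inspection calls are only suggestions for getting a feel for the object.

```r
library(DAAG)

# Load the multivariate time series of yearly lake heights
data(greatLakes, package = "DAAG")

class(greatLakes)     # a multivariate time series object
colnames(greatLakes)  # one column per lake
start(greatLakes)     # first year
end(greatLakes)       # last year
```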

And visualize it.
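One simple way to do this, sketched here, is the base R plot method for time series, which draws one panel per lake. The title and axis label are my own choices.

```r
library(DAAG)
data(greatLakes, package = "DAAG")

# plot() on a multivariate time series draws a separate panel for each series
plot(greatLakes, main = "Yearly average Great Lakes heights", xlab = "Year")
```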

For this analysis, we need to convert the data into a data frame. Here’s some code to do this. I have also included how to convert it back to a time series object.
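A minimal sketch of the conversion in both directions. The object names `lakes_df` and `lakes_ts` and the column name `year` are hypothetical choices for illustration.

```r
library(DAAG)
data(greatLakes, package = "DAAG")

# Time series -> data frame: keep the year as an explicit column
lakes_df <- data.frame(
  year = as.numeric(time(greatLakes)),
  as.data.frame(greatLakes)
)
head(lakes_df)

# Data frame -> time series: drop the year column and restore the time index
lakes_ts <- ts(lakes_df[, -1], start = min(lakes_df$year), frequency = 1)
```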

So, because we have time series data at regular intervals, we can use a random walk model of order 2 (RW2). According to Zuur and colleagues, these models produce a smoother trend than RW1 models.
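To see why an RW2 trend tends to look smoother, here is a small base R simulation (not part of the lake analysis). In an RW1 the first differences are independent Gaussian noise, while in an RW2 the second differences are. The series length and seed are arbitrary.

```r
set.seed(1)
n   <- 92          # arbitrary length for illustration
eps <- rnorm(n)    # independent Gaussian innovations

rw1 <- cumsum(eps)  # RW1: first differences are the innovations
rw2 <- cumsum(rw1)  # RW2: second differences are the innovations

# Plot the two simulated paths side by side
op <- par(mfrow = c(1, 2))
plot(rw1, type = "l", main = "RW1", xlab = "Time", ylab = "Value")
plot(rw2, type = "l", main = "RW2", xlab = "Time", ylab = "Value")
par(op)
```

The RW2 path changes direction gradually because the noise enters through its curvature rather than its slope.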

Let’s run the model!

For this first run, I will fit the model to a single lake: Lake Erie.
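A sketch of what this model could look like. The Gaussian likelihood, the default priors, and the names `lakes_df` and `fit_erie` are assumptions on my part, as is the column name `Erie`; check `colnames(greatLakes)` if the formula fails.

```r
library(INLA)
library(DAAG)
data(greatLakes, package = "DAAG")

# Rebuild the data frame so this block runs on its own
lakes_df <- data.frame(year = as.numeric(time(greatLakes)),
                       as.data.frame(greatLakes))

# Intercept plus an RW2 smoother over year (default priors)
form_erie <- Erie ~ 1 + f(year, model = "rw2")

fit_erie <- inla(form_erie,
                 family = "gaussian",   # assumed likelihood
                 data = lakes_df,
                 control.predictor = list(compute = TRUE),
                 control.compute = list(dic = TRUE))
```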

We can look at the parameter estimates of the model, using ‘summary’. We can also plot the fitted smoother, and then the predicted trend along with the data. For simplicity, these are done in base R, but could be easily run in ggplot2.
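A sketch of these three steps in base R, assuming the Lake Erie model is fitted as a Gaussian RW2 model (the model is refitted here so the block runs on its own). Object names are hypothetical.

```r
library(INLA)
library(DAAG)
data(greatLakes, package = "DAAG")
lakes_df <- data.frame(year = as.numeric(time(greatLakes)),
                       as.data.frame(greatLakes))
fit_erie <- inla(Erie ~ 1 + f(year, model = "rw2"),
                 family = "gaussian", data = lakes_df,
                 control.predictor = list(compute = TRUE),
                 control.compute = list(dic = TRUE))

# 1. Parameter estimates
summary(fit_erie)

# 2. The fitted smoother (posterior mean with 95% credible interval)
trend <- fit_erie$summary.random$year
plot(trend$ID, trend$mean, type = "l",
     ylim = range(trend[["0.025quant"]], trend[["0.975quant"]]),
     xlab = "Year", ylab = "RW2 smoother")
lines(trend$ID, trend[["0.025quant"]], lty = 2)
lines(trend$ID, trend[["0.975quant"]], lty = 2)

# 3. The predicted trend on top of the observed data
fitted_erie <- fit_erie$summary.fitted.values
plot(lakes_df$year, lakes_df$Erie, pch = 16,
     xlab = "Year", ylab = "Lake Erie height")
lines(lakes_df$year, fitted_erie$mean, lwd = 2)
lines(lakes_df$year, fitted_erie[["0.025quant"]], lty = 2)
lines(lakes_df$year, fitted_erie[["0.975quant"]], lty = 2)
```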

We can also run a model that fits a single trend for all of the lakes. This may be a good initial model, although realistically each lake will probably have a slightly different trend. We will set some prior parameter values (these can be adjusted to generate smoother or rougher fits), and then we can run the model.

We also need to change the data to long format. I’ll do that using ‘melt’.
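A sketch of the reshaping and the common-trend model. Several things here are assumptions rather than the original code: `melt` from reshape2, a penalised complexity prior on the RW2 precision with illustrative values `c(1, 0.01)`, a Gaussian likelihood, and a lake-specific intercept (added because the lakes sit at very different heights). All object names are hypothetical.

```r
library(INLA)
library(DAAG)
library(reshape2)
data(greatLakes, package = "DAAG")
lakes_df <- data.frame(year = as.numeric(time(greatLakes)),
                       as.data.frame(greatLakes))

# Wide -> long: one row per lake and year
lakes_long <- melt(lakes_df, id.vars = "year",
                   variable.name = "lake", value.name = "height")

# PC prior on the RW2 precision: P(sigma > 1) = 0.01 (illustrative values;
# a smaller first value forces a smoother trend)
pc_prior <- list(prec = list(prior = "pc.prec", param = c(1, 0.01)))

# One RW2 trend shared by all lakes, with a separate intercept per lake
fit_common <- inla(height ~ 1 + lake + f(year, model = "rw2", hyper = pc_prior),
                   family = "gaussian", data = lakes_long,
                   control.predictor = list(compute = TRUE),
                   control.compute = list(dic = TRUE))
summary(fit_common)
```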

We can now visualize the trend for each lake in ggplot2.
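A sketch of one way to draw this plot: attach the posterior mean and 95% credible interval of the fitted values to the long data frame, then facet by lake. The model is refitted under the same assumptions as before (Gaussian likelihood, illustrative PC prior, lake-specific intercepts), and the styling is my own.

```r
library(INLA)
library(DAAG)
library(reshape2)
library(ggplot2)
data(greatLakes, package = "DAAG")
lakes_df <- data.frame(year = as.numeric(time(greatLakes)),
                       as.data.frame(greatLakes))
lakes_long <- melt(lakes_df, id.vars = "year",
                   variable.name = "lake", value.name = "height")
pc_prior <- list(prec = list(prior = "pc.prec", param = c(1, 0.01)))
fit_common <- inla(height ~ 1 + lake + f(year, model = "rw2", hyper = pc_prior),
                   family = "gaussian", data = lakes_long,
                   control.predictor = list(compute = TRUE),
                   control.compute = list(dic = TRUE))

# Fitted values come back in the same row order as the data
fv <- fit_common$summary.fitted.values
lakes_long$fit   <- fv$mean
lakes_long$lower <- fv[["0.025quant"]]
lakes_long$upper <- fv[["0.975quant"]]

p_common <- ggplot(lakes_long, aes(x = year)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), fill = "grey80") +
  geom_point(aes(y = height), size = 0.8) +
  geom_line(aes(y = fit), colour = "blue") +
  facet_wrap(~ lake, scales = "free_y") +
  labs(x = "Year", y = "Lake height") +
  theme_bw()
print(p_common)
```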

Notice that the same trend is fitted through each lake (in a separate panel). There are clearly places where the trend does not fit the data well in each lake.

We can try to improve the model fit by allowing separate trends for each lake. We fit a very similar model, but have to parameterize it differently (so that each lake is treated separately).
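One way to get a separate trend per lake in INLA is the `replicate` argument of `f()`, which fits an independent RW2 trend for each lake while sharing one precision parameter across them. This is a sketch of that approach and may differ from the parameterization used in the original analysis; the likelihood, prior values, and object names are assumptions.

```r
library(INLA)
library(DAAG)
library(reshape2)
data(greatLakes, package = "DAAG")
lakes_df <- data.frame(year = as.numeric(time(greatLakes)),
                       as.data.frame(greatLakes))
lakes_long <- melt(lakes_df, id.vars = "year",
                   variable.name = "lake", value.name = "height")

# Integer index (1, 2, ...) telling INLA which lake each row belongs to
lakes_long$lake_id <- as.integer(lakes_long$lake)

pc_prior <- list(prec = list(prior = "pc.prec", param = c(1, 0.01)))

# A separate RW2 trend for every lake, via replicate
fit_separate <- inla(height ~ 1 + lake +
                       f(year, model = "rw2", replicate = lake_id,
                         hyper = pc_prior),
                     family = "gaussian", data = lakes_long,
                     control.predictor = list(compute = TRUE),
                     control.compute = list(dic = TRUE))
summary(fit_separate)
```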

Now we can visualize the trends.

We can compare the model with a single trend to the model with separate trends, using DIC.
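A sketch of the comparison, assuming both models were fitted with `control.compute = list(dic = TRUE)`. Both models are refitted here so the block runs on its own; the model specifications are the same assumed ones as above, so the DIC values may not match the original analysis.

```r
library(INLA)
library(DAAG)
library(reshape2)
data(greatLakes, package = "DAAG")
lakes_df <- data.frame(year = as.numeric(time(greatLakes)),
                       as.data.frame(greatLakes))
lakes_long <- melt(lakes_df, id.vars = "year",
                   variable.name = "lake", value.name = "height")
lakes_long$lake_id <- as.integer(lakes_long$lake)
pc_prior <- list(prec = list(prior = "pc.prec", param = c(1, 0.01)))

# Helper (hypothetical) so both models share the same settings
fit_model <- function(formula) {
  inla(formula, family = "gaussian", data = lakes_long,
       control.compute = list(dic = TRUE))
}

fit_common   <- fit_model(height ~ 1 + lake +
                            f(year, model = "rw2", hyper = pc_prior))
fit_separate <- fit_model(height ~ 1 + lake +
                            f(year, model = "rw2", replicate = lake_id,
                              hyper = pc_prior))

# Lower DIC indicates the better trade-off between fit and complexity
dic_table <- data.frame(
  model = c("single trend", "separate trends"),
  DIC   = c(fit_common$dic$dic, fit_separate$dic$dic)
)
dic_table
```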

We can see that the model fitting the same trend to each lake (i2) has a better (lower) DIC score.

That’s it for this tutorial on time series analysis.

Thanks for reading!

~Tim

Reference

Zuur AF, Ieno EN, Saveliev AA. 2017. Beginner’s guide to spatial, temporal and spatial-temporal ecological data analysis with R-INLA, volume I: using GLM and GLMM. Highland Statistics Ltd; Newburgh, UK.

Contact Me

Please check out my personal website at timothyemoore.com