Skip to contents

Setting things up

Make a shortcut to a nicer colour scale:

colsc <- function(...) {
  scale_fill_gradientn(
    colours = rev(RColorBrewer::brewer.pal(11, "RdYlBu")),
    limits = range(..., na.rm = TRUE)
  )
}

Get the data

data(Poisson2_1D)

Put the count data in cd (just because ‘cd’ is less to type than ‘countdata2’.)

cd <- countdata2

Take a look at the count data.

cd
#>            x count exposure
#> 1   2.319888     9 4.639776
#> 2   6.959664    13 4.639776
#> 3  11.599439    11 4.639776
#> 4  16.239215    22 4.639776
#> 5  20.878991    20 4.639776
#> 6  25.518766    19 4.639776
#> 7  30.158542    16 4.639776
#> 8  34.798318     8 4.639776
#> 9  39.438093     4 4.639776
#> 10 44.077869     4 4.639776
#> 11 48.717645     4 4.639776
ggplot(cd) +
  geom_point(aes(x, y = count)) +
  ylim(0, max(cd$count))

Tip: RStudio > Help > Cheatsheets > Data visualisation with ggplot2 is a useful reference for ggplot2 syntax.

Fitting a Generalised Additive Model (GAM)

If you’re not familiar with GAMs and the syntax of gam don’t worry, the point of this is just to provide something to which we can compare the inlabru model fit.

fit2.gam <- gam(count ~ s(x, k = 10) + offset(log(exposure)),
  family = poisson(),
  data = cd
)

The term s(x,k=10) just specifies that as nonparametric smooth function is to be fitted to the data, with no more than 10 degrees of freedom (df). (The larger the df, the more wiggly the fitted curve (recall from the lecture that this is an effect of how some spline methods are defined, without discretisation dependent penalty); gam selects the ‘best’ df.) Notice the use of offset=. (Refer to slides for an explanation of offset.) The variable exposure in data frame cd is the size of the bin in which each count was made.

You can look at the fitted model using summary( ) as below if you want to, but you do not need to understand this output, or the code that makes the predictions immediately below it if you are not familiar with GAMs.

summary(fit2.gam)

Make a prediction data frame, get predictions and add them to the data frame First make vectors of x-values and associated (equal) exposures:

xs <- seq(0, 55, length = 100)
exposures <- rep(cd$exposure[1], 100)

and put them in a data frame:

dat4pred <- data.frame(x = xs, exposure = exposures)

Then predict

pred2.gam <- predict(fit2.gam, newdata = dat4pred, type = "response")
# add column for prediction in data frame:
dat4pred2 <- cbind(dat4pred, gam = pred2.gam)

Plotting the fit and the data using the ggplot2 commands below should give you the plot shown below

ggplot(dat4pred2) +
  geom_line(aes(x = x, y = gam), lty = 2) +
  ylim(0, max(dat4pred2$gam, cd$count)) +
  geom_point(aes(x = x, y = count), cd)

Fitting an SPDE model with inlabru

Make mesh. To avoid boundary effects in the region of interest, let the mesh extend outside the data range.

x <- seq(-10, 65, by = 1) # this sets mesh points - try others if you like
mesh1D <- fm_mesh_1d(x, boundary = "free")

… and see where the mesh points are:

ggplot() +
  gg(mesh1D)

Using function bru( ) to fit to count data

We need to specify model components and a model formula in order to fit it. This can be done inside the call to bru( ) but that is a bit messy, so we’ll store it in comp first and then pass that to bru( ).

Our response variable in the data frame cd is called count so the model specification needs to have that on the left of the ~. We add an intercept component with + Intercept(1) on the right hand side (all the models we use have intercepts), and because we want to fit a Gaussian random field (GRF), it must have a GRF specification. In inlabru the GRF specification is a function, which allows the GRF to be calculated at any point in space while inlabru is doing its calculations.

The user gets to name the GRF function. The syntax is myname(input, model= ...), where:

  • ‘myname’ is whatever you want to call the GRF (we called it field below);
  • input specifies the coordinates in which the GRF or SPDE ‘lives’. Here we are working in one dimension, and we called that dimension x when we set up the data set.
  • model= designates the type of effect, here an SPDE model object from the INLA function inla.spde2.pcmatern( ), which requires a mesh to be passed to it, so we pass it the 1D mesh that we created above, mesh1D.

For models that only adds the model components, we don’t need to specify the full predictor formula. Instead, we can provide the name of the output to the left of the ~ in the component specification, and “.” on the right hand side, which will cause it to add all components (unless a subset is selected via the include/exclude arguments to like()).

the_spde <- inla.spde2.pcmatern(mesh1D,
  prior.range = c(1, 0.01),
  prior.sigma = c(1, 0.01)
)

comp <- ~ field(x, model = the_spde) + Intercept(1, prec.linear = 1 / 2^2)

fit2.bru <- bru(
  comp,
  like(count ~ .,
    data = cd,
    family = "poisson",
    E = exposure
  )
)

summary(fit2.bru)
#> inlabru version: 2.11.1.9025
#> INLA version: 24.11.17
#> Components:
#> field: main = spde(x), group = exchangeable(1L), replicate = iid(1L), NULL
#> Intercept: main = linear(1), group = exchangeable(1L), replicate = iid(1L), NULL
#> Likelihoods:
#>   Family: 'poisson'
#>     Tag: ''
#>     Data class: 'data.frame'
#>     Response class: 'integer'
#>     Predictor: count ~ .
#>     Used components: effects[field, Intercept], latent[]
#> Time used:
#>     Pre = 0.739, Running = 0.218, Post = 0.118, Total = 1.07 
#> Fixed effects:
#>            mean    sd 0.025quant 0.5quant 0.975quant  mode   kld
#> Intercept 0.978 0.702     -0.008    0.857      3.166 0.812 0.004
#> 
#> Random effects:
#>   Name     Model
#>     field SPDE2 model
#> 
#> Model hyperparameters:
#>                   mean     sd 0.025quant 0.5quant 0.975quant   mode
#> Range for field 35.194 25.648      8.854   28.298    103.363 19.028
#> Stdev for field  0.519  0.163      0.271    0.496      0.905  0.452
#> 
#> Deviance Information Criterion (DIC) ...............: 60.06
#> Deviance Information Criterion (DIC, saturated) ....: 14.41
#> Effective number of parameters .....................: 5.49
#> 
#> Watanabe-Akaike information criterion (WAIC) ...: 58.21
#> Effective number of parameters .................: 2.84
#> 
#> Marginal log-Likelihood:  -36.79 
#>  is computed 
#> Posterior summaries for the linear predictor and the fitted values are computed
#> (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')

Predict the values at the x points used for mesh (the data argument must be a data frame, see ?predict.bru):

x4pred <- data.frame(x = xs)
pred2.bru <- predict(fit2.bru,
  x4pred,
  x ~ exp(field + Intercept),
  n.samples = 1000
)

Let’s do a plot to compare the fitted model to the true model. The expected counts of the true model are stored in the variable E_nc2 which comes with the dataset Poisson2_1D. For ease of use in plotting with ggplot2 (which needs a data frame), we create a data frame which we call true.lambda, containing x- and y variables as shown below.

Given that inlabru predictions are always on the intensity function scale, do you understand why we divide the count by cd$exposure? (We will in due course allow predictions on the count scale as well.)

true.lambda <- data.frame(x = cd$x, y = E_nc2 / cd$exposure)

These ggplot2 commands should generate the plot shown below. It shows the true intensities as short horizontal blue lines, the observed intensities as black dots, and the fitted intensity function as a red curve, with 95% credible intervals shown as a light red band about the curve.

ggplot() +
  gg(pred2.bru) +
  geom_point(data = cd, aes(x = x, y = count / exposure), cex = 2) +
  geom_point(data = true.lambda, aes(x, y), pch = "_", cex = 9, col = "blue") +
  coord_cartesian(xlim = c(0, 55), ylim = c(0, 6)) +
  xlab("x") +
  ylab("Intensity")

Compare the inlabru fit to the gam fit:

ggplot() +
  gg(pred2.bru) +
  geom_point(data = cd, aes(x = x, y = count / exposure), cex = 2) +
  geom_line(data = dat4pred2, aes(x, gam / exposure), lty = 2) +
  coord_cartesian(xlim = c(0, 55), ylim = c(0, 6)) +
  xlab("x") +
  ylab("Intensity")

Looking at the posterior distributions

We can look at the Intercept posterior using the function plot( ), as below.

plot(fit2.bru, "Intercept")

You have to know that there is a variable called Intercept in order to use this function. To see what fixed effect parameters’ posterior distributions are available to be plotted, you can type

names(fit2.bru$marginals.fixed)

This does not tell you about the SPDE parameters, and if you type

names(fit2.bru$marginals.random)
#> [1] "field"

this just tells you that there is an SPDE in fit2.bru called ‘field’, it does not tell you what the associated parameter names are. The parameters that are used in estimation are cryptic – what we are interested in is the range and variance of the Matern covariance funcion, that are functions of the internal parameters. We can look at the posterior distributions of the range parameter and the log of the variance parameters as follows. (We look at the posterior of the log of the variance because the variance posterior is very skewed and so it is easier to view the log of the variance)

spde.range <- spde.posterior(fit2.bru, "field", what = "range")
spde.logvar <- spde.posterior(fit2.bru, "field", what = "log.variance")

range.plot <- plot(spde.range)
var.plot <- plot(spde.logvar)
multiplot(range.plot, var.plot)

We can look at the posterior distributions of the Matern correlation and covariance functions as follows:

plot(spde.posterior(fit2.bru, "field", what = "matern.correlation"))

plot(spde.posterior(fit2.bru, "field", what = "matern.covariance"))