Skip to contents

Set things up

library(INLA)
library(inlabru)
library(RColorBrewer)
library(ggplot2)
bru_safe_sp(force = TRUE)

Introduction

We are going to fit spatial models to the gorilla data, using factor and continuous explanatory variables in this practical. We will fit one using the factor variable vegetation, the other using the continuous covariate elevation

(Jump to the bottom of the practical if you want to start gently with a 1D example!)

Get the data

data(gorillas, package = "inlabru")

This dataset is a list (see help(gorillas) for details. Extract the objects you need from the list, for convenience:

nests <- gorillas$nests
mesh <- gorillas$mesh
boundary <- gorillas$boundary
gcov <- gorillas$gcov

Factor covariates

Look at the vegetation type, nests and boundary:

ggplot() +
  gg(gcov$vegetation) +
  gg(boundary) +
  gg(nests, color = "white", cex = 0.5) +
  coord_equal()

Or, with the mesh:

ggplot() +
  gg(gcov$vegetation) +
  gg(mesh) +
  gg(boundary) +
  gg(nests, color = "white", cex = 0.5) +
  coord_equal()

A model with vegetation type only

It seems that vegetation type might be a good predictor because nearly all the nests fall in vegetation type Primary. So we construct a model with vegetation type as a fixed effect. To do this, we need to tell ‘lgcp’ how to find the vegetation type at any point in space, and we do this by creating model components with a fixed effect that we call vegetation (we could call it anything), as follows:

comp1 <- coordinates ~ vegetation(gcov$vegetation, model = "factor_full") - 1

Notes: * We need to tell ‘lgcp’ that this is a factor fixed effect, which we do with model="factor_full", giving one coefficient for each factor level. * We need to be careful about overparameterisation when using factors. Unlike regression models like ‘lm()’, ‘glm()’ or ‘gam()’, ‘lgcp()’, inlabru does not automatically remove the first level and absorb it into an intercept. Instead, we can either use model="factor_full" without an intercept, or model="factor_contrast", which does remove the first level.

comp1alt <- coordinates ~ vegetation(gcov$vegetation, model = "factor_contrast") + Intercept(1)

Fit the model as usual:

fit1 <- lgcp(comp1, nests, samplers = boundary, domain = list(coordinates = mesh))

Predict the intensity, and plot the median intensity surface. (In older versions, predicting takes some time because we did not have vegetation values outside the mesh so ‘inlabru’ needed to predict these first. Since v2.0.0, the vegetation has been pre-extended.)

The predidct function of inlabru takes into its data argument a SpatialPointsDataFrame, a SpatialPixelsDataFrame or a data.frame. We can use the inlabru function pixels to generate a SpatialPixelsDataFrame only within the boundary, using its mask argument, as shown below.

df <- pixels(mesh, mask = boundary)
int1 <- predict(fit1, df, ~ exp(vegetation))

ggplot() +
  gg(int1) +
  gg(boundary, alpha = 0, lwd = 2) +
  gg(nests, color = "DarkGreen") +
  coord_equal()

Not surprisingly, given that most nests are in Primary vegetation, the high density is in this vegetation. But there are substantial patches of predicted high density that have no nests, and some areas of predicted low density that have nests. What about the estimated abundance (there are really 647 nests there):

ips <- fm_int(mesh, boundary)
Lambda1 <- predict(fit1, ips, ~ sum(weight * exp(vegetation)))
Lambda1
#>       mean       sd   q0.025     q0.5   q0.975   median mean.mc_std_err
#> 1 645.8742 28.40562 585.8237 644.4606 694.0197 644.4606        2.840562
#>   sd.mc_std_err
#> 1      1.976184

A model with vegetation type and a SPDE type smoother

Lets try to explain the pattern in nest distribution that is not captured by the vegetation covariate, using an SPDE:

pcmatern <- inla.spde2.pcmatern(mesh,
  prior.sigma = c(0.1, 0.01),
  prior.range = c(1, 0.01)
)

comp2 <- coordinates ~
  -1 +
  vegetation(gcov$vegetation, model = "factor_full") +
  mySmooth(coordinates, model = pcmatern)
fit2 <- lgcp(comp2, nests, samplers = boundary, domain = list(coordinates = mesh))

And plot the median intensity surface

df <- pixels(mesh, mask = boundary)
int2 <- predict(fit2, df, ~ exp(mySmooth + vegetation), n.samples = 1000)

ggplot() +
  gg(int2, aes(fill = q0.025)) +
  gg(boundary, alpha = 0, lwd = 2) +
  gg(nests) +
  coord_equal()

… and the expected integrated intensity (mean of abundance)

Lambda2 <- predict(
  fit2,
  fm_int(mesh, boundary),
  ~ sum(weight * exp(mySmooth + vegetation))
)
Lambda2
#>       mean       sd   q0.025     q0.5   q0.975   median mean.mc_std_err
#> 1 676.7922 28.45341 627.7801 678.0634 735.7116 678.0634        2.845341
#>   sd.mc_std_err
#> 1      2.041402

Look at the contributions to the linear predictor from the SPDE and from vegetation:

lp2 <- predict(fit2, df, ~ list(
  smooth_veg = mySmooth + vegetation,
  smooth = mySmooth,
  veg = vegetation
))

The function scale_fill_gradientn sets the scale for the plot legend. Here we set it to span the range of the three linear predictor components being plotted (medians are plotted by default).

lprange <- range(lp2$smooth_veg$median, lp2$smooth$median, lp2$veg$median)
csc <- scale_fill_gradientn(colours = brewer.pal(9, "YlOrRd"), limits = lprange)

plot.lp2 <- ggplot() +
  gg(lp2$smooth_veg) +
  csc +
  theme(legend.position = "bottom") +
  gg(boundary, alpha = 0) +
  ggtitle("mySmooth + vegetation") +
  coord_equal()

plot.lp2.spde <- ggplot() +
  gg(lp2$smooth) +
  csc +
  theme(legend.position = "bottom") +
  gg(boundary, alpha = 0) +
  ggtitle("mySmooth") +
  coord_equal()

plot.lp2.veg <- ggplot() +
  gg(lp2$veg) +
  csc +
  theme(legend.position = "bottom") +
  gg(boundary, alpha = 0) +
  ggtitle("vegetation") +
  coord_equal()

multiplot(plot.lp2, plot.lp2.spde, plot.lp2.veg, cols = 3)

A model with SPDE only

Do we need vegetation at all? Fit a model with only an SPDE + Intercept, and choose between models on the basis of DIC, using ‘deltaIC()’.

comp3 <- coordinates ~ mySmooth(coordinates, model = pcmatern) + Intercept(1)
fit3 <- lgcp(comp3,
  data = nests,
  samplers = boundary,
  domain = list(coordinates = mesh)
)
int3 <- predict(fit3, df, ~ exp(mySmooth + Intercept))

ggplot() +
  gg(int3) +
  gg(boundary, alpha = 0) +
  gg(nests) +
  coord_equal()

Lambda3 <- predict(
  fit3,
  fm_int(mesh, boundary),
  ~ sum(weight * exp(mySmooth + Intercept))
)
Lambda3
#>       mean       sd   q0.025     q0.5   q0.975   median mean.mc_std_err
#> 1 676.2661 29.15892 620.4001 673.9146 728.7768 673.9146        2.915892
#>   sd.mc_std_err
#> 1      1.685045
knitr::kable(cbind(
  deltaIC(fit1, fit2, fit3, criterion = c("DIC")),
  deltaIC(fit1, fit2, fit3, criterion = c("WAIC"))
))
Model DIC Delta.DIC Model WAIC Delta.WAIC
fit1 -562.5418 0.000 fit1 1373.219 0.0000
fit3 518.7317 1081.273 fit3 1601.662 228.4428
fit2 612.0094 1174.551 fit2 1660.154 286.9353

NOTE: the behaviour of DIC and WAIC is currently a bit unclear, in particular for experimental mode, and is being investigated.

Classic mode:

Model DIC Delta.DIC Model WAIC Delta.WAIC
fit2 2224.131 0.00000 fit1 2672.234 0.0000
fit3 2274.306 50.17504 fit3 2949.498 277.2636
fit1 3124.784 900.65339 fit2 3040.140 367.9051

Experimental mode:

Model DIC Delta.DIC Model WAIC Delta.WAIC
fit1 -563.3583 0.000 fit1 1373.241 0.0000
fit3 509.4010 1072.759 fit3 1584.543 211.3021
fit2 597.6459 1161.004 fit2 1636.864 263.6225

CV and SPDE parameters for Model 2

We are going with Model fit2. Lets look at the spatial distribution of the coefficient of variation

ggplot() +
  gg(int2, aes(fill = sd / mean)) +
  gg(boundary, alpha = 0) +
  gg(nests) +
  coord_fixed()

Plot the vegetation “fixed effect” posteriors. First get their names - from $marginals.random$vegetation of the fitted object, which contains the fixed effect marginal distribution data

flist <- vector("list", NROW(fit2$summary.random$vegetation))
for (i in seq_along(flist)) flist[[i]] <- plot(fit2, "vegetation", index = i)
multiplot(plotlist = flist, cols = 3)

Use spde.posterior( ) to obtain and then plot the SPDE parameter posteriors and the Matern correlation and covariance functions for this model.

spde.range <- spde.posterior(fit2, "mySmooth", what = "range")
spde.logvar <- spde.posterior(fit2, "mySmooth", what = "log.variance")
range.plot <- plot(spde.range)
var.plot <- plot(spde.logvar)

multiplot(range.plot, var.plot)

corplot <- plot(spde.posterior(fit2, "mySmooth", what = "matern.correlation"))
covplot <- plot(spde.posterior(fit2, "mySmooth", what = "matern.covariance"))
multiplot(covplot, corplot)

Continuous covariates

Now lets try a model with elevation as a (continuous) explanatory variable. (First centre elevations for more stable fitting.)

elev <- gcov$elevation
elev$elevation <- elev$elevation - mean(elev$elevation, na.rm = TRUE)

ggplot() +
  gg(elev) +
  gg(boundary, alpha = 0) +
  coord_fixed()

The elevation variable here is of class ‘SpatialGridDataFrame’, that can be handled in the same way as the vegetation covariate. However, since in some cases data may be stored differently, and other methods are needed to access the stored values. In such cases, we can define a function that knows how to evaluate the covariate at arbitrary points in the survey region, and call that function in the component definition. In this case, we can use a powerful method from the ‘sp’ package to do this. We use this to create the needed function. NOTE: There is a new method eval_spatial() that handles this automatically, and supports terra SpatRaster and sf geometry points objects, and mismatching coordinate systems as well.

f.elev <- function(xy) {
  # turn coordinates into SpatialPoints object:
  # with the appropriate coordinate reference system (CRS)
  spp <- SpatialPoints(data.frame(x = xy[, 1], y = xy[, 2]),
    proj4string = fm_CRS(elev)
  )
  # Extract elevation values at spp coords, from our elev SpatialGridDataFrame
  v <- over(spp, elev)
  if (any(is.na(v$elevation))) {
    v$elevation <- bru_fill_missing(elev, spp, v$elevation)
  }
  return(v$elevation)
}

For brevity we are not going to consider models with elevation only, with elevation and a SPDE, and with SPDE only. We will just fit one with elevation and SPDE. We create our model to pass to lgcp thus:

matern <- inla.spde2.pcmatern(mesh,
  prior.sigma = c(0.1, 0.01),
  prior.range = c(1, 0.01)
)

ecomp <- coordinates ~ elev(f.elev(coordinates(.data.)), model = "linear") +
  mySmooth(coordinates, model = matern) + Intercept(1)

Note how the elevation effect is defined. When we used the Spatial grid object directly we specified it like

vegetation(gcov$vegetation, model = "factor_full")

whereas with the function method we specify the covariate like this:

elev(f.elev(coordinates(.data.)), model = "linear")

We also now include an intercept term.

The model is fitted in the usual way:

efit <- lgcp(ecomp, nests, samplers = boundary, domain = list(coordinates = mesh))

Summary and model selection

summary(efit)
#> inlabru version: 2.7.0.9010
#> INLA version: 23.05.30-1
#> Components:
#> elev: main = linear(f.elev(coordinates(.data.))), group = exchangeable(1L), replicate = iid(1L)
#> mySmooth: main = spde(coordinates), group = exchangeable(1L), replicate = iid(1L)
#> Intercept: main = linear(1), group = exchangeable(1L), replicate = iid(1L)
#> Likelihoods:
#>   Family: 'cp'
#>     Data class: 'SpatialPointsDataFrame'
#>     Predictor: coordinates ~ .
#> Time used:
#>     Pre = 0.663, Running = 8.31, Post = 0.482, Total = 9.45 
#> Fixed effects:
#>            mean    sd 0.025quant 0.5quant 0.975quant  mode kld
#> elev      0.004 0.001      0.002    0.004      0.006 0.004   0
#> Intercept 1.114 0.495      0.106    1.125      2.060 1.125   0
#> 
#> Random effects:
#>   Name     Model
#>     mySmooth SPDE2 model
#> 
#> Model hyperparameters:
#>                    mean    sd 0.025quant 0.5quant 0.975quant mode
#> Range for mySmooth 1.82 0.219      1.431     1.81       2.29 1.76
#> Stdev for mySmooth 1.01 0.085      0.858     1.01       1.19 1.00
#> 
#> Deviance Information Criterion (DIC) ...............: 514.01
#> Deviance Information Criterion (DIC, saturated) ....: NA
#> Effective number of parameters .....................: -830.89
#> 
#> Watanabe-Akaike information criterion (WAIC) ...: 1595.26
#> Effective number of parameters .................: 148.31
#> 
#> Marginal log-Likelihood:  -1255.07 
#>  is computed 
#> Posterior summaries for the linear predictor and the fitted values are computed
#> (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
deltaIC(fit1, fit2, fit3, efit)
#>   Model       DIC Delta.DIC
#> 1  fit1 -562.5418     0.000
#> 2  efit  514.0084  1076.550
#> 3  fit3  518.7317  1081.273
#> 4  fit2  612.0094  1174.551

Predict and plot the density

e.int <- predict(efit, pixels(mesh, mask = boundary), ~ exp(mySmooth + elev + Intercept))
e.int.log <- predict(efit, pixels(mesh, mask = boundary), ~ (mySmooth + elev + Intercept))

ggplot() +
  gg(e.int, aes(fill = log(sd))) +
  gg(boundary, alpha = 0) +
  gg(nests, shape = "+") +
  coord_equal()

ggplot() +
  gg(e.int.log, aes(fill = exp(mean + sd^2 / 2))) +
  gg(boundary, alpha = 0) +
  gg(nests, shape = "+") +
  coord_equal()

Now look at the elevation and SPDE effects in space. Leave out the Intercept because it swamps the spatial effects of elevation and the SPDE in the plots and we are interested in comparing the effects of elevation and the SPDE.

First we need to predict on the linear predictor scale.

e.lp <- predict(
  efit, pixels(mesh, mask = boundary),
  ~ list(
    smooth_elev = mySmooth + elev,
    elev = elev,
    smooth = mySmooth
  )
)

The code below, which is very similar to that used for the vegetation factor variable, produces the plots we want.

lprange <- range(e.lp$smooth_elev$mean, e.lp$elev$mean, e.lp$smooth$mean)

library(RColorBrewer)
csc <- scale_fill_gradientn(colours = brewer.pal(9, "YlOrRd"), limits = lprange)

plot.e.lp <- ggplot() +
  gg(e.lp$smooth_elev, mask = boundary) +
  csc +
  theme(legend.position = "bottom") +
  gg(boundary, alpha = 0) +
  ggtitle("SPDE + elevation") +
  coord_equal()

plot.e.lp.spde <- ggplot() +
  gg(e.lp$smooth, mask = boundary) +
  csc +
  theme(legend.position = "bottom") +
  gg(boundary, alpha = 0) +
  ggtitle("SPDE") +
  coord_equal()

plot.e.lp.elev <- ggplot() +
  gg(e.lp$elev, mask = boundary) +
  csc +
  theme(legend.position = "bottom") +
  gg(boundary, alpha = 0) +
  ggtitle("elevation") +
  coord_equal()

multiplot(plot.e.lp,
  plot.e.lp.spde,
  plot.e.lp.elev,
  cols = 3
)

You might also want to look at the posteriors of the fixed effects and of the SPDE. Adapt the code used for the vegetation factor to do this.

LambdaE <- predict(
  efit,
  ipoints(boundary, mesh),
  ~ sum(weight * exp(Intercept + elev + mySmooth))
)
LambdaE
#>       mean       sd   q0.025     q0.5   q0.975   median mean.mc_std_err
#> 1 671.6841 26.76323 629.0495 669.5203 715.7537 669.5203        2.676323
#>   sd.mc_std_err
#> 1      1.690119
flist <- vector("list", NROW(efit$summary.fixed))
for (i in seq_along(flist)) {
  flist[[i]] <- plot(efit, rownames(efit$summary.fixed)[i])
}
multiplot(plotlist = flist, cols = 2)

Plot the SPDE parameter posteriors and the Matern correlation and covariance functions for this model.

spde.range <- spde.posterior(efit, "mySmooth", what = "range")
spde.logvar <- spde.posterior(efit, "mySmooth", what = "log.variance")
range.plot <- plot(spde.range)
var.plot <- plot(spde.logvar)

multiplot(range.plot, var.plot)

corplot <- plot(spde.posterior(efit, "mySmooth", what = "matern.correlation"))
covplot <- plot(spde.posterior(efit, "mySmooth", what = "matern.covariance"))
multiplot(covplot, corplot)

Also estimate abundance. The data.frame in the second call leads to inclusion of N in the prediction object, for easier plotting.

Lambda <- predict(
  efit, fm_int(mesh, boundary),
  ~ sum(weight * exp(mySmooth + elev + Intercept))
)
Lambda
#>       mean      sd   q0.025     q0.5   q0.975   median mean.mc_std_err
#> 1 669.5712 26.7521 625.9226 667.2247 721.1337 667.2247         2.67521
#>   sd.mc_std_err
#> 1       2.04269

Nest.e <- predict(
  efit,
  fm_int(mesh, boundary),
  ~ data.frame(
    N = 200:1000,
    density = dpois(200:1000,
      lambda = sum(weight * exp(mySmooth + elev + Intercept))
    )
  ),
  n.samples = 2000
)

Plot in the same way as in previous practicals

Nest.e$plugin_estimate <- dpois(Nest.e$N, lambda = Lambda$median)
ggplot(data = Nest.e) +
  geom_line(aes(x = N, y = mean, colour = "Posterior")) +
  geom_line(aes(x = N, y = plugin_estimate, colour = "Plugin"))

Non-spatial evaluation of the covariate effect

The previous examples of posterior prediction focused on spatial prediction. From inlabru version 2.2.8, a feauture is available for overriding the component input value specification from the component definition. Each model component can be evaluated directly, for arbitrary values by functions named by adding the suffix _eval to the end of the component name in the predictor expression. Since the elevation effect in this model is linear, the resulting plot isn’t very interesting, but the same method can be applied to non-linear effects as well, and combined into general R expressions:

elev.pred <- predict(
  efit,
  data.frame(elevation = seq(0, 100, length.out = 1000)),
  formula = ~ elev_eval(elevation)
)
#> Warning in handle_problems(e_input): The input evaluation
#> 'f.elev(coordinates(.data.))' for 'elev' failed. Perhaps the data object
#> doesn't contain the needed variables? Falling back to '1'.

ggplot(elev.pred) +
  geom_line(aes(elevation, mean)) +
  geom_ribbon(
    aes(elevation,
      ymin = q0.025,
      ymax = q0.975
    ),
    alpha = 0.2
  ) +
  geom_ribbon(
    aes(elevation,
      ymin = mean - 1 * sd,
      ymax = mean + 1 * sd
    ),
    alpha = 0.2
  )

A 1D Example

Try fitting a 1-dimensional model to the point data in the inlabru dataset Poisson2_1D. This comes with a covariate function called cov2_1D. Try to reproduce the plot below (used in lectures) showing the effects of the Intercept + z and the SPDE. (You may find it helpful to build on the model you fitted in the previous practical, adding the covariate to the model specification.)

data(Poisson2_1D)
ss <- seq(0, 55, length = 200)
z <- cov2_1D(ss)
x <- seq(1, 55, length = 100)
mesh <- inla.mesh.1d(x, degree = 1)

comp <- x ~
  beta_z(cov2_1D(x), model = "linear") +
  spde1D(x, model = inla.spde2.matern(mesh)) +
  Intercept(1)

fitcov1D <- lgcp(comp, pts2, domain = list(x = mesh))
pr.df <- data.frame(x = x)
prcov1D <- predict(
  fitcov1D, pr.df,
  ~ list(
    total = exp(beta_z + spde1D + Intercept),
    fx = exp(beta_z + Intercept),
    spde = exp(spde1D)
  )
)

ggplot() +
  gg(prcov1D$total, color = "red") +
  geom_line(aes(x = prcov1D$spde$x, y = prcov1D$spde$median), col = "blue", lwd = 1.25) +
  geom_line(aes(x = prcov1D$fx$x, y = prcov1D$fx$median), col = "green", lwd = 1.25) +
  geom_point(data = pts2, aes(x = x), y = 0.2, shape = "|", cex = 4) +
  xlab(expression(bold(s))) +
  ylab(expression(hat(lambda)(bold(s)) ~ ~"and its components")) +
  annotate(geom = "text", x = 40, y = 6, label = "Intensity", color = "red") +
  annotate(geom = "text", x = 40, y = 5.5, label = "z-effect", color = "green") +
  annotate(geom = "text", x = 40, y = 5, label = "SPDE", color = "blue")