1 Introduction

This Supplement presents data, code, and detailed methodology of the analysis of counts of sharks collected during Diver-Operated Video (DOV) surveys conducted at Darwin and Wolf Islands in the Galápagos Marine Reserve. The data comprise counts from 64 sub-transects, each 50 m long and 5 m wide. Counts were taken using an instantaneous (insiders only) survey method and a non-instantaneous (total = insiders + outsiders) survey method.

In this Supplement (2), we present logistic regression models used to estimate the proportion of the total counts that were insiders, and thus the degree of bias associated with using a non-instantaneous survey method, as a function of diver swimming speed. The Supplement is organised as follows: in Section 2, we provide the data; in Section 3, we define the base model and provide the code used to fit it; in Section 4, we examine our prior distributions using prior predictive checks; in Section 5, we provide code used to obtain estimates and produced the figures used for inference in the main article; in Section 6, we compare our base model to alternative models; in Section 7, we examine model diagnostics; and, in Section 8, we compare inferences from our base model with one that has a prior that might give more conservative estimates of non-instantaneous bias.

Another Supplement (3) presents models used to estimate mean densities of sharks for each method at each island, for exposed and non-exposed transects.

2 The data

The following code inputs and tabulates (Table S2-1) the raw data.

dp <- structure(list(Insider = c(1, 4, 1, 3, 0, 0, 1, 2, 0, 2, 4, 2, 8, 4, 1, 5, 3, 6, 6, 1, 1, 5, 0, 24, 18, 1, 2, 1, 4, 7, 1, 6, 1, 1, 0, 0, 2, 1, 0, 3, 2, 1, 2, 12, 0, 1, 0, 3, 3, 0, 1, 6, 1, 1, 0, 0, 1, 3, 0, 4, 3, 3, 1, 0), Total = c(2, 5, 1, 12, 3, 1, 2, 9, 4, 2, 9, 2, 8, 8, 6, 6, 3, 10, 51, 5, 1, 6, 1, 27, 21, 1, 2, 3, 5, 18, 2, 8, 1, 2, 1, 1, 3, 1, 1, 3, 2, 1, 3, 13, 3, 3, 5, 3, 3, 1, 2, 6, 1, 1, 3, 1, 1, 3, 1, 4, 3, 3, 4, 1), Transect = c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 4, 4, 4, 6, 9, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 16), Speed = c(28.57142857, 43.47826087, 44.7761194, 43.47826087, 46.15384615, 48.38709677, 37.97468354, 40.54054054, 40.54054054, 44.7761194, 43.47826087, 47.61904762, 51.72413793, 47.61904762, 50, 46.875, 32.96703297, 39.47368421, 
29.12621359, 37.5, 38.96103896, 35.71428571, 32.96703297, 43.47826087, 37.03703704, 49.18032787, 44.11764706, 42.85714286, 38.96103896, 37.5, 40.54054054, 30.6122449, 42.25352113, 41.09589041, 22.05882353, 37.97468354, 36.14457831, 32.96703297, 38.96103896, 45.45454545, 50, 42.25352113, 37.03703704, 41.66666667, 41.09589041, 38.46153846, 33.70786517, 38.46153846, 46.15384615, 50.84745763, 48.38709677, 54.54545455, 49.18032787, 50, 40.54054054, 41.09589041, 42.25352113, 53.57142857, 49.18032787, 61.2244898, 65.2173913, 65.2173913, 58.82352941, 53.57142857), R = c(2, 1.25, 1, 4, Inf, Inf, 2, 4.5, Inf, 1, 2.25, 1, 1, 2, 6, 1.2, 1, 1.66666666666667, 8.5, 5, 1, 1.2, Inf, 1.125, 1.16666666666667, 1, 1, 3, 1.25, 2.57142857142857, 2, 1.33333333333333, 1, 2, Inf, Inf, 1.5, 1, Inf, 1, 1, 1, 1.5, 1.08333333333333, Inf, 3, Inf, 1, 1, Inf, 2, 1, 1, 1, Inf, Inf, 1, 1, Inf, 1, 1, 1, 4, Inf)), row.names = c(NA, -64L), class = c("tbl_df", "tbl", "data.frame"))

dp %>% 
  select(Transect, Speed, Insider, Total, 
         `Raw ratio` = R) %>%
  kable(caption = tables("tdat")) %>%
  kable_styling
Table S2-1: Raw data
Transect Speed Insider Total Raw ratio
3 28.57143 1 2 2.000000
3 43.47826 4 5 1.250000
3 44.77612 1 1 1.000000
3 43.47826 3 12 4.000000
3 46.15385 0 3 Inf
3 48.38710 0 1 Inf
3 37.97468 1 2 2.000000
3 40.54054 2 9 4.500000
3 40.54054 0 4 Inf
3 44.77612 2 2 1.000000
3 43.47826 4 9 2.250000
3 47.61905 2 2 1.000000
3 51.72414 8 8 1.000000
3 47.61905 4 8 2.000000
3 50.00000 1 6 6.000000
3 46.87500 5 6 1.200000
1 32.96703 3 3 1.000000
1 39.47368 6 10 1.666667
1 29.12621 6 51 8.500000
1 37.50000 1 5 5.000000
1 38.96104 1 1 1.000000
1 35.71429 5 6 1.200000
2 32.96703 0 1 Inf
2 43.47826 24 27 1.125000
2 37.03704 18 21 1.166667
2 49.18033 1 1 1.000000
2 44.11765 2 2 1.000000
2 42.85714 1 3 3.000000
2 38.96104 4 5 1.250000
2 37.50000 7 18 2.571429
2 40.54054 1 2 2.000000
2 30.61224 6 8 1.333333
2 42.25352 1 1 1.000000
2 41.09589 1 2 2.000000
4 22.05882 0 1 Inf
4 37.97468 0 1 Inf
4 36.14458 2 3 1.500000
4 32.96703 1 1 1.000000
4 38.96104 0 1 Inf
6 45.45455 3 3 1.000000
9 50.00000 2 2 1.000000
11 42.25352 1 1 1.000000
11 37.03704 2 3 1.500000
11 41.66667 12 13 1.083333
11 41.09589 0 3 Inf
11 38.46154 1 3 3.000000
11 33.70787 0 5 Inf
11 38.46154 3 3 1.000000
12 46.15385 3 3 1.000000
12 50.84746 0 1 Inf
12 48.38710 1 2 2.000000
12 54.54545 6 6 1.000000
12 49.18033 1 1 1.000000
12 50.00000 1 1 1.000000
12 40.54054 0 3 Inf
13 41.09589 0 1 Inf
13 42.25352 1 1 1.000000
13 53.57143 3 3 1.000000
13 49.18033 0 1 Inf
13 61.22449 4 4 1.000000
13 65.21739 3 3 1.000000
13 65.21739 3 3 1.000000
13 58.82353 1 4 4.000000
16 53.57143 0 1 Inf

This code modifies the data for input into the models:

dp2 <- dp %>% 
  mutate(X = Speed - 43) %>% # X is speed centred on zero
  select(-Speed) %>% # remove speed
  mutate(Tr = as.numeric(as.factor(dp$Transect))) %>% # transect index
  as.list() %>%
  list_modify(N = nrow(dp)) %>%
  list_modify(N_Tr = max(as.numeric(as.factor(dp$Transect)))) %>%
  list_modify(X_pred = 20:65 - 43) %>% # X values for predictions
  list_modify(N_pred = length(20:65) ) # length of predict x vector

Diver Swimming speeds ranged from 22 to 65 m min-1, with an average of 43 (Table S2-1). The raw proportion of insiders was 180/328 = 0.5487805.

3 Specifying and fitting the base model

3.1 Mathematical description

Here is a mathematical description of the base model used in the manuscript:

\[ \begin{aligned} y^*_i &\sim {\rm Binomial}(y_i, p_i) \\ {\rm logit}(p_i) &= \alpha + \beta ({\rm Speed}_i - 43) + z_{Tr[i]} \sigma_t\\ \alpha &\sim {\rm Normal}(0, 1.4)\\ \beta &\sim {\rm Normal}(0, 0.1)\\ z_{Tr[i]} &\sim {\rm Normal}(0, 1)\\ \sigma_t &\sim {\rm Exponential}(1)\\ f_i &= 1/p_i\\ \end{aligned} \] The parameters of Normal distributions here are means and standard deviations.

This is a logistic regression of the proportion of the total counts that were insiders as a function of diver swimming speed. The predictor variable \({\rm Speed}_i\), the diver swimming speed for transect \(i\), was centred by subtracting the mean speed of 43 m min-1. This way, the intercept parameter \(\alpha\) gives the proportion of insiders at the average swim speed (rather than at a speed of zero).

The variable \(Tr[i]\) is a vector of integers (between 1 and 10) indicating the transects to which each sub-transect \(i\) belongs. The transect effects here are modelled using a non-centred parameterisation, which is equivalent to the more usual parameterisation described in the manuscript, but it avoids problems with poor sampling efficiency that arises with the usual parameterisation when fitting the model with Hamiltonian Monte Carlo (McElreath 2015). With the non-centred approach, the sub-transect effects are given by \(t_j = z_j \sigma_t\), where \(z_j\) are the sub-transect effects as standardised z-scores and \(\sigma_t\) is the standard deviation of the sub-transect effects.

3.2 Stan code

The Stan code for this model is as follows (saved as object mod_propST):

data{
    int N; // sample size
    int N_Tr; // number of transects
    int Tr[N]; // vector of contiguous transect numbers
    int N_pred; // number of prediction speeds
    int Total[N]; // Total counts
    int Insider[N]; // Insider counts
    vector[N] X; // Speed - 43
    vector[N_pred] X_pred; // Values of (speed - 43) for prediction
}
parameters{
    real a;
    real b;
    vector[N_Tr] z_Tr;
    real<lower=0> sd_Tr;
}
model{
    vector[N] p;
    b ~ normal( 0 , 0.1 );
    a ~ normal( 0 , 1.4 );
    z_Tr ~ normal( 0 , 1 );
    sd_Tr ~ exponential( 1 );
    for ( i in 1:N ) {
        p[i] = inv_logit(a + b * X[i] + z_Tr[Tr[i]]*sd_Tr);
    }
    Insider ~ binomial( Total , p );
}
generated quantities{
    vector[N] log_lik;
    vector[N] p;
    vector[N] f;
    vector[N_pred] p_pred;
    vector[N_pred] f_pred;
    for ( i in 1:N ) {
        p[i] = inv_logit(a + b * X[i] + z_Tr[Tr[i]]*sd_Tr);
        f[i] = 1 / p[i];
    }
    for ( i in 1:N ) log_lik[i] = binomial_lpmf( Insider[i] | Total[i] , p[i] );
    for ( i in 1:N_pred ) { // Predict p and f across the range of X
        p_pred[i] = inv_logit(a + b * X_pred[i]);
        f_pred[i] = 1 / p_pred[i];
    }
}

3.3 Model fitting

Fit model for proportion of insiders \(p\) as a function of diver swimming speed:

fit_propST <- sampling(mod_propST,
           data=dp2,
           iter=2000,
           chains=4,
           cores=4,
           seed=111)  

4 Priors

An important step in Bayesian modelling is to scrutinise the choice of prior distributions by simulating models and data from the priors, so-called prior predictive checks (Gelman et al. 2014, Gabry et al. 2019). Our logistic regression model fits a linear model to the logit-transformed proportion of insiders. When using a link function like the logit, seemingly uninformative priors for logit-transformed parameters (e.g., Normal(0, 100)) can actually be very strong implied priors for the predictions on the original scale (Figure S2-1). In this section, we examine priors for the intercept (\(\alpha\) or a) and slope (\(\beta\) or b) parameters.

4.1 Priors for logit \(p\)

Here, we examine a range of priors for the intercept parameter \(\alpha\), which represents the logit-transformed proportion of insiders.

We took random samples from a few normal distributions, applied the inverse-logit function to get values on the proportion scale (between 0 and 1), and then examined their distributions using density plots.

n=1e6

ld <- bind_rows(tibble(Prior="N(0,1)", x=rnorm(n,0,1)),
            tibble(Prior="N(0,1.4)", x=rnorm(n,0,1.4)),
            tibble(Prior="N(0,1.5)", x=rnorm(n,0,1.5)),
            tibble(Prior="N(1,1.5)", x=rnorm(n,1,1.5)),
            tibble(Prior="N(0,2)", x=rnorm(n,0,2)),
            tibble(Prior="N(0,3)", x=rnorm(n,0,3)),
            tibble(Prior="N(0,1.7)", x=rnorm(n,0,1.7))) %>% 
  mutate(x = inv_logit(x)) 
p1 <- ld %>% ggplot() + aes(x=x, col=Prior) + 
    geom_density() + xlab("") + 
    labs(col="Prior on\nalpha = logit(p)") +
    ggtitle("Implied prior for\nproportion p") 

p2 <- ld %>% 
  mutate(x = 1/x) %>%
  ggplot() + aes(x=x, col=Prior) + 
    geom_density() + xlim(1,6) + ylab("") + xlab("") +
    theme(legend.position = "none") +
    ggtitle("Implied prior for\ninflation factor f = 1/p")

p1 + p2 + plot_layout(guides="collect") 
Figure S2-1: Implied priors for the proportion of insiders ($p$) and inflation factor ($f$) at swimming speed of 43 m min^-1^, for different normal-distribution priors on the parameter alpha fit on the logit scale

Figure S2-1: Implied priors for the proportion of insiders (\(p\)) and inflation factor (\(f\)) at swimming speed of 43 m min-1, for different normal-distribution priors on the parameter alpha fit on the logit scale

In Figure S2-1, we can see that assigning to logit \(p\) a normal prior with large standard deviation (e.g., > 2), the prior is increasingly weighted towards the extreme values of 0 and 1 on the probability scale. On the other hand, when the standard deviation is low as 0.5, extreme values of 0 and 1 are given little weight in favour of values around \(p\) = 0.5. The prior of N(0,1.4) for logit \(p\) gives a relatively flat prior on the probability scale for most of the range between 0 and 1, so this is what we used as the prior for the intercept parameter in our base model. We also show a more conservative prior, N(1,1.5), which puts greater weight on values of \(p\) closer to 1, thus giving a more sceptical view of the non-instantaneous bias. This conservative prior is used in a sensitivity analysis in Section 8.

4.2 Prior logistic regression lines

With Bayesian logistic regression, there is distribution of lines that arises from the priors for the intercept and slope parameters of the linear model fit of \(p\) vs the predictor variable (here, diver swimming speed) on the logit scale. The prior distribution of lines comprises lines that are considered “plausible” prior to seeing the data. When the model is fit to the data, we obtain a selection of lines from the prior that are consistent with the data. In this section, we examine the posterior distribution of lines arising from the priors in our base model, and compare it with that of two models with alternative priors: one with a prior on the intercept that favours lower non-instantaneous bias (\(\alpha \sim N(1,1.5)\)), and one with a broader prior on the slope parameter (\(\beta \sim N(0,0.1)\)).

The choice of priors in our base model achieve a wide range of relationships between \(p\) and swimming speed, including no relationship (flat lines), and a mixture of weak and relatively stong positive and negative relationships ( Figure S2-2A and B).

When examined the prior distribution of lines that arose from a ‘weaker’ prior on the slope parameter–specifically, where the standard deviation is 1 (\(N(0,1)\)) rather than 0.1 as in our base model. We show that the ‘weaker’ prior actually yielded stronger relationships between \(p\) and swimming speed (Figure S2-2C and D). Very few of the lines had no relationship or a weak relationship between bias and swimming speed, and the majority of lines showed a very strong or a very weak relationship. We did not use this prior because we wanted to allow for the model to return no relationship or a weak relationship if that is what is signalled by the data.

The model we fit with a more conservative prior on the intercept parameter (i.e., favouring a lesser degree of non-instantaneous bias overall) is shown in Figure S2-2E and F. These plots confirm that this prior does indeed represent a conservative model, with a greater number of lines near 1 and fewer near 0. Inferences from this model is compared with those of our base model as a sensitivity analysis in Section 8.

## Define function to plot logistic regression lines for different priors
plot_prior_logreg <- function(int_mean = 0, int_sd = 1.4,
                              slope_mean = 0, slope_sd = 0.1,
                              f = FALSE) {
  
  pred_X = list(X = 23:65 - 43) # define X
  draws <-  replicate(100, 
    inv_logit( rnorm(1, int_mean, int_sd) + 
    rnorm(1, slope_mean, slope_sd) * pred_X$X )) %>% t # get draws of lines
    
  if(f) { 
    draws <- 1/draws
    yla <- expression(paste("Inflation factor (", italic("f"),")")) 
    yli <- c(1, 20)
    } else {
    yla <- expression(paste("Proportion of insiders (", italic("p"),")"))
    yli <- c(0,1)
    }
  
  pred_X %>% add_draws( draws = draws , value = "f") %>% group_by(X, .draw) %>%
    ggplot(aes(x = X + 43, y = f, group = .draw)) +
      geom_line(alpha = .5) + coord_cartesian(ylim = yli) + ylab(yla) +
      xlab(expression(paste('Swimming speed (m min' ^ -1, ")", sep=""))) +
      theme_bw()
}

# Base priors
plot_prior_logreg() + 
  ggtitle("A. Proportion of insiders, base priors; intercept N(0,1.4), slope N(0,0.1)")
plot_prior_logreg(f = T) + 
  ggtitle("B. Inflation factor, base priors; intercept N(0,1.4), slope N(0,0.1)")

# Alternative prior for slope
plot_prior_logreg(slope_sd = 1) + 
  ggtitle("C. Proportion of insiders, intercept N(0,1.4), broader prior for slope N(0,1)")
plot_prior_logreg(slope_sd = 1, f = T) + 
  ggtitle("D. Inflation factor, intercept N(0,1.4), broader prior for slope N(0,1)")

# Alternative prior for intercept
plot_prior_logreg(int_mean = 1, int_sd = 1.5) + 
  ggtitle("E. Proportion of insiders, conservative intercept N(1,1.5), slope N(0,0.1)")
plot_prior_logreg(int_mean = 1, int_sd = 1.5, f = T) + 
  ggtitle("F. Inflation factor, conservative intercept N(1,1.5), slope N(0,0.1)")
Figure S2-2: Implied priors for the relationship between swimming speed and the proportion of insiders ($p$, left) and inflation factor ($f$, right), for different priors on the intercept alpha and slope betaFigure S2-2: Implied priors for the relationship between swimming speed and the proportion of insiders ($p$, left) and inflation factor ($f$, right), for different priors on the intercept alpha and slope betaFigure S2-2: Implied priors for the relationship between swimming speed and the proportion of insiders ($p$, left) and inflation factor ($f$, right), for different priors on the intercept alpha and slope betaFigure S2-2: Implied priors for the relationship between swimming speed and the proportion of insiders ($p$, left) and inflation factor ($f$, right), for different priors on the intercept alpha and slope betaFigure S2-2: Implied priors for the relationship between swimming speed and the proportion of insiders ($p$, left) and inflation factor ($f$, right), for different priors on the intercept alpha and slope betaFigure S2-2: Implied priors for the relationship between swimming speed and the proportion of insiders ($p$, left) and inflation factor ($f$, right), for different priors on the intercept alpha and slope beta

Figure S2-2: Implied priors for the relationship between swimming speed and the proportion of insiders (\(p\), left) and inflation factor (\(f\), right), for different priors on the intercept alpha and slope beta

5 Inference

In this section, we provide the code used to create Figure 4 in the main article (Figure S2-3 herein) and calculate estimates of \(\alpha\) and \(\beta\) (Table S2-2), and \(p\) and \(f\) for various swimming speeds (Table S2-3).

mypars <- c("a", "b", "sd_Tr")

summary(fit_propST)$summary %>%
  as.data.frame %>%
  rownames_to_column("Parameter") %>%
  dplyr::filter(Parameter %in% mypars) %>%
  kable(digits = 2, caption = tables("pars_tab")) %>%
  kable_styling()
Table S2-2: Summary of posterior distributions of key parameters
Parameter mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
a 0.47 0.01 0.35 -0.26 0.26 0.48 0.68 1.17 1290.60 1
b 0.13 0.00 0.03 0.08 0.11 0.13 0.14 0.18 2683.52 1
sd_Tr 0.84 0.01 0.38 0.35 0.58 0.76 1.01 1.78 1212.13 1
post_propST <- extract(fit_propST)

postf <- list(Speed = dp2$X_pred + 43) %>%
    add_draws(post_propST$f_pred, value = "f") %>%
    group_by(Speed) %>%
  mean_qi %>%
  dplyr::filter(Speed %in% c(22,30,40,43,50,60,65)) %>%
  select(2:4)

postp <- list(Speed = dp2$X_pred + 43) %>%
    add_draws(post_propST$p_pred, value = "p") %>%
    group_by(Speed) %>%
  mean_qi %>%
  dplyr::filter(Speed %in% c(22,30,40,43,50,60,65))%>%
  select(1:4)

bind_cols(postp,postf) %>%
  kable(digits = 3,
        caption = tables("tpf"), 
        col.names = c("Speed", "Mean", "Lower", "Upper", "Mean", "Lower", "Upper")) %>%
  add_header_above(c(" " = 1, "Proportion of insiders" = 3, "Inflation factor" = 3)) %>%
  kable_styling() 
Table S2-3: Predictions of proportion of total counts that are insiders (\(p\)) and inflation factor (\(f\)) for a range of swimming speeds. ‘Lower’ and ‘Upper’ are limits for 95% uncertainty intervals
Proportion of insiders
Inflation factor
Speed Mean Lower Upper Mean Lower Upper
22 0.114 0.028 0.260 12.365 3.851 35.956
30 0.244 0.098 0.416 4.713 2.401 10.157
40 0.522 0.339 0.685 1.980 1.461 2.951
43 0.612 0.436 0.763 1.667 1.311 2.293
50 0.788 0.644 0.898 1.278 1.113 1.552
60 0.924 0.827 0.978 1.084 1.023 1.210
65 0.955 0.881 0.991 1.048 1.009 1.134
plot_pf <- function(post, dat = dp2, mean_X = 43) {
  require(tidybayes)
  require(patchwork)
  mybreaks=c(1,5,10,20,30,50)
  plot_f_by_speed <- list(X = dat$X_pred) %>%
    add_draws(post$f_pred[1:200,], value = "f") %>%
    group_by(X, .draw) %>%
    ggplot(aes(x = X + mean_X, y = f, group = .draw)) +
    geom_hline(aes(yintercept=1), col = "dark grey") +
    geom_line(alpha = .4, col = "lightsteelblue") +
    stat_summary(aes(group = NULL), fun = mean, geom = 'line', alpha=0.9) +
    geom_point(data = data.frame(X = dat$X + mean_X, f = dat$R , Tr = dat$Transect),
               aes(x = X, y = f, col = factor(Tr), 
                   size=dat$Total), shape=1, 
               inherit.aes = F) +
    ylab(expression(paste("Inflation factor (", italic("f"),")"))) +
    xlab(expression(paste('Swimming speed (m min' ^ -1, ")", sep=""))) +
    coord_cartesian(ylim=c(1,10)) +
    scale_size_continuous("Total count", trans="identity",
                          range=c(1,10), breaks=mybreaks)  +
    theme_bw()+
    theme(legend.position = "none")
  
  plot_p_by_speed <- list(X = dat$X_pred) %>%
    add_draws(post$p_pred[1:200,], value = "p") %>%
    group_by(X, .draw) %>%
    ggplot(aes(x = X + mean_X, y = p, group = .draw)) +
    geom_hline(aes(yintercept=0), col = "dark grey") +
    geom_hline(aes(yintercept=1), col = "dark grey") +
    geom_line(alpha = .4, col = "lightsteelblue") +
    stat_summary(aes(group = NULL), fun = mean, geom = 'line', alpha=0.9) +
    geom_point(data = data.frame(X = dat$X + mean_X, 
                                 p = dat$Insider / dat$Total , Tr = dat$Transect),
               aes(x = X, y = p, col = factor(Tr), 
                   size=dat$Total), shape=1, 
               inherit.aes = F) +
    ylab(expression(paste("Proportion of insiders (", italic("p"),")"))) +
    xlab(expression(paste('Swimming speed (m min' ^ -1, ")", sep=""))) +
    scale_colour_discrete("Transect no.") +
    scale_size_continuous("Total count", trans="identity",
                          range=c(1,10), breaks=mybreaks) +
    theme_bw()
    
  plot_p_by_speed + plot_f_by_speed + 
    plot_layout( guides = "collect") +
    plot_layout(ncol=1) +
    plot_annotation(tag_levels = 'A') 
}

plot_pf(post_propST)
Figure S2-3: Figure 4 of the main article

Figure S2-3: Figure 4 of the main article

6 Comparison with alternative models

Our base model mod_propST included a term for the relationship of the proportion of insiders with swimming speed, and a term allowing for differences among transects. We fitted alternative models with each of these terms removed in turn, and with both terms removed (i.e., an intercept only model), and compared these with the base model using estimated log predictive density (elpd) scores estimated with Pareko-smoothed importance sampling (Vehtari et al. 2017). A lower elpd score indicates that a model will have lower error when making out-of-sample predictions.

6.1 Model with transect effects only (mod_propCT)

\[ \begin{aligned} y^*_i &\sim {\rm Binomial}(y_i, p_i) \\ {\rm logit}(p_i) &\sim \alpha + z_{Tr[i]} \sigma_t\\ \alpha &\sim {\rm Normal}(0, 1.4)\\ z_{Tr[i]} &\sim {\rm Normal}(0, 1)\\ \sigma_t &\sim {\rm Exponential}(1)\\\\ \end{aligned} \]

data{
    int N;
    int N_Tr;
    int Tr[N];
    int N_pred;
    int Total[N];
    int Insider[N];
}
parameters{
    real a;
    vector[N_Tr] z_Tr;
    real<lower=0> sd_Tr;
}
model{
    vector[N] p;
    a ~ normal( 0 , 1.4 );
    z_Tr ~ normal( 0 , 1 );
    sd_Tr ~ exponential( 1 );
    for ( i in 1:N ) p[i] = inv_logit(a + z_Tr[Tr[i]]*sd_Tr);
    Insider ~ binomial( Total , p );
}
generated quantities{
    vector[N] log_lik;
    vector[N] p;
    for ( i in 1:N ) p[i] = inv_logit(a + z_Tr[Tr[i]]*sd_Tr);
    for ( i in 1:N ) log_lik[i] = binomial_lpmf( Insider[i] | Total[i] , p[i] );

}
fit_propCT <- sampling(mod_propCT,
           data=dp2,
           iter=2000,
           chains=4,
           cores=4,
           seed=111)  

6.2 Model with swimming speed and no transect effects (mod_propS)

\[ \begin{aligned} y^*_i &\sim {\rm Binomial}(y_i, p_i) \\ {\rm logit}(p_i) &= \alpha + \beta ({\rm Speed}_i - 43)\\ \alpha &\sim {\rm Normal}(0, 1.4)\\ \beta &\sim {\rm Normal}(0, 0.1)\\ \end{aligned} \]


data{
    int N;
    int Total[N];
    int Insider[N];
    vector[N] X;
}
parameters{
    real a;
    real b;
}
model{
    vector[N] p;
    b ~ normal( 0 , 0.1 );
    a ~ normal( 0 , 1.4 );
    for ( i in 1:N ) p[i] = inv_logit(a + b * X[i]);
    Insider ~ binomial( Total , p );
}
generated quantities{
    vector[N] log_lik;
    vector[N] p;
    for ( i in 1:N ) p[i] = inv_logit(a + b * X[i]);
    for ( i in 1:N ) log_lik[i] = binomial_lpmf( Insider[i] | Total[i] , p[i] );
}
fit_propS <- sampling(mod_propS,
           data=dp2,
           iter=2000,
           chains=4,
           cores=4,
           seed=111) 

6.3 Model with constant only (mod_propC)

\[ \begin{aligned} y^*_i &\sim {\rm Binomial}(y_i, p) \\ {\rm logit}(p) &\sim {\rm Normal}(0, 1.4)\\ \end{aligned} \]

data{
    int N;
    int Total[N];
    int Insider[N];
}
parameters{
    real a;
}
model{
    real p;
    a ~ normal( 0 , 1.4 );
    p = inv_logit(a);
    Insider ~ binomial( Total , p );
}
generated quantities{
    vector[N] log_lik;
    real p;
    p = inv_logit(a);
    for ( i in 1:N ) log_lik[i] = binomial_lpmf( Insider[i] | Total[i] , p );
}
fit_propC <- sampling(mod_propC,
           data=dp2,
           iter=2000,
           chains=4,
           cores=4,
           seed=111)  

6.4 Model comparison

loo_propST <- fit_propST %>% loo 
loo_propS <- fit_propS %>% loo 
loo_propC <- fit_propC %>% loo 
loo_propCT <- fit_propCT %>% loo 
loo_compare(loo_propST, loo_propS, loo_propCT, loo_propC)
##        elpd_diff se_diff
## model1   0.0       0.0  
## model2 -10.2       9.9  
## model3 -20.6      11.0  
## model4 -28.1      23.5

We can see from the output above that the lowest elpd is achieved by the base model, so we use the base model for inferences.

7 Model diagnostics

The table and plots below show no issues with the model fitting: there were no divergent transitions; the effective sample sizes (n_eff) were all very high; \(\hat{R}\) scores were all near 1; and the chains appear to be well mixed.

mypars = c("a","b","p_pred[1]","p_pred[4]","p_pred[11]","p_pred[21]")

my_diag_plots <- function(fit, mypars) {
  pt <-  mcmc_trace(as.array(fit), pars = mypars)
  pn <- mcmc_neff(neff_ratio(fit, pars = mypars)) + yaxis_text(hjust = 1)
  pr <- mcmc_rhat(rhat(fit, pars = mypars)) + yaxis_text(hjust = 1)
  pt / (pn + pr) + plot_layout(heights=c(2,1)) + 
    plot_annotation(tag_levels = 'A') 
}

my_diag_plots(fit_propST, mypars=mypars)
Figure S2-4: Diagnostic plots showing, for key quantities, trace plots (A), effective sample sizes as a fraction of actual sample sizes (B), and R-hat scores (C)

Figure S2-4: Diagnostic plots showing, for key quantities, trace plots (A), effective sample sizes as a fraction of actual sample sizes (B), and R-hat scores (C)

8 Sensitivity analysis: conservative prior for intercept

For sensitivity analysis, we present the equivalent plots and results for the model with a more conservative prior for that favours higher proportions of insiders (and is thus more sceptical of non-instantaneous bias).

The difference is the prior for the intercept parameter. For the base model, the prior is \(\alpha \sim {\rm Normal}(0, 1.4)\). For the conservative model, the prior is \(\alpha \sim {\rm Normal}(1, 1.5)\).

8.1 Stan code for the conservative model

data{
    int N;
    int N_Tr;
    int Tr[N];
    int N_pred;
    int Total[N];
    int Insider[N];
    vector[N] X;
    vector[N_pred] X_pred;
}
parameters{
    real a;
    real b;
    vector[N_Tr] z_Tr;
    real<lower=0> sd_Tr;
}
model{
    vector[N] p;
    b ~ normal( 0 , 0.1 );
    a ~ normal( 1 , 1.5 );
    z_Tr ~ normal( 0 , 1 );
    sd_Tr ~ exponential( 1 );
    for ( i in 1:N ) {
        p[i] = inv_logit(a + b * X[i] + z_Tr[Tr[i]]*sd_Tr);
    }
    Insider ~ binomial( Total , p );
}
generated quantities{
    vector[N] log_lik;
    vector[N] p;
    vector[N] f;
    vector[N_pred] p_pred;
    vector[N_pred] f_pred;
    for ( i in 1:N ) {
        p[i] = inv_logit(a + b * X[i] + z_Tr[Tr[i]]*sd_Tr);
        f[i] = 1 / p[i];
    }
    for ( i in 1:N ) log_lik[i] = binomial_lpmf( Insider[i] | Total[i] , p[i] );
    for ( i in 1:N_pred ) { // Predict p and f across the range of X
        p_pred[i] = inv_logit(a + b * X_pred[i]);
        f_pred[i] = 1 / p_pred[i];
    }
}

8.2 Fit the conservative model

fit_propST_cons <- sampling(mod_propST_cons,
           data=dp2,
           iter=2000,
           chains=4,
           cores=4,
           seed=111)  

8.3 Results

Table S2-4 and Figure S2-5 are used to compare the results for the base and conservative models for predictions of the slope parameter \(\beta\) (b), and predicted proportion of insiders (p) and inflation factor (f) at various diver swimming speeds. The plots and posterior distributions for values of interest are very similar between the base and conservative models, so we can be confident that the base prior for the intercept is not driving our inferences.

post_propST_cons = extract(fit_propST_cons)

plot_pf(post_propST) + plot_annotation(title = "Base prior")
plot_pf(post_propST_cons) + plot_annotation(title = "Conservative prior")
Figure S2-5: Comparison of model results for base prior and conservative priorFigure S2-5: Comparison of model results for base prior and conservative prior

Figure S2-5: Comparison of model results for base prior and conservative prior

sum_bc <- function(b,c) {
  c(`Base model` = paste0(b %>% mean %>% round(.,2), " (",
            b %>% quantile(., 0.025) %>% round(.,2), ",",
            b %>% quantile(., 0.975) %>% round(.,2), ")") ,
    `Conservative model` = paste0(c %>% mean %>% round(.,2), " (",
                    c %>% quantile(., 0.025) %>% round(.,2), ",",
                    c %>% quantile(., 0.975) %>% round(.,2), ")") ) }

rbind(
  b = sum_bc(post_propST$b, post_propST_cons$b), 
  p_30 = sum_bc(post_propST$p_pred[,11],post_propST_cons$p_pred[,11]), 
  p_40 = sum_bc(post_propST$p_pred[,21],post_propST_cons$p_pred[,21]),
  p_50 = sum_bc(post_propST$p_pred[,31],post_propST_cons$p_pred[,31]),
  f_30 = sum_bc(post_propST$f_pred[,11],post_propST_cons$f_pred[,11]), 
  f_40 = sum_bc(post_propST$f_pred[,21],post_propST_cons$f_pred[,21]),
  f_50 = sum_bc(post_propST$f_pred[,31],post_propST_cons$f_pred[,31])
  ) %>% 
  kable(caption = tables("t_dat")) %>%
  kable_styling()
Table S2-4: Predictions for the slope parameter beta and estimates of \(p\) and \(f\) for speeds of 30, 40, and 50 m min-1, for the base model (with prior Normal(0,1.4) on the intercept alpha) vs a more conservative model (with prior Normal(1,1.5) on alpha). Values in parentheses are 95% uncertainty intervals
Base model Conservative model
b 0.13 (0.08,0.18) 0.13 (0.08,0.18)
p_30 0.24 (0.1,0.42) 0.25 (0.11,0.44)
p_40 0.52 (0.34,0.68) 0.54 (0.36,0.7)
p_50 0.79 (0.64,0.9) 0.8 (0.66,0.91)
f_30 4.71 (2.4,10.16) 4.47 (2.29,9.42)
f_40 1.98 (1.46,2.95) 1.91 (1.43,2.74)
f_50 1.28 (1.11,1.55) 1.26 (1.1,1.51)

References

Gabry J, Simpson D, Vehtari A, Betancourt M, Gelman A (2019) Visualization in Bayesian Workflow. Journal of the Royal Statistical Society: Series A (Statistics in Society) 182:389–402.

Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB (2014) Bayesian Data Analysis, 3rd ed. CRC press, Boca Raton.

McElreath R (2015) Statistical Rethinking: A Bayesian Course with Examples in R and Stan. CRC Press.

Vehtari A, Gelman A, Gabry J (2017) Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC. Statistics and Computing 27:1413–1432.