This Supplement presents data, code, and detailed methodology of the analysis of counts of sharks collected during Diver-Operated Video (DOV) surveys conducted at Darwin and Wolf Islands in the Galápagos Marine Reserve. The data comprise counts of sharks from 17 transects of varying areas (0.25-0.56 ha), with counts taken using an instantaneous (insiders only) survey method and a non-instantaneous (total = insiders + outsiders) survey method.
Here, we present a model used to estimate mean densities of sharks for each method, at each island, for exposed and non-exposed transects.
Another Supplement (2) presents models used to estimate the non-instantaneous bias as a function of diver swimming speed.
The following code inputs and tabulates (Table S3-1) the raw data.
d <- structure(list(TransectNo = 1:17, Transect = c("Darwin's Arch north (1)",
"Darwin's Arch south (2)", "Old reef (3)", "Darwin anchorage (4)",
"Northern wall (5)", "Southern wall (6)", "Hidden reef (7)",
"North bay (8)", "Banana (9)", "Corals north (10)", "Corals south (11)",
"Landslide east (12)", "Landslide west (13)", "Elephant (14)",
"Northwestern wall (15)", "Pinnacle (16)", "Wolf anchorage (17)"),
Island = c("Darwin", "Darwin", "Darwin", "Darwin", "Darwin", "Darwin",
"Darwin", "Wolf", "Wolf", "Wolf", "Wolf", "Wolf", "Wolf", "Wolf", "Wolf",
"Wolf", "Wolf"), Exposure = c("Yes", "Yes", "Yes", "Yes", "No", "No",
"No", "No", "No", "No", "Yes", "Yes", "Yes", "No", "No", "Yes", "No"),
Length = c(697.6, 823.4, 1122.6, 550.8, 783.3, 941.5, 500.2, 680.6, 766.4,
689.4, 825.6, 849.7, 1018.7, 692.5, 986.7, 633.1, 809.9), Area = c(0.3488,
0.4117, 0.5613, 0.2754, 0.39165, 0.47075, 0.2501, 0.3403, 0.3832, 0.3447,
0.4128, 0.42485, 0.50935, 0.34625, 0.49335, 0.31655, 0.40495), Insider =
c(22, 66, 39, 3, 0, 3, 0, 0, 2, 0, 19, 12, 16, 0, 0, 0, 0), Total = c(102,
92, 85, 7, 0, 3, 0, 0, 2, 0, 32, 22, 22, 0, 0, 1, 0)),
row.names = c(NA, -17L), class = c("tbl_df", "tbl", "data.frame"))
d %>%
select(Transect, Island, Exposure, Length, Area, Insider, Total) %>%
kable(col.names = c("Transect", "Island", "Exposure", "Length (m)", "Area (ha)", "Insider", "Total"),
caption = tables("tdat", "Raw data")) %>%
kable_styling()
Transect | Island | Exposure | Length (m) | Area (ha) | Insider | Total |
---|---|---|---|---|---|---|
Darwin’s Arch north (1) | Darwin | Yes | 697.6 | 0.34880 | 22 | 102 |
Darwin’s Arch south (2) | Darwin | Yes | 823.4 | 0.41170 | 66 | 92 |
Old reef (3) | Darwin | Yes | 1122.6 | 0.56130 | 39 | 85 |
Darwin anchorage (4) | Darwin | Yes | 550.8 | 0.27540 | 3 | 7 |
Northern wall (5) | Darwin | No | 783.3 | 0.39165 | 0 | 0 |
Southern wall (6) | Darwin | No | 941.5 | 0.47075 | 3 | 3 |
Hidden reef (7) | Darwin | No | 500.2 | 0.25010 | 0 | 0 |
North bay (8) | Wolf | No | 680.6 | 0.34030 | 0 | 0 |
Banana (9) | Wolf | No | 766.4 | 0.38320 | 2 | 2 |
Corals north (10) | Wolf | No | 689.4 | 0.34470 | 0 | 0 |
Corals south (11) | Wolf | Yes | 825.6 | 0.41280 | 19 | 32 |
Landslide east (12) | Wolf | Yes | 849.7 | 0.42485 | 12 | 22 |
Landslide west (13) | Wolf | Yes | 1018.7 | 0.50935 | 16 | 22 |
Elephant (14) | Wolf | No | 692.5 | 0.34625 | 0 | 0 |
Northwestern wall (15) | Wolf | No | 986.7 | 0.49335 | 0 | 0 |
Pinnacle (16) | Wolf | Yes | 633.1 | 0.31655 | 0 | 1 |
Wolf anchorage (17) | Wolf | No | 809.9 | 0.40495 | 0 | 0 |
The area surveyed was 2.7097 ha at Darwin Island, of which 59% was classified as exposed, and 3.9763 ha at Wolf Island, of which 42% was exposed.
We modelled the transect counts \(y_i\) using a negative binomial (NB) distribution as follows:
\[ \begin{aligned} y_i &\sim {\rm NB}(\mu_i, \phi) \\ \log(\mu_{i}) &= \beta_0 + \beta_W W_i + \beta_E E_i + \log(h_{i}) \\ \beta_0 &\sim {\rm Normal}(2, 2)\\ \beta_W &\sim {\rm Normal}(0, 2)\\ \beta_E &\sim {\rm Normal}(0, 2)\\ \theta &= {\phi}^{-1/2}\\ \theta &\sim {\rm Exp(1)} \end{aligned} \] This model was applied separately to the counts of insiders and the total counts. The expected count for transect \(i\) (\(\mu_i\)) was modelled on the log scale, with an overall mean \(\beta_0\), plus the effects of island and exposure, and an offset term to account for different transect sizes.
For the effects of island and exposure, we used sum-to-zero contrasts so that the variables \(W_i = -1\) for Darwin Island and \(W_i = 1\) for Wolf Island; and \(E_i = -1\) for not exposed transects and \(E_i = 1\) for exposed transects. Thus, the parameter \(\beta_W\) is added to \(beta_0\) for transects at Wolf Island, and subtracted from \(\beta_0\) for transects at Darwin Island. Likewise, the parameter \(\beta_E\) is added for exposed transects and subtracted for non-exposed transects. No interaction term was included.
The areas in hectares (\(h_i\)) was used as an offset term to account for the fact that larger transects with the same conditions are expected to have proportionately larger counts. This enabled us to estimate mean densities per ha using the combinations of the three \(\beta\) variables, as demonstrated below.
\[ \begin{aligned} \log(\mu) &= \eta + \log(h) \\ \eta &= \log(\mu / h)\\ e^\eta &= \mu / h \\ \end{aligned} \] Thus, the posterior distributions of the mean densities of sharks per ha for the four combinations of island and exposure could be obtained from:
The following Stan code (assigned to object ‘mod1
’) was used to fit the model above.
data{
int N;
int Y[N];
int W[N];
int E[N];
real off[N];
}
parameters{
real beta;
real beta_W;
real beta_E;
real<lower=0> theta;
}
model{
vector[N] mu;
real phi;
theta ~ exponential( 1 );
phi = theta^-2;
beta ~ normal(2,2) ;
beta_W ~ normal(0,2) ;
beta_E ~ normal(0,2) ;
for ( i in 1:N ) {
mu[i] = beta + beta_W*W[i] + beta_E*E[i] + log(off[i]);
mu[i] = exp(mu[i]);
}
Y ~ neg_binomial_2( mu , phi );
}
generated quantities{
vector[N] log_lik;
vector[N] mu;
vector[4] alpha;
real phi;
phi = theta^-2;
for ( i in 1:N ) {
mu[i] = beta + beta_W*W[i] + beta_E*E[i] + log(off[i]);
mu[i] = exp(mu[i]);
}
for ( i in 1:N ) log_lik[i] = neg_binomial_2_lpmf( Y[i] | mu[i] , phi );
alpha[1] = beta - beta_W - beta_E;
alpha[2] = beta - beta_W + beta_E ;
alpha[3] = beta + beta_W - beta_E;
alpha[4] = beta + beta_W + beta_E;
}
Fit the model to insider counts:
fit_insider_mod1 <- d %>%
mutate(W = recode(`Island`, "Wolf" = 1, "Darwin" = -1),
E = recode(`Exposure`, "Yes" = 1, "No" = -1)) %>%
dplyr::select(Y = Insider, W, E, off = Area) %>%
as.list() %>%
list_modify(N=nrow(d)) %>%
sampling(mod1,
data=.,
iter=2000,
chains=4,
cores=4,
seed=111)
Fit the model to total counts:
fit_total_mod1 <- d %>%
mutate(W = recode(`Island`, "Wolf" = 1, "Darwin" = -1),
E = recode(`Exposure`, "Yes" = 1, "No" = -1)) %>%
dplyr::select(Y = Total, W, E, off = Area) %>%
as.list() %>%
list_modify(N=nrow(d)) %>%
sampling(mod1,
data=.,
iter=2000,
chains=4,
cores=4,
seed=111)
We plotted some probability density functions for the log-normal distribution for various means and standard deviations (given on the log scale). These probability densities were used to guide the prior for the parameter \(beta_0\), the overall mean density on the log scale. Previous work had indicated that densities of sharks are very low in non-exposed sites, and very high in exposed sites (Salinas-de-León et al. 2016), and we required a prior that covered all the plausible values for the overall mean density. The bottom row in Figure S3-1, where the standard deviation is 10, demonstrates that larger standard deviations on the log scale can actually be quite restrictive, placing greater density on very small values and extremely large values. Thus, a more moderate prior of \(\beta_0 ~ N(2,2)\) was chosen (Figure S3-1).
pdln <- expand_grid(x = c( seq(0, 100, length=500) ),
mean = seq(0,4,by=.5),
sd = c(seq(0.5,4,by=.5),10) ) %>%
mutate(y = dlnorm(x,mean,sd)) %>%
ggplot() + aes(x=x,y=y) + geom_area() + facet_grid(sd ~ mean) +
coord_cartesian(ylim = c(0,0.05)) + ylab("")
pdln
We plotted the prior mean densities for each coast at each island (Figure S3-2) to check that, in combination, the priors we chose for \(\beta_0\), \(\beta_W\), and \(\beta_E\) covered the appropriate range of plausible values. We also checked whether the model as parameterised had the flexibility to produce different means across the four groups, by adding 500 transparent lines. Each of these lines comes from a single draw from the joint prior distributions of \(\beta_0\), \(\beta_W\), and \(\beta_E\).
n = 500
b0 <- rnorm(n,2,2) # random draws for log overall mean
bW <- rnorm(n,0,2) # random draws for effect of Island
bE <- rnorm(n,0,2) # random draws for effect of Exposure
prior_means <- bind_cols(
Island = rep(c("Darwin","Wolf"), each=2),
Exposure = rep(c("Not exposed","Exposed") ,times=2),
x = 1:4) %>%
slice(rep(1:n(), each = n)) %>%
add_column( y = c(exp(b0 - bE - bW), exp(b0 + bE - bW),
exp(b0 - bE + bW), exp(b0 + bE + bW)) ) %>%
add_column( .draw = rep(1:n, times=4) )
prior_means %>%
ggplot() +
aes(y = y, x = x, group = .draw) + geom_line(alpha=.1) +
xlab("") + ylab("Priors for mean densities of sharks per ha") +
geom_line(data=dplyr::filter(prior_means, .draw %in% sample(1:n,10)),
aes(col=factor(.draw))) +
annotation_logticks(sides = "l", colour = "grey") +
scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x))) +
theme(legend.position = "none") +
scale_x_discrete(limits=1:4, labels=c("Darwin, not exposed","Darwin, exposed","Wolf, not exposed","Wolf, exposed"))+
stat_pointinterval(aes(y = y, x = x),
inherit.aes = F, point_interval = median_qi,
.width = c(.90, .50))
The NB distribution was used to allow for overdispersion in the counts due to the tendency of sharks to aggregate. A shrinkage exponential prior was placed on \(\theta\), the inverse square-root of the inverse-dispersion parameter \(\phi\). As \(\theta\) approaches zero, the NB converges to the Poisson distribution. A weakly informative penalised-complexity prior (Simpson et al. 2017) on \(\theta\), such as the exponential used here, allows for overdispersion if it is signalled in the data but encourages a degree of conservativism.
We checked the following details to ensure the sampling algorithm used to fit the models worked well (Gelman et al. 2014). There were no divergent transitions during the Hamiltonian Monte Carlo (HMC) algorithm used to fit the models. The plots below (Figure S3-3 and Figure S3-4) show for five key parameters that the HMC chains were well-mixed (panel A), the effective posterior sample sizes were all large in absolute number and relative to actual sample sizes (B), and \(\hat{R}\) values for key parameters were all below 1.05 (C).
mypars = c("alpha[1]","alpha[2]","alpha[3]","alpha[4]","theta")
my_diag_plots <- function(fit, mypars) {
pt <- mcmc_trace(as.array(fit), pars = mypars)
pn <- mcmc_neff(neff_ratio(fit, pars = mypars)) + yaxis_text(hjust = 1)
pr <- mcmc_rhat(rhat(fit, pars = mypars)) + yaxis_text(hjust = 1)
pt / (pn + pr) + plot_layout(heights=c(2,1)) +
plot_annotation(tag_levels = 'A')
}
my_diag_plots(fit_insider_mod1, mypars=mypars)
my_diag_plots(fit_total_mod1, mypars=mypars)
Posterior predictive checks (PPC) are used to check whether the values of summary statistics calculated from the data are consistent with the predictions of the model (Gelman et al. 2014). To do this, we calculated various statistics from the observed dataset (\(T(y)\)) and plotted against the distributions of those statistics calculated from replicate datasets generated by the fitted model (\(T(y_{rep})\)). Below, we present plots of these analysis, showing a histogram of test statistics from the model predictions (\(T(y_{rep})\)) and a single bar for the observed value from the data (\(T(y)\)). In all cases, the observed statistics are consistent with those from the fitted models, for both insiders and total counts, and for all the data and for each island.
# create plotting function
plot_ppc_mod1 <- function(fit) {
post <- extract(fit)
ns = 1000
yrep <- matrix(, nrow=ns, ncol=17)
for(i in 1:ns) {
for(j in 1:17) {
yrep[i,j] <- rnbinom(1, mu=post$mu[i,j], size=post$phi[i])
} }
y <- c(d$Insider, d$Insider)
yrep <- cbind(yrep,yrep)
grp <- c(rep("All", length(d$Insider)), d$Island )
prop_zero <- function(x) sum(x>0) / length(x)
p1 = ppc_stat_grouped(y, yrep, stat="mean", group = grp, facet_args = list(nrow = 1)) + xlim(0,200)
p2 = ppc_stat_grouped(y, yrep, stat="sd", group = grp, facet_args = list(nrow = 1)) + xlim(0,300)
p3 = ppc_stat_grouped(y, yrep, stat="max", group = grp, facet_args = list(nrow = 1)) + xlim(0,1000)
p4 = ppc_stat_grouped(y, yrep, stat="prop_zero", group = grp, facet_args = list(nrow = 1))
p1 + p2 + p3 + p4 + plot_layout(ncol=1, heights = unit(3, 'cm'))
}
# create PPC plot for insiders
plot_ppc_mod1(fit_insider_mod1) + plot_annotation(
title = 'Posterior predictive checks for insider counts')
# create PPC plot for total
plot_ppc_mod1(fit_total_mod1) + plot_annotation(
title = 'Posterior predictive checks for total counts')
A brief summary of the parameter values is given in Table S3-2.
mypars <- c("beta", "beta_W", "beta_E", "phi", "theta")
tab_i <- summary(fit_insider_mod1)$summary %>%
as.data.frame %>%
rownames_to_column("Parameter") %>%
dplyr::filter(Parameter %in% mypars)
tab_t <- summary(fit_total_mod1)$summary %>%
as.data.frame %>%
rownames_to_column("Parameter") %>%
dplyr::filter(Parameter %in% mypars)
bind_rows( tab_i , tab_t ) %>%
kable(digits = 2, caption = tables("pars_tab")) %>%
kableExtra::pack_rows("Instantaneous", 1, length(mypars)) %>%
kableExtra::pack_rows("Non-instantaneous", 1+length(mypars),2*length(mypars)) %>%
kable_styling()
Parameter | mean | se_mean | sd | 2.5% | 25% | 50% | 75% | 97.5% | n_eff | Rhat |
---|---|---|---|---|---|---|---|---|---|---|
Instantaneous | ||||||||||
beta | 2.16 | 0.01 | 0.39 | 1.45 | 1.91 | 2.15 | 2.41 | 2.96 | 2779.90 | 1 |
beta_W | -0.52 | 0.01 | 0.37 | -1.24 | -0.76 | -0.53 | -0.29 | 0.22 | 3152.19 | 1 |
beta_E | 1.74 | 0.01 | 0.39 | 0.98 | 1.50 | 1.74 | 2.00 | 2.51 | 3014.39 | 1 |
theta | 1.13 | 0.01 | 0.36 | 0.59 | 0.88 | 1.08 | 1.32 | 1.99 | 2042.89 | 1 |
phi | 1.05 | 0.02 | 0.71 | 0.25 | 0.57 | 0.87 | 1.30 | 2.83 | 1880.52 | 1 |
Non-instantaneous | ||||||||||
beta | 2.49 | 0.01 | 0.37 | 1.79 | 2.25 | 2.48 | 2.71 | 3.24 | 2323.70 | 1 |
beta_W | -0.65 | 0.01 | 0.33 | -1.30 | -0.84 | -0.65 | -0.44 | 0.00 | 2553.28 | 1 |
beta_E | 2.06 | 0.01 | 0.36 | 1.37 | 1.83 | 2.06 | 2.29 | 2.77 | 3258.24 | 1 |
theta | 1.00 | 0.01 | 0.30 | 0.57 | 0.79 | 0.95 | 1.16 | 1.72 | 2210.27 | 1 |
phi | 1.27 | 0.01 | 0.73 | 0.34 | 0.74 | 1.11 | 1.60 | 3.10 | 2558.82 | 1 |
The following code extracts the posterior distributions of mean densities per ha, and summarises them in a table (Table S3-3) and a plot (Figure S3-7).
# extract posterior distributions for parameters
mkpost <- function(fit_ins, fit_tot) {
Exposure <- c("Not exposed","Exposed","Not exposed","Exposed")
Island <- c("Darwin Island","Darwin Island","Wolf Island","Wolf Island")
bind_rows(
spread_draws(fit_ins, alpha[IE]) %>%
add_column(Method = "Instantaneous"),
spread_draws(fit_tot, alpha[IE]) %>%
add_column(Method = "Non-instantaneous") ) %>%
ungroup() %>%
mutate(Method,
Exposure = Exposure[IE],
Island = Island[IE],
Density = exp(alpha)) %>%
group_by(Method, Island, Exposure) %>%
select(-IE, -alpha)
}
rawdens <- d %>%
select(Island,Exposure,Insider,Total,Area) %>%
pivot_longer(c(Insider, Total), names_to="Method", values_to="Count") %>%
group_by(Method,Island,Exposure) %>%
summarise(`Raw density`=round(sum(Count)/sum(Area),2)) %>%
pull(`Raw density`)
post <- mkpost(fit_insider_mod1,fit_total_mod1)
posttab <- post %>% median_qi(.width = .95)
posttab %>%
select(Method, Island, Exposure,
`Median` = Density,
`Lower CI` = .lower,
`Upper CI` = .upper) %>%
add_column(`Raw mean densities` = rawdens[c(2,1,4,3,6,5,8,7)]) %>%
kable(digits = 2, caption = tables("means_tab_mod1")) %>%
add_header_above(c(" " = 3, "Modelled mean densities of sharks" = 3, " " = 1)) %>%
kable_styling()
Method | Island | Exposure | Median | Lower CI | Upper CI | Raw mean densities |
---|---|---|---|---|---|---|
Instantaneous | Darwin Island | Exposed | 80.66 | 29.85 | 288.45 | 81.39 |
Instantaneous | Darwin Island | Not exposed | 2.52 | 0.61 | 11.47 | 2.70 |
Instantaneous | Wolf Island | Exposed | 28.21 | 10.56 | 109.47 | 28.25 |
Instantaneous | Wolf Island | Not exposed | 0.89 | 0.21 | 4.04 | 0.86 |
Non-instantaneous | Darwin Island | Exposed | 172.36 | 75.14 | 561.93 | 179.06 |
Non-instantaneous | Darwin Island | Not exposed | 2.87 | 0.79 | 11.59 | 2.70 |
Non-instantaneous | Wolf Island | Exposed | 47.11 | 20.18 | 173.34 | 46.29 |
Non-instantaneous | Wolf Island | Not exposed | 0.82 | 0.21 | 3.20 | 0.86 |
ppost <- function(post) {
post %>%
ggplot() +
aes(y = Density, x = Island, col = Method) +
facet_wrap(.~Exposure, scales="free") +
xlab("") + ylab("Mean density of sharks per ha") +
stat_pointinterval(point_interval = median_qi,
.width = c(.90, .50),
position = position_dodge(0.95)) +
stat_summary(geom = "point", fun = "mean", pch = "-", size = 7,
position = position_dodge(0.95)) +
scale_colour_manual(values=c("steelblue", "#E69F00"))
}
ppost(post)
For the total counts (i.e., the non-instantaneous method, including both insiders and outsiders) in transects exposed to the south-east currents, we estimated the mean densities of sharks (individuals ha-1) to be 172.4 (95% uncertainty interval (CI) = 75.1, 561.9) at Darwin Island, and 47.1 (20.2, 173.3) at Wolf Island (Figure 4). Using the insider-only counts (i.e., the instantaneous method), the mean densities of sharks at exposed sites was estimated to be 80.7 (29.9, 288.4) at Darwin Island and 28.2 (10.6, 109.5) at Wolf Island. For total counts at non-exposed transects, the estimated mean densities were 2.9 (0.8, 11.6) at Darwin Island and 0.8 (0.2, 3.2) at Wolf Island. All 5 sharks observed in non-exposed transects were insiders, so estimated densities were similar for the insider-only counts.
Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB (2014) Bayesian Data Analysis, 3rd ed. CRC press, Boca Raton.
Salinas-de-León P, Acuña-Marrero D, Rastoin E, Friedlander AM, Donovan MK, Sala E (2016) Largest Global Shark Biomass Found in the Northern Galápagos Islands of Darwin and Wolf. PeerJ 4:e1911.
Simpson DP, Rue H, Riebler A, Martins TG, Sørbye SH (2017) Penalising Model Component Complexity: A Principled, Practical Approach to Constructing Priors. Statistical Science 32:1–28.