## Abstract

The science of complex systems is increasingly asked to forecast the consequences of climate change. As a result, scientists are now engaged in making predictions about an uncertain future, which entails the efficient communication of this uncertainty. Here we show the benefits of hierarchically decomposing the uncertainty in predicted changes in animal population size into its components due to structural uncertainty in climate scenarios (greenhouse gas emissions and global circulation models), structural uncertainty in the demographic model, climatic stochasticity, environmental stochasticity unexplained by climate–demographic trait relationships, and sampling variance in demographic parameter estimates. We quantify components of uncertainty surrounding the future abundance of a migratory bird, the greater snow goose (*Chen caeruslescens atlantica*), using a process-based demographic model covering their full annual cycle. Our model predicts a slow population increase but with a large prediction uncertainty. As expected from theoretical variance decomposition rules, the contribution of sampling variance to prediction uncertainty rapidly overcomes that of process variance and dominates. Among the sources of process variance, uncertainty in the climate scenarios contributed less than 3% of the total prediction variance over a 40-year period, much less than environmental stochasticity. Our study exemplifies opportunities to improve the forecasting of complex systems using long-term studies and the challenges inherent to predicting the future of stochastic systems.

## 1. Introduction

*Prediction is difficult*, *especially about the future* (attributed to Niels Bohr)

Climate warming is already impacting individuals, populations and food webs across most ecosystems of the Earth [1,2] and this is likely to accelerate in coming decades. To anticipate impacts, researchers often rely on models to generate plausible future trajectories. Such models can be classified into two broad categories: phenomenological and mechanistic models [3,4]. Phenomenological models are derived from statistical models (e.g. regressions) and are not based on a mechanistic understanding of the ecological processes driving the observed changes. An example of that is the forecasting of changes in species distributions based on anticipated changes in the location of the climate isotherms that delimit their current range [5,6]. Mechanistic models, by contrast, are based on biological processes describing pathways through which climate affects ecosystem function [3,7]. Physiological tolerance models are an example of mechanistic models for forecasting species distributions [8]. Models based on demographic processes are also used to forecast change in population size and distribution. An early example of that is a study with the emperor penguin (*Aptenodytes forsteri*) where the climate was treated as a two-state variable with demographic parameters specific to each of these two states [9]. More recent and sophisticated attempts to couple population dynamic models with climate warming scenarios are reviewed by Jenouvrier [10]. However, such studies are still scarce because they require long-term demographic data coupled with a detailed knowledge of the functional relationships between climatic factors and vital rates, and a good understanding of the underlying mechanisms [10–12].

In both approaches, any predicted trajectory derived from ecological models involves uncertainties, an important issue that has not always received full attention [13]. Failure to adequately measure or consider uncertainties associated with model-based predictions of future change in animal abundance can lead to misguided decisions by policymakers or managers [3,14,15]. Pragmatically, partitioning the sources of uncertainties can also be used to direct future research aimed at improving the precision of model predictions. However, the magnitude and relative contributions of various sources of uncertainty to the overall uncertainty of the prediction are rarely quantified in ecological models [13,16,17]. A rare example is provided by a study that assessed the relative contribution of uncertainties in future greenhouse gas emissions and in global circulation models (GCMs) to forecasted changes in the abundance of a bryophyte species, while also considering but not specifically analysing the uncertainty created by the demographic model [18].

In this paper, we develop a time-varying matrix population model for a migratory bird with temperature dependencies in demographic parameters coupled with climate warming scenarios. Our primary aim was to hierarchize the various sources of uncertainty contributing to the error in model predictions and assess their relative importance. This allowed us to project the expected change in population size of our study species over four decades under contrasted climate scenarios and their associated error. Here the terms ‘prediction’ and ‘forecast’ refer to population trajectories issued from our model and projected into the future; each separate prediction should be interpreted as a plausible trajectory among a large ensemble of such trajectories. The background for this work is the management of the growing greater snow goose population (*Chen caeculescens atlantica*), a population currently exposed to a strong warming trend [19,20], especially on its breeding ground due to the amplification of climate warming in the Arctic [21,22].

## 2. Material and methods

### (a) Study system

The greater snow goose is a migratory bird that breeds in the High Arctic, primarily in the eastern Canadian Arctic Archipelago, winters along the US Atlantic coast and stages twice a year in southern Canada along the St Lawrence River in the province of Quebec [19]. The population is hunted during the autumn and winter in southern Canada and the USA. A photographic census conducted every spring in Quebec has provided a relatively accurate estimate of population size each year [23]. The population size has increased from approximately 25 000 birds in 1965 to 1 000 000 birds in 1999, partly due to food-subsidy obtained from feeding in farmland in winter and spring [19]. Because of its high abundance and potential for inducing ecological imbalance on its Arctic breeding grounds, exceptional measures were implemented to stop the growth of this population, starting in 1999 (liberalized hunting regulations and introduction of a spring hunt, first in southern Canada and more recently in the USA [19,24]).

The demography of this population has been studied since 1990 at its largest known breeding colony in the Canadian Arctic, Bylot Island, Nunavut [25]. Reproductive success has been determined annually from a sample of several hundred nests and several thousand birds have been captured and marked every year in mass-banding drives when they are moulting. All birds were leg-banded and a sample of adult females was also marked with a neck collar [26] but in this study we did not use the information from neck-collared females. Several vital rates (survival, recruitment and breeding propensity) were estimated using the resulting capture–recapture data combining live recaptures in the summer and dead recoveries from hunters on the staging and wintering grounds in previous analyses [25,27].

### (b) Population model

We built a pre-breeding stage-based population model [28] with five pre-breeding stages (PB1–PB5 for ages 1–5; all individuals had recruited in the breeding population at 6 years [25]), plus one breeding stage (B) and one non-breeding stage (NB). We modelled only the female component, assuming an equal sex-ratio at birth. The transition from year *t* (columns) to year *t* + 1 (lines) was described by the following transition matrix:

where *F* denotes fecundity and was decomposed into the product of clutch size, proportion of nests where at least one egg survives to hatch, egg survival in nests where at least one egg survives to hatch, hatching probability of surviving eggs and gosling survival from hatching to near fledging; *b _{a}* denotes age-specific recruitment (probability of breeding for the first time at age

*a*conditional on survival); denotes the probability to skip a breeding season for a female that had already bred (individuals that skipped breeding were forced back into breeding the following year; see electronic supplementary material, Methods); and

*S*denotes age-specific yearly survival probability (notation

_{a}*a*+ represents all individuals of age

*a*or more). Except for age-specific recruitment, we did not consider improvement in reproductive success with age because it is unknown in this population (i.e. fecundity parameters are averaged over all age classes). Survival through the first year of life (

*S*

_{0}) was adjusted to a 10-month period in the matrix model () because survival was estimated from one late-summer to the next but the matrix model used a pre-breeding census formulation [23]. In this hunted species, survival was decomposed into survival to hunting (1 −

*h*, where

_{a}*h*= harvest rate) and survival to other sources of mortality (

_{a}*s*) with

_{a}*S*=

_{a}*s*(1 −

_{a}*h*) following Lebreton [29]. Survival to hunting was estimated as a linear function of measured harvest rate, which accommodates both partial compensation of hunting mortality [30] and an imperfect measure of harvest rate [25].

_{a}Parameter estimates came from a previous capture–recapture analysis [25] and included the following temperature dependencies:

— a positive effect of spring temperature in the Arctic breeding ground on clutch size;

— a positive effect of early summer temperature in the Arctic breeding ground on hatching probability;

— a negative effect of a combined index of spring temperature in the staging area of southern Quebec and in the Arctic breeding ground on the probability to skip breeding;

— a negative effect of a combined index of temperature during the non-breeding season in the staging area of southern Quebec and wintering area the eastern USA on recruitment probability; and

— a positive effect of late-summer temperature in the Arctic breeding ground on first-year natural survival.

Model parameters also included dependencies on harvest rate, as described above, and post-1999 special management measures aimed at increasing harvest rate [25] (electronic supplementary material, Methods). Harvest rate (*h _{a}*) was derived from annual hunter surveys and population size estimates [25]. The model was fitted using the 1990–2011 time series of estimated harvest rates (yearly range for adults: 4–13%, yearlings: 20–50%). In the projections, we used the average of the 1999–2011 values when a spring hunt was present (adults: 10%, yearlings: 34%) and of the 1990–1997 values in the absence of a spring hunt (adults: 7%, yearlings: 33%). Introduction of a spring hunt to control population size had negative effects on fecundity due to hunting disturbance and we accounted for that in our model (electronic supplementary material, Methods). In our projections, the effects of the spring hunt on hunting-related mortality and fecundity were removed each time the population fell below 400 000 females, the population objective, to mimic the management decision process for this population. We did not include further density-dependence in the model because population growth was still exponential when special management actions were introduced in 1999 [31]. Finally, survival probabilities estimated from the capture–recapture data were adjusted upwards because a previous analysis of this dataset found an underestimation of survival in the capture–recapture dataset [23] (electronic supplementary material, Methods).

### (c) Sources of uncertainty and simulation strategy

We decomposed the uncertainty surrounding the future abundance of a population in a hierarchical manner (table 1). In our case study, those uncertainty components were estimated in a stepwise, nested fashion, following the order presented in table 1. We started with a purely deterministic matrix population model [28] in Steps 1 and 2.

Step 3 represents the structural uncertainty on the demographic model (i.e. uncertainty regarding the nature of the demographic processes at play in a study population). This step could be ignored in our case study because previous work had brought unambiguous support for a single general population model describing the age structure and seasonality of survival and fecundity variation [25]. Climate–demographic relationships were modelled within the generalized linear mixed model framework [32], leaving little structural uncertainty regarding covariate effects on model parameters. We nonetheless explored the consequences of using an inadequate, biased demographic model on the uncertainty of model predictions in a formal analysis (electronic supplementary material, Theoretical exploration of prediction variance components); we come back to this issue in the discussion.

In Step 4, we addressed the structural uncertainty pertaining to both the choice of a GCM and to future anthropogenic greenhouse gas emissions (representative concentration pathways or RCP), collectively termed ‘uncertainty in the climate scenario’ for short hereafter. We used a set of 63 climate scenarios for this purpose (information on these scenarios is provided in electronic supplementary material, Climate scenarios).

In Step 5, we represented the inter-annual variability in weather realizations under given climate scenarios, termed ‘climate stochasticity’ for short hereafter. We simulated 40 weather realizations for each climate scenario, using autoregressive moving average models fitted to the projected climate time series (electronic supplementary material, Methods). We propagated that uncertainty into the abundance prediction via the matrix population model.

In Step 6, we addressed the environmental stochasticity in population abundance not explained by climate–demographic relationships. We estimated the residual process variance of climate–demographic parameter relationships when statistically different from zero following Lebreton *et al.* [32] and propagated it into prediction uncertainty. The result of that procedure was a model akin to a generalized linear mixed model: the sum of a linear effect representing the relationship between climate and the focal demographic parameter and of a random variable representing residual environmental variance. We used that model to generate 40 sets of parameter values.

Lastly, in Step 7, we addressed sampling variance in the estimation of demographic parameters because parameters of the climate–demographic relationships were affected by statistical uncertainty, as in any statistical regression. We resampled in the approximate multivariate normal distribution of parameter estimates with the sampling variance–covariance matrix extracted from the analysis of van Oudenhove *et al*. [25] with some constraints (electronic supplementary material, Methods). We generated 40 parameter values for each expected value obtained at Step 6, bringing the total number of simulations to 4 032 000.

We did not consider demographic stochasticity in this study (Step 8) because it is negligible compared to the other sources of variation when population size is very large [28]. The mean population trajectory was calculated across all the simulations. To help the interpretation of the results, we expressed the total prediction variance as a sum of the variance components described in table 1.

## 3. Results and discussion

For the observed period (1990–2011; 22 years), the model effectively tracked the observed changes in abundance and remained within the 95% confidence interval of the latter in all years (figure 1). In particular, it discriminated the two phases of the population's dynamics: exponential growth until 1999 (mean annual population growth rate, observed: 12%, predicted: 10%) and relative stability after the introduction of the special management actions in 1999 (mean annual population growth rate, observed: −1%, predicted: 1%). Observed, but not forecasted population size fluctuated annually after 1999 (figure 1), suggesting that some of the observed variation was due to sampling uncertainty in the population count data. Indeed, the sampling variance of the count is proportional to population size [23] making recent counts less reliable than those from the beginning of the study period.

For the forecasting period 2012–2050, our model based on climate warming scenarios predicted, on average, a slow population increase at a rate of 0.7% per year for the next 40 years, with a tendency for a slowing down of the rate of increase in the later period (figure 1). Despite the strong warming trend that geese should experience throughout their annual range in the coming decades (electronic supplementary material, Climate scenarios), the predicted increase in population size is relatively modest. This is likely because our previous analysis [25] showed that temperature has a positive effect on demographic parameters at some periods of the annual cycle but a negative effect at others. This important result emphasizes the need to consider climatic effects during the full annual cycle of a species when modelling future population changes [11,33]. It also suggests that previous analyses based on phenomenological models and suggesting rapid population expansion of some goose populations in response to a warming Arctic climate [34] may have been overly optimistic. Our analysis shows that the uncertainty surrounding future population abundance of greater snow geese accumulated very rapidly (figure 1). The distributions of projected population sizes in 2050 had a broad peak, which further emphasizes the large uncertainty associated with the predictions. As expected from the exponential structure of the model [35] (electronic supplementary material, Theoretical exploration of prediction variance components), the prediction interval was asymmetrical with more values above than below the mean projected abundance (figure 1). The asymmetry was also caused by the cessation of exceptional management actions to control this population each time the simulated population fell below 400 000 females, the population objective.

Sampling variation of demographic parameters was by far the greatest source of uncertainty in forecasted population size. After a period of adjustment over the first few years, its relative contribution to the total variance increased from a low of 57% to asymptotically approach 100% over time (figure 2), almost reaching it after a projected 90 years (not shown). Environmental stochasticity was the second most important source of uncertainty and peaked at 37% of the total prediction variance after 12 years before decreasing. In a formal analysis (electronic supplementary material, Theoretical exploration of prediction variance components), we show that the variance of the log-population size due to the sampling uncertainty in the demographic parameter estimates is proportional to the square of the prediction time horizon *t*, whereas the variance due to environmental variation in the broad sense is proportional to *t*. These theoretical results, albeit from a simplified model, explain why the sampling variance ultimately dominated other sources of variability (figure 2).

Among climate-associated sources of prediction variance, uncertainty due to climate stochasticity was relatively high initially (15%) but decreased rapidly afterwards (figures 2 and 3*a*). This is a common feature in climate scenarios [36] that was transferred into our demographic simulations. Among climate scenarios, uncertainty due to the GCM always contributed more to the total prediction variance than uncertainty in RCP, a result also found in the analysis of bryophyte populations [18] and likely to be shared by most predictive demographic models. Mean predicted population trajectories associated with the three RCP levels differed very little over a 40-year period (figure 1). Nonetheless, the relative contribution of uncertainty in RCP to the total prediction variance increased after 20 years compared to the GCM component (figure 3*b*), also a frequent feature in climate scenarios [36] that was transferred into our demographic simulations. Overall, uncertainty in the climate scenarios never contributed more than 3% of the total prediction variance (figure 2), putting a clear emphasis on our limited ability to predict the demography of wild species for more than a few years ahead based on climate projections alone.

Predictions are obviously conditional on the population model used to generate them and on the range of covariate values available to calibrate it. In our well-studied population [23,25–27], we were able to work with a single model that received strong support from the data. However, this may not always be the case in less intensively studied species, in which several models could represent the available data equally well. This would correspond to Step 3 of our decomposition of prediction variance (table 1). Structural uncertainty in the demographic model, also known as model uncertainty or uncertainty in model selection, can be addressed in several ways. First, it can be accounted for via model averaging, a procedure in which the relative support that each model receives is incorporated into the computation of confidence intervals around demographic parameter estimates [37]. This link between confidence intervals and structural uncertainty shows than one can expect structural (model) uncertainty to produce effects similar to sampling (parameter) uncertainty. A second, more explicit way of representing structural uncertainty is to consider biases in the parameter estimates, as done in our formal analysis (electronic supplementary material, Theoretical exploration of prediction variance components). This analysis shows that the variance associated with structural uncertainty, expressed as biases in the estimates, should indeed increase at the same rate as the sampling variance over time, and should be an additional source of concern when modelling the population dynamics of species with a poorly known demography.

Our insistence on the decomposition of the uncertainty into a series of components that are as comprehensive as possible is critical to direct future efforts to reduce this uncertainty. This is important because scientists dealing with complex systems [38,39] are increasingly asked to forecast the consequences of climate change [3,40,41]. We showed that sampling variance was the largest source of variation but it is also, fortunately, the one upon which researchers have potentially the most control. Reducing this uncertainty could be achieved by increasing sample sizes, increasing recapture rates in capture–recapture studies such as ours, periodically updating climate–demographic relationships as new data becomes available, and effectively combining multiple data sources [23].

Using relationships between demographic parameters and climatic covariates and feeding these relationships into a matrix population model as we did is akin to using transfer functions in time series analysis [42]. Two major differences, however, are that data used to estimate the parameters of the transfer function (i.e. the capture–recapture data) are independent of the time series itself (the population count) and in our study, we scrutinize uncertainty in both the predictors and the transfer function. Another problem common to forecasting efforts is the occurrence of latent relationships, left undetected or confounded with stochastic variation at the model fitting stage, which will only manifest themselves when the climate warms up or population grows beyond some threshold. Predictions are by nature based on some form of status quo assumption, whereas entirely new processes may emerge and completely change the nature of climate–demographic relationships [43]. In other words, the risk is mostly with things you don't know and that ‘you don't know you don't know’ [42]. With that in mind, adequately computing and reporting known sources of uncertainty in models coupling ecological processes and climate scenarios remains essential to avoid giving a false sentiment of confidence on projections issued from such modelling exercises [10].

## 4. Conclusion

Predicted changes in animal population size based on climate-dependent demographic models entail a large part of uncertainty, which is often not adequately accounted for or communicated by researchers. Here we empirically demonstrate and exhaustively quantify the contribution of the different sources of uncertainty in demographic projections, one of the main tools for scientists interested in the ecological consequences of climate change. Considerable emphasis is often placed on uncertainty linked to climate scenarios (i.e. greenhouse gas emissions and GCMs) in climate-dependent population models [18,44,45]. However, we show that these sources can be overwhelmed by environmental stochasticity. Sampling variance (i.e. measurement error) also rapidly accumulates in model predictions, even in a well-studied species, and can become the dominant source of uncertainty after a few decades. Efforts should be primarily devoted to reduce the latter source of uncertainty if one wishes to improve confidence in predicted change in animal populations exposed to climate warming based on demographic models. Finally, we also urge researchers to consider the full annual cycle of species when developing predictive population models, especially in long-distance migrants which can be exposed to very different climatic regimes at various times of the year [33].

## Data accessibility

Our paper is a modelling paper that builds upon climate–demography relationships that have been previously published in [23] and [25]. The only original data are the population size inventory conducted by the Canadian Wildlife Service (presented in electronic supplementary material, table S3). For constructing the climate scenarios, we used simulations from GCMs (listed in electronic supplementary material, table S2) as well as observation-based products (described in the electronic supplementary material) obtained from public institutions.

## Authors' contributions

G.G. and J.D.L. conceived the project; J.D.L. made the theoretical developments; G.P. conducted the data analysis; L.V.O. contributed analytical tools; P.G. generated the climate scenarios; G.G. and G.P. wrote the paper with inputs from J.D.L., P.G. and L.V.O.

## Competing interests

The authors declare no conflict of interest.

## Funding

This work was funded by grants from the Natural Science and Engineering Research Council of Canada, the Network of centres of excellence ArcticNet and the Canadian Wildlife Service to G.G.

## Acknowledgement

We thank R.F. Rockwell for his helpful comments on this manuscript.

## Footnotes

Electronic supplementary material is available online at https://dx.doi.org/10.6084/m9.figshare.c.3593750.

- Received October 27, 2016.
- Accepted November 22, 2016.

- © 2016 The Author(s)

Published by the Royal Society. All rights reserved.