## Abstract

Model uncertainty and limited data are fundamental challenges to robust management of human intervention in a natural system. These challenges are acutely highlighted by concerns that many ecological systems may contain tipping points, such as Allee population sizes. Before a collapse, we do not know where the tipping points lie, if they exist at all. Hence, we know neither a complete model of the system dynamics nor do we have access to data in some large region of state space where such a tipping point might exist. We illustrate how a Bayesian non-parametric approach using a Gaussian process (GP) prior provides a flexible representation of this inherent uncertainty. We embed GPs in a stochastic dynamic programming framework in order to make robust management predictions with both model uncertainty and limited data. We use simulations to evaluate this approach as compared with the standard approach of using model selection to choose from a set of candidate models. We find that model selection erroneously favours models without tipping points, leading to harvest policies that guarantee extinction. The Gaussian process dynamic programming (GPDP) performs nearly as well as the true model and significantly outperforms standard approaches. We illustrate this using examples of simulated single-species dynamics, where the standard model selection approach should be most effective and find that it still fails to account for uncertainty appropriately and leads to population crashes, while management based on the GPDP does not, as it does not underestimate the uncertainty outside of the observed data.

## 1. Introduction

Decision-making under uncertainty is a ubiquitous challenge in the management of human intervention in natural resources and conservation. Decision-theoretic approaches provide a framework to determine the best sequence of actions in face of uncertainty, but only when that uncertainty can be meaningfully quantified [1]. Over the last four decades (beginning with Clark [2], Clark [3] and Walters & Hilborn [4]), dynamic optimization methods, particularly stochastic dynamic programming (SDP), have become increasingly important as a means of understanding how to manage human intervention into natural systems. Simultaneously, there has been increasing recognition of the importance of multiple steady states or ‘tipping points’ [5–7] in ecological systems.

We develop a novel approach to address these concerns in the context of fisheries; although the challenges and methods are germane to other problems of conservation or natural resource exploitation. Economic value and ecological concern have made marine fisheries the crucible for much of the founding work on management under uncertainty [2,3,8–11].

Even if we know the proper deterministic description of the biological system, there is intrinsic stochasticity in biological dynamics, measurements and implementation of policy (e.g. [10,12–14]). We may also lack knowledge about the parameters of the biological dynamics (parametric uncertainty, e.g. [11,15–17]), or even not know which model is a proper description of the system (structural uncertainty, e.g. [18–20]). Of these, the latter is generally the hardest to quantify. Typical approaches confront the data with a collection of models, assuming that the true dynamics (or reasonable approximation) is among the collection and then use model-choice or model averaging to arrive at a conclusion [18–20]. Even setting aside other concerns [19], these approaches are unable to describe uncertainty outside the observed data range.

Structural uncertainty is particularly insidious when we try to predict outside of the range of observed data [21] because we are extrapolating into unknown regions. In management applications, this extrapolation uncertainty is particularly important because: (i) management involves considering actions that may move the system outside the range of observed behaviour, and (ii) the decision tools (optimal control theory, SDP) rely on both reasonable estimates of the expected outcomes and on the weights given to those outcomes (e.g*.* [22]). Thus, characterizing uncertainty is as important as characterizing the expected outcome.

Tipping points in ecological dynamics [5,7] highlight this problem because precise models are not available and data are limited such as around high stock levels or an otherwise desirable state. With perfect information, one would know just how far a system could be pushed before crossing the tipping point, and management would be simple. But we face imperfect models and limited data and, with tipping points, even small errors can have very large consequences, as we shall illustrate later. Because intervention may be too late once a tipping point has been crossed (but see Hughes *et al*. [23]), management is often concerned with avoiding tipping points before any data about them are available.

The dual concerns of model uncertainty and incomplete data create a substantial challenge to existing decision-theoretic approaches [24]. We illustrate how SDP [25,26] can be implemented using a Bayesian non-parametric (BNP) model of population dynamics [27]. The BNP method has two distinct advantages. First, using a BNP model sidesteps the need for an accurate model-based description of the system dynamics. Second, a BNP model better reflects uncertainty when extrapolating beyond the observed data. This is crucial to providing robust decision-making when the correct model is not known (as is almost always true). (We use *robust* to characterize approaches that provide nearly optimal solutions without being sensitive to the choice of the (unknown) underlying model.)

This paper is, to our knowledge, the first ecological application of the SDP without an *a priori* model of the underlying dynamics. Unlike parametric approaches that can only reflect uncertainty in parameter estimates, the BNP method provides a broader representation of uncertainty, including uncertainty beyond the observed data. We will show that Gaussian process dynamic programming (GPDP) allows us to find robust management solutions in face of limited data without knowing the correct model structure.

For comparisons, we consider the performance of management based on GPDP against management policies derived under several alternative parametric models [10,11,25]. Rather than compare models in terms of best fit to data, we compare model performance in the concrete terms of the decision-maker's objectives.

## 2. Approach and methods

We first describe the requirements of dynamic optimization for the management of human intervention in natural resource systems. After that we describe three parametric models for population dynamics and the Gaussian process (GP)^{1} description of population dynamics. All computer code used here has been embedded in the manuscript sources (see Xie [28]), and an implementation of the GPDP approach is provided as an accompanying R package. Source code, R package and the data files corresponding to each figure are archived in the electronic supplementary material [29].

### (a) Requirements of dynamic optimization

Dynamic optimization enquires characterizing the dynamics of a state variable (or variables), a control action and a value function. For simplicity, we consider only a single-state variable. This is a best-case scenario for the parametric models because we simulate underlying dynamics from one of the three parametric models, whereas in the natural world we never know the ‘true’ model. In addition, by choosing one-dimensional models with just a few parameters, we limit the chance that poor performance will be owing to inability to estimate parameters accurately, something that becomes a more severe problem for higher dimensional parametric models. Finally, the parametric models we consider are commonly used in modelling stock-recruitment dynamics or to model sudden transitions between alternative stable states.

We let *X*(*t*) denote the size (numbers or biomass) of the focal population at time *t* and assume that in the absence of take its dynamics are2.1where *Z*(*t*) is lognormally distributed process stochasticity [10] and **p** is a vector of parameters to be estimated from the data. We describe the three choices for *f*(*X*(*t*),**p**) in the next section.

The control action is a harvest or take, *h*(*t*), measured in the same units as *X*, at time *t*. Thus, in the presence of take, the population size on the right hand side of equation (2.1) is replaced by *S*(*t*) = *X*(*t*) − *h*(*t*).

To construct the value function, we consider a return when *X*(*t*) = *x*(*t*) and harvest *h*(*t*) = *h* denoted as the reward, *R*(*x*(*t*),*h*). For example, if the return is the harvest at time *t*, then *R*(*x*(*t*),*h*(*t*)) = min(*x*(*t*),*h*(*t*)). We assume that future harvests are discounted relative to current ones at a constant rate of discount *δ* and ask for the harvest policy that maximizes total discounted harvest between the current time *t* and a final time *T*. That is, we seek to maximize over choices of harvest , where the state dynamics are given by equation (2.1) and *E* denotes the expectation over future population states.

In order to find that policy, we introduce the value function *V*(*x*(*t*),*t*) representing the total discounted catch from time *t* onwards given that *X*(*t*) = *x*(*t*). This value function satisfies an equation of SDP [3,25,30,31]:2.2where expectation is taken over all possible values of the next state, *X*(*t* + 1), and maximized over all possible choices of harvest, *h*(*t*). That is, at time *t*, when population size is *x*(*t*) and harvest *h*(*t*) is applied, the immediate return is *R*(*h*(*t*),*x*(*t*)). When the sole source of uncertainty is the process stochasticity term *Z*, the expectation in equation (2.2) is equivalent to taking expectations over *Z*(*t*). That is2.3where the population size after the take is *x*(*t*) − *h*(*t*), which is then translated into *X*(*t* + 1) by equation (2.1) (i.e. we replace *X*(*t* + 1) by *Z*(*t*)*f*(*x*(*t*) − *h*(*t*)|**p**)).

When the parameters governing the dynamics are also uncertain, we take the expectation over the posterior distribution for the parameters2.4when the underlying population dynamics are unknown (the case of structural uncertainty), the function *f* itself is uncertain and the expectation for the next state includes uncertainty in *f* as well. That is2.5We consider the finite time problem with *T* = 1000, which we solve using the standard value iteration algorithm [25,30].

### (b) Parametric models

We consider three candidate parametric models for the population dynamics: the Ricker model, the Allen model [32] and the Myers model [33], equations (2.6)–(2.8). In all three, we let *K* denote the carrying capacity and *r* the maximum *per capita* growth rate. The Ricker model has two parameters and the right hand side of equation (2.1) is2.6The Allen model has three parameters:2.7where *X _{c}* denotes the location of the unstable steady state (i.e. the tipping point).

The Myers model also has three parameters:2.8where *θ* = 1 corresponds to Beverton–Holt dynamics and *θ* > 2 leads to Allee effects and multiple stable states.

The Ricker model does not lead to multiple steady states. The Allen model resembles the Ricker dynamics with an added Allee effect parameter [34], below which the population cannot persist. The Myers model also has three parameters and contains an Allee threshold, but unlike the Ricker model saturates at high population size. The multiplicative lognormal stochasticity in equation (2.1) introduces one additional parameter *σ* that must be estimated.

Because of our interest in management performance in the presence of tipping points, all of our simulations are based on the Allen model. The Allen model is thus the state of nature and is expected to provide the best-case scenario. The Ricker model is a reasonable approximation of these dynamics far from the Allee threshold (but lacks threshold dynamics), while the Myers model shares the essential feature of a threshold but differs in structure from the Allen model. Throughout, we refer to the ‘true’ model when the underlying parameters *are known without error*, and refer to the ‘Allen’ model when these parameters have been estimated from the sample data.

We consider a period of 40 in which data are obtained to estimate the parameters or the GP. This is long enough that the estimates do not depend on the particular realization, and longer times are not likely to provide substantial improvement. Each of the models is fitted to the same data (figure 1).

We inferred posterior distributions for the parameters of each model by Gibbs sampling (Gelman *et al*. [35] implemented in R [36] using `jags`, [37]). We choose uniform priors for all parameters of the parametric models (see the electronic supplementary material, tables S1–S3; R code provided). We show one-step-ahead predictions of these model fits in figure 1. We tested each chain for Gelman–Rubin convergence and results were robust to longer runs. For each simulation we also applied several commonly used model selection criteria (Akaike information criterion, Bayesian information criterion, deviance information criterion; see Burnham & Anderson [38]) to identify the best fitting model.

Additionally, we compute the maximum-likelihood estimate (MLE, as we will refer to this model in the figures) of the parameters for the (structurally correct) Allen model. Comparing this to using the posterior distribution of parameters inferred from Markov chain Monte Carlo (MCMC) for the same model gives some indication of the importance of this uncertainty in the dynamic programming.

### (c) The Gaussian process model

The core difference for our purpose between the estimated GP and the estimated parametric models is that the estimated GP model is defined explicitly in reference to the observed data. As a result, uncertainty arises in the GP model not only from uncertainty in the parameters, but is also increased in regions farther from the observed states, such as low population sizes in the example illustrated here. The estimated parametric models, by contrast, are completely specified by the parameters.

The use of GPs to characterize dynamical systems is relatively new [39] and was first introduced in the context ecological modelling and fisheries management in Munch *et al*. [40]. GP models have subsequently been used to test for the presence of Allee effects [41], estimate the maximum reproductive rate [42], determine temporal variation in food availability [43] and provide a basis for identifying model-misspecification [44]. An accessible and thorough introduction to the formulation and use of GPs can be found in Rasmussen & Williams [45].

A GP is a stochastic process for which any realization consisting of *n* points follows a multivariate normal distribution of dimension *n*. To characterize the GP, we need a mean function and a covariance function. We proceed as follows.

As before, we assume that the data *X*(*t*) are observed with process stochasticity around a mean function *g*(*X*(*t*)):2.9where *ɛ* are independent and identically distributed normal random variables with zero-mean and variance *σ*^{2}. Note that we have chosen to assume additive stochasticity. While we could consider lognormal stochasticity as in the parametric models, we make this choice to emphasize that the GP approach need not have structurally correct stochasticity to be effective.

In order to make predictions, we update the GP based on the observed set of transitions. To do so, we collect the time series of observed states into a vector of ‘current’ states, **X**_{obs} = {*X*(1), … ,*X*(*T*−1)} and a vector of ‘next’ states, **Y**_{obs} = {*X*(2), … ,*X*(*T*−1)}, where *T* is the time of the final observation. Conditional on these observations, the predicted next state, *g*(*X*_{p}), for any given ‘current’ state, *X*_{p} follows a normal distribution with mean *E* and variance *C* determined using the standard rules for conditioning in multivariate normals, i.e.2.10and2.11Here *l*_{n} is the *n* by *n* identity matrix (i.e. a matrix with ones down the diagonal and zeros (elsewhere) and *K* is the ‘covariance kernel’. The covariance kernel controls how much influence one observation has on another. In the present application we use the squared-exponential kernel which, when evaluated over a pair of vectors, say ** x** and

**, generates a covariance matrix whose**

*y**i*,

*j*th element is given by2.12so that

*ℓ*gives the characteristic length-scale over which correlation between two observations decays. See Rasmussen & Williams [45] for other choices of covariance kernels and their properties. Note that this simple formulation assumes a prior mean of zero. For the parameters we use inverse Gamma priors on both the length-scale

*ℓ*and

*σ*, thus for example2.13for the prior on

*ℓ*,

*α*= 10 and

*β*= 10. The prior on

*σ*,

*α*= 5 and

*β*= 5.

We use a Metropolis–Hastings MCMC (Gelman *et al*. [35]) to infer posterior distributions of the parameters of the GP (electronic supplementary material, figure S4, code in appendix). Since the posterior distributions differ substantially from the priors (electronic supplementary material, figure S4), most of the information in the posterior comes from the data rather than the prior belief.

### (d) The method of Gaussian process dynamic programming

We derive the harvest policy from the estimated GP by inserting it into a SDP algorithm. Given the GP posteriors, we construct the transition matrix representing the probability of going to each state *X*(*t* + 1) given any current state *X*(*t*) and any harvest *h*(*t*) (see the function `gp_transition_matrix()` in the provided R package). Given this transition matrix, we use the same value iteration algorithm as in the parametric case to determine the optimal policy. In doing so, the uncertainty in the future state under the GP, *X*(*t* + 1), includes both process uncertainty (based on the estimation of *σ*) and structural uncertainty of the posterior collection of curves.

## 3. Results

### (a) Parametric and Gaussian process models for population dynamics

To ensure our results are robust to the choice of parameters, we will consider 96 different scenarios. To help better understand the process, we first describe in detail the results of a single scenario.

All of the models fit the observed data rather closely and with relatively small uncertainty. In figure 1, we show the posterior predictive curves. The training data of stock sizes observed over time are points, overlaid with the step-ahead predictions of each estimated model using the parameters sampled from their posterior distributions. Compared with the true model most estimates appear to over-fit, predicting patterns that are actually due purely to stochasticity. Model selection criteria (table 1) penalize more complex models and show a preference for the simpler Ricker model over the models with alternative stable states (Allen and Myers). The electronic supplementary material provides details on the model estimates.

We show the mean inferred population dynamics of each model relative to the true model used to generate the data in figure 2, predicting the relationship between observed population size (*x*-axis) to the population size after recruitment the following year.

In addition to the raw data, the GP is conditioned on going through the point 0,0 without error. All parametric models also make this assumption. Conditioning on (0,0) is equivalent to making the assumption that the population is closed, so that once it hits 0 it stays at 0, despite the lack of any data in the observed sequence to justify this. This assumption illustrates how the GP can capture common-sense biology without having to assume more explicit details about the dynamics at low population numbers that have never been observed. If the population were not closed, one could repeat the entire analysis without this assumption. Unlike parametric models, the GP corresponds to a distribution of curves, of which this plot only shows the means. Uncertainty in the parameters of the GP (not shown) further widens the band of possible population sizes. In electronic supplementary material, figure S1, we show the performance of the models outside the observed training data.

Despite the similarities in model fits to the observed data, the policies inferred under each model differ widely (figure 3). Policies are shown in terms of target escapement, *S*(*t*) = *x _{t}* −

*h*. Under models such as this a constant escapement policy is expected to be optimal [10], whereby population levels below a certain size

*S*are unharvested, while above that size the harvest strategy aims to return the population to

*S*. Whenever a model predicts that the population will not persist below a certain threshold, the optimal solution is to harvest the entire population immediately, resulting in an escapement

*S*= 0, as seen in the true (correct form, exact parameters) model, the Allen model (correct form, estimated parameters) and the GP. Only the structurally correct model (Allen model) and the GP produce policies close to the true optimum policy.

In figure 4, we show the consequences of managing 100 replicate realizations of the simulated fishery under policies derived from each model. The structurally correct model under-harvests, leaving the stock to vary around its unfished optimum. The structurally incorrect Ricker model over-harvests the population past the tipping point consistently, resulting in the immediate crash of the stock and thus leads to minimal long-term catch.

The results across replicate stochastic simulations are most easily compared by using the relative differences in net present value realized by each of the models (figure 5). Although not perfect, the GPDP consistently realizes a value close to the optimal solution and avoids ever driving the system across the tipping point, which results in the near-zero value cases in the parametric models.

### (b) Sensitivity analysis

These results are not sensitive to the modelling details of the simulation. The GPDP estimate remains very close to the optimal solution (obtained by knowing the true model) across changes to the training simulation, scale of stochasticity, parameters or structure of the underlying model. In the electronic supplementary material, we consider both a Latin hypercube approach and a more focused investigation of the effects of the relative distance to the Allee threshold and the variance of process stochasticity.

The GPDP is only weakly influenced by increasing stochasticity or increasing Allee effects over much of the range (electronic supplementary material, figure S2). Larger *σ* or higher Allee levels make even the optimal solution without any model or parameter uncertainty unable to harvest the population effectively (e.g. the stochasticity is large enough to violate the self-sustaining criterion of Reed [10]).

## 4. Discussion

Simple, mechanistically motivated models offer the potential to increase our basic understanding of ecological processes [46,47], but such models can be both inaccurate and misleading when used in a decision-making framework. In this paper, we tackled two aspects of uncertainty that are common to many ecological decision-making problems and fundamentally challenging to existing approaches that largely rely on parametric models:

—we do not know the correct models for ecological systems; and

—we have limited data from which to estimate the model.

We have illustrated how the use of non-parametric methods provides more reliable solutions in the sequential decision-making problem.

### (a) Traditional model-choice approaches can be positively misleading

Our results illustrate that model-choice approaches can be absolutely misleading—by providing support to models that cannot capture tipping point dynamics because they have fewer parameters and the data are far from the tipping point. That is, when the data come from around the stable steady state, all the parametric models are approximately linear and approximately identical. Thus, it is intuitive that all model selection methods choose the simplest model. In a complex world, the result is suboptimal. But in a world that might contain tipping points, the result could be disastrous.

Many managers both in fisheries and beyond face a similar situation: they have not observed the population dynamics at all possible densities. A lack of comprehensive data at all population sizes, combined with the inability to formulate accurate population models for low population sizes in the absence of data, make this situation the rule more than the exception. Relying on parametric models and model-choice processes that favour simplicity ignores this basic reality. For a long time, Carl Walters (e.g. [4]) has argued that if we began by fishing any newly exploited population down to very low levels and then let it recover, we would be much better at estimating population dynamics and thus predicting the optimal harvest levels. While certainly true, this presents a rather risky policy in the face of potential tipping points. The GPDP offers a risk-adverse alternative.

### (b) Gaussian process dynamic programming population dynamics capture larger uncertainty in regions where the data are poor

Parametric models perform most poorly when we seek a management strategy outside the range of the observed data. The GPDP, by contrast, leads to a predictive model that expresses a great deal of uncertainty about the probable dynamics *outside* the range of the observed data, while retaining very good predictive accuracy *inside* the range. The management policy based on the GPDP balances uncertainty outside the range of the observed data against the immediate value of the harvest, and acts to stabilize the population dynamics in a region of state space in which the predictions are reliably reflected by the data.

Such problems are ubiquitous across ecological decision-making and conservation where the greatest concerns involve decisions that lead to population sizes that have never been observed and for which we do not know the response—whether this is the collapse of a fishery, the spread of an invasive, or the loss of habitat.

### (c) The role of the prior

Outside of the observed range of the data, the GP reverts to the prior, and consequently the choice of the prior can also play a significant role in determining the optimal policy. In the examples shown here we have selected a prior that is both relatively uninformative (owing to the broad priors placed on its parameters *ℓ* and *σ*) and simple (the choice of our covariance function, equations (2.12) and (2.13)). In practice, these should be chosen to confer particular biological properties. In principle, this may allow a manager to improve the performance of the GPDP by adding detail as is justified. For instance, it would be possible to use a linear or a Ricker-shaped mean in the prior without making the much stronger assumption that the Ricker is the structurally correct model [41]. One fruitful avenue of future research is identifying criteria to ensure the prior and the reward function are chosen appropriately for the problem at hand.

## Funding statement

This work was partially supported by NOAA-IAM grant to S.M. and Alec McCall and administered through the Center for Stock Assessment Research, a partnership between the University of California Santa Cruz and the Fisheries Ecology Division, Southwest Fisheries Science Center, Santa Cruz, CA and by NSF grant EF-0924195 to M.M. and NSF grant DBI-1306697 to C.B.

## Footnotes

↵1 We abbreviate Gaussian process as GP, which refers to the statistical model we use to approximate the population dynamics, and we use the term Gaussian process dynamic programming (GPDP), to refer to the

*use*of a GP as the underlying process model when solving a dynamic programming equation. Hence we will refer to the models as: GP, Ricker, Allen, etc., and the novel method we put forward here as GPDP.

- Received July 16, 2014.
- Accepted December 8, 2014.

- © 2015 The Author(s) Published by the Royal Society. All rights reserved.