## Abstract

Early warning signals have been proposed to forecast the possibility of a critical transition, such as the eutrophication of a lake, the collapse of a coral reef or the end of a glacial period. Because such transitions often unfold on temporal and spatial scales that can be difficult to approach by experimental manipulation, research has often relied on historical observations as a source of natural experiments. Here, we examine a critical difference between selecting systems for study based on the fact that we have observed a critical transition and those systems for which we wish to forecast the approach of a transition. This difference arises by conditionally selecting systems known to experience a transition of some sort and failing to account for the bias this introduces—a statistical error often known as the prosecutor's fallacy. By analysing simulated systems that have experienced transitions purely by chance, we reveal an elevated rate of false-positives in common warning signal statistics. We further demonstrate a model-based approach that is less subject to this bias than those more commonly used summary statistics. We note that experimental studies with replicates avoid this pitfall entirely.

## 1. Introduction

Mathematics … while assisting the trier of fact in the search of truth, must not cast a spell over him. ([1], p. 320)

In the case of *People v. Collins* 1968, the California Supreme Court considered the evidence of an expert witness described by the court as ‘an instructor of mathematics at a state college’, which concluded that the probability that a randomly selected individual would match the description given by the victim would be less than 1 in 12 million [1]. The prosecution had produced an individual matching the prosecutor's detailed description, and convinced by the mathematics, the lower courts had found him guilty.

The prosecution has only observed that the probability of seeing the evidence (*E*) they produced, given a random innocent individual (*I*), is very small. From this, one cannot conclude that the individual is indeed guilty, that is, that the probability the individual is innocent given the evidence is also very small. In a city with millions of people, there might be several individuals who match the description of the evidence. Mathematically, $P(E \mid I)$ need not equal $P(I \mid E)$; instead these expressions are related by Bayes' theorem,

$$P(I \mid E) = \frac{P(E \mid I)\,P(I)}{P(E)}, \tag{1.1}$$

and $P(I) \gg P(E)$, so $P(I \mid E) \gg P(E \mid I)$, and consequently we cannot conclude that $P(I \mid E)$ is small from $P(E \mid I)$ being small. Realizing this mistake, the California Supreme Court reversed the decision, and the case became a widely recognized example of the prosecutor's fallacy [2]. Here, we explore how a similar misconception can arise from the use of historical data to evaluate methods for detecting early warning signals of critical transitions.
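As a minimal numerical sketch of this argument, the following Python snippet applies Bayes' theorem to a hypothetical city. Only the 1 in 12 million match probability comes from the case described above; the population of 6 million, and the assumption that exactly one guilty person exists and always matches the description, are illustrative choices.

```python
def posterior_innocent(p_e_given_innocent, population, n_guilty=1):
    """Return P(I | E) from Bayes' theorem.

    Assumes (for illustration only) that every guilty individual matches
    the evidence with probability 1.
    """
    p_innocent = (population - n_guilty) / population   # prior P(I)
    p_guilty = n_guilty / population                    # prior P(G)
    # Total probability of a random individual matching the evidence, P(E)
    p_evidence = p_e_given_innocent * p_innocent + 1.0 * p_guilty
    return p_e_given_innocent * p_innocent / p_evidence


# P(E | I) is tiny, yet P(I | E) is not: roughly one in three here.
p = posterior_innocent(1 / 12_000_000, population=6_000_000)
print(f"P(E|I) = {1 / 12_000_000:.2e},  P(I|E) = {p:.2f}")
```

Under these assumed numbers the probability of innocence given the evidence is about a third, many orders of magnitude larger than the probability of the evidence given innocence.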

Catastrophic transitions or tipping points, where a complex system shifts suddenly from one state to another, have been implicated in a wide array of ecological and global climate systems such as lake ecosystems [3], coral reefs [4], savannah [5], fisheries [6] and tropical forests [7]. Recent research has begun to identify statistical patterns commonly associated with these sudden catastrophic transitions, which could be used as an *early warning sign* to identify an approaching tipping point, which might provide managers time to react to and avert an undesirable state shift [8,9]. An array of statistical patterns associated with tipping point phenomena has been suggested for the detection of early warning signals associated with such sudden transitions. Two of the most commonly used are a pattern of increasing variance [10] and a pattern of increasing autocorrelation [11], which have been tested in both experimental manipulation [3,12–14] and historical observations [15–20].

### (a) Testing patterns on historical data

Historical examples of sudden transitions taken from the paleo-climate record provide an important way to test and evaluate potential leading indicator methods, and have been widely used for this purpose [15–20]. Similarly, it has been suggested that data gathered from ecological systems, such as lakes that were monitored before they experienced sudden eutrophication or grasslands subjected to overgrazing, could help reveal when similar systems are approaching a tipping point [3].

However, testing methods for early warning signals against historical examples of transitions is susceptible to statistical mistakes that arise from selecting data conditional on those data having already exhibited a sudden transition. A central tenet of early warning theory is that the system in question is slowly approaching a tipping point that lies some unknown distance away. If nothing is done to remedy the situation, this slow change will inevitably carry the system beyond the tipping point, which introduces a sudden, rapid transition into an undesirable state [8]. This process can be described mathematically as a *bifurcation*, in which a slowly changing parameter reaches a critical value that causes the system stability to change.

Not all sudden transitions are caused by some ‘guilty’ process slowly driving the system over a tipping point, the kind of process that early warning signals are designed to detect. Some systems may experience such transitions purely by chance, leaving a stable state on an extremely unlikely excursion that happens to stray too far from the stable attractor; this possibility has been considered for transitions identified in the historical climate record [9,18]. Like the evidence presented before the California Supreme Court in 1968, the chance of observing such an ‘innocent’ transition a priori may be very small, but when selected from a historical record of many possible transitions, this possibility can no longer be ignored.

Figure 1 shows a schematic illustrating critical transitions under each of these scenarios. In figure 1*a*, the system experiences a bifurcation and should contain an early warning signal. In figure 1*b*, a similar-looking trajectory emerges from a simulation of a stable system that should not contain a warning signal. While the simulation of the bifurcation scenario shown on the left produces a similar transition every time, the transition shown on the right is somewhat less probable, occurring in only one per cent of simulations.

## 2. Methods and results

To investigate whether early warning signals are vulnerable to this fallacy, we simulate a system that is not driven towards a bifurcation such as in figure 1*b*. This simulation approach allows us to determine whether examining historical events is a valid way to test the utility of these indicators. We simulated 20 000 replicates of a stochastic individual-based birth–death process with an Allee threshold [21], which arises from positive fitness effects at low densities. Above the Allee threshold, the population returns to a positive equilibrium size, whereas below the threshold the population decreases to zero. The model can be represented as a continuous time birth–death process where births and deaths are Poisson events that depend on the current density with rates given by
$$b(n) = \frac{e K n^{2}}{n^{2} + h^{2}} \tag{2.1}$$

and

$$d(n) = e n + a, \tag{2.2}$$

a model with a linear death rate and density-dependent birth rate that drives the Allee effect at low densities and limits growth at high densities. In this model, *n* indicates the discrete number of individuals in the population, *K* indicates a carrying capacity as set by a limiting resource, *e* a per-capita death rate (the *e* scaling term in the birth equation allows the carrying capacity *K* to correspond to a positive equilibrium point), *a* an additional mortality imposed on the population such as harvest, and *h* a parameter controlling the population size at which the addition of more individuals switches from conferring a positive benefit on growth from Allee interactions (*n* < *h*) to a negative impact on growth owing to increased competition (*n* > *h*). The key feature of this model is the alternate stable states introduced by this effect; other functional forms for equation (2.1) could serve equally well for these simulations [22]. Although this system can be forced through a bifurcation by increasing the death rate, in these simulations, all parameters are held constant and no bifurcation occurs. Consequently, we do not anticipate an early warning signal of an approaching bifurcation.
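The birth–death process described above can be sketched with a Gillespie-style simulation. This is a minimal illustration rather than the code used for the study: the rate functions are assumed to take the forms b(n) = eKn²/(n² + h²) and d(n) = en + a suggested by the parameter descriptions, the parameter values are hypothetical, and treating n = 0 as an absorbing state is an added assumption.

```python
import random


def birth_rate(n, e, K, h):
    """Density-dependent birth rate: sigmoidal in n, saturating at e*K."""
    return e * K * n**2 / (n**2 + h**2)


def death_rate(n, e, a):
    """Linear per-capita death rate plus a constant extra mortality a."""
    return e * n + a


def simulate(n0, t_end, dt_sample, e, K, h, a, rng):
    """Gillespie simulation, recorded on a regular grid of sampling times."""
    t, n = 0.0, n0
    next_sample, samples = 0.0, []
    while next_sample <= t_end:
        if n == 0:  # assumption: extinction is absorbing
            samples.append(0)
            next_sample += dt_sample
            continue
        b, d = birth_rate(n, e, K, h), death_rate(n, e, a)
        wait = rng.expovariate(b + d)  # exponential time to the next event
        # Record the current state at every sampling time before that event
        while next_sample <= t_end and next_sample < t + wait:
            samples.append(n)
            next_sample += dt_sample
        t += wait
        n += 1 if rng.random() < b / (b + d) else -1  # birth or death
    return samples


# Hypothetical parameters; n0 = 570 lies near the positive equilibrium here.
PARAMS = dict(e=0.5, K=1000, h=200, a=160)
trajectory = simulate(570, 20.0, 1.0, rng=random.Random(1), **PARAMS)
print(len(trajectory), trajectory[0])
```

A pure-Python loop like this is adequate for short runs; reproducing tens of thousands of long replicates would call for a compiled or vectorised implementation.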

The simulation starts from the positive equilibrium population size. Although the chance of a transition across the Allee threshold in any given time step is small, given enough time, this system will eventually experience such a rare event, driving the population extinct. We ran each replicate over 50 000 time units, sampling the system every 50 time units. In this time window, 266 of the 20 000 replicates experienced population collapse. To keep the examples of comparable sample size, we focus on a section of the data 500 time points prior to the system approaching the transition.
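The selection of a fixed-length section of data before a collapse might look like the sketch below. The helper name `pre_collapse_window` and the rule of cutting at the first sample at or below a known threshold are assumptions made for illustration; in a simulation the threshold is known, whereas in field data it generally is not.

```python
def pre_collapse_window(series, threshold, length):
    """Return the `length` samples immediately before the first sample
    at or below `threshold`, or None if the series never crosses or
    there are too few samples before the crossing."""
    for i, value in enumerate(series):
        if value <= threshold:
            if i < length:
                return None  # not enough history before the collapse
            return series[i - length:i]
    return None  # no collapse: this replicate is not selected


series = [5, 6, 5, 7, 6, 2, 0]
print(pre_collapse_window(series, threshold=3, length=3))
```

Returning `None` for replicates that never cross makes the conditioning explicit: only collapsed replicates contribute a window to the analysis.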

To test whether selecting systems that have experienced spontaneous transitions could bias the analysis towards false-positive detection of early warning signals (the prosecutor's fallacy), we selected replicates conditional on having collapsed in the simulations. We then selected a window around each system that ended just before the collapse, while the population values were still above the Allee threshold. For each replicate, we calculated the most common early warning indicators, variance and autocorrelation [8,10,16], around a moving window equal to half the length of that time series.
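A moving-window calculation of the two indicators can be sketched as follows. The default window of half the series length follows the text; the use of the population variance and of this particular lag-1 autocorrelation estimator are assumptions, since the exact estimators are not specified here.

```python
from statistics import mean, pvariance


def lag1_autocorrelation(x):
    """Lag-1 autocorrelation of x, normalised by the total sum of squares."""
    m = mean(x)
    denominator = sum((v - m) ** 2 for v in x)
    if denominator == 0:
        return 0.0  # constant window: define the autocorrelation as zero
    numerator = sum((x[i] - m) * (x[i + 1] - m) for i in range(len(x) - 1))
    return numerator / denominator


def rolling_indicators(series, window=None):
    """Variance and lag-1 autocorrelation in each position of a moving window."""
    if window is None:
        window = len(series) // 2  # half the series length, as in the text
    variances, autocorrelations = [], []
    for start in range(len(series) - window + 1):
        chunk = series[start:start + window]
        variances.append(pvariance(chunk))
        autocorrelations.append(lag1_autocorrelation(chunk))
    return variances, autocorrelations


variances, autocorrelations = rolling_indicators([0, 1, 0, 1, 0, 1, 0, 1])
print(variances[0], autocorrelations[0])
```

Each indicator series has one value per window position, so a window of half the series length leaves roughly half as many indicator values as observations.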

To test for the presence of a warning signal in these indicators, we computed values of Kendall's *τ* for both indicators for each of the 266 replicates. Kendall's *τ* is a non-parametric measure of rank correlation frequently used to identify an increasing trend (*τ* > 0) in early warning signals [16,23], defined as

$$\tau = \frac{(\text{number of concordant pairs}) - (\text{number of discordant pairs})}{\tfrac{1}{2}\,n(n-1)}$$

in *n* observations.^{1} *τ* takes values in [−1, 1]. The distribution of *τ* values observed across these replicates is shown in figure 2. We compare the distribution of *τ* from all the simulations to the distribution conditioned on experiencing a chance transition to the alternative stable state. To avoid an effect of sample size, the time series are all chosen to be of the same length.
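Kendall's *τ* can be computed directly from counts of concordant and discordant pairs, as in the sketch below. Tied pairs are counted as neither concordant nor discordant, following the footnote; the helper `trend`, which correlates an indicator with its time index, is a hypothetical convenience function. The double loop is quadratic in the series length, which is acceptable for series of a few hundred points.

```python
def kendall_tau(x, y):
    """Kendall's tau: (concordant - discordant) / (n (n - 1) / 2)."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
            # sign == 0 is a tie and contributes to neither count
    return (concordant - discordant) / (n * (n - 1) / 2)


def trend(indicator):
    """Rank correlation of an indicator with time; > 0 means increasing."""
    return kendall_tau(list(range(len(indicator))), indicator)


print(trend([0.1, 0.2, 0.15, 0.3, 0.4]))
```

Applied to a rolling variance or autocorrelation series, a value of `trend` near 1 corresponds to the steadily increasing pattern that is usually read as a warning signal.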

To demonstrate that the effect we observe is not unique to models with Allee effects, we provide an example of the effect arising in a discrete-time model with two non-zero stable states adapted from [4] (equation (2.3)), which combines a logistic growth model with a saturating predator response (see [24] for detailed discussion), shown in figure 3. Code to replicate the analysis can be found at https://github.com/cboettig/earlywarning/tree/prosecutor/.

For each of these replicates, we also take a model-based approach, estimating parameters for an approximate linear model of the system approaching a saddle-node bifurcation, as described by Boettiger & Hastings [25] (equation (2.4)).

In this model, the parameter *m* describes the approach towards the saddle-node bifurcation. Estimates *m* < 0 are expected in systems approaching a bifurcation, while for stable systems, *m* should be approximately zero. None of the estimates across the 266 simulations differed from zero in our study; hence the model-based estimation shows no evidence of bias on data that has been selected conditional on collapse.

## 3. Discussion

The attempts to detect early warning signs for critical transitions are based on the concept of a deteriorating environment as embodied in a changing parameter [8], which is a different kind of transition than one that is driven instead by stochasticity in an environment that is otherwise constant and exhibiting no directional change. When trying to use historical data to understand critical transitions, we often do not know which category, changing environment or simply chance, an observed large change falls into.

We have shown here that systems that undergo rare sudden transitions owing to chance look statistically different from their counterparts that do not, even though they are driven by the same stochastic process. In particular, such conditionally selected examples are more likely to show signs associated with an early warning of an approaching tipping point, such as increasing variance or increasing autocorrelation, as measured by Kendall's *τ*. This increases the risk of false positives—cases in which a warning signal being tested appears to have successfully detected an underlying change in the system leading to a tipping point, when in fact the example comes instead from a stable system with no underlying change in parameters. Figure 2 shows that many of the chance crashes show values of *τ* that are significantly larger than those observed in the otherwise identical replicates that did not experience a chance transition, thus ‘detecting’ an underlying change in the system dynamics that is not in fact present.

### (a) Chance transitions are false positives for early warning signals

It seems tempting to argue that this bias towards positive detection in historical examples is not problematic—each of these systems did indeed collapse; so the increased probability of exhibiting warning signals could be taken as a successful detection. Unfortunately, this is not the case. At the moment the forecast is made, these systems are not likely to transition, because they experience a strong pull towards the original stable state. A closer look at the patterns involved shows why common indicators such as autocorrelation and variance can be misleading.

As the system gets farther from its stable point, it is more likely to draw a random step that returns it towards the stable point. Despite this, there is always some probability that it will move further still; so systems that do cross the tipping point must do so rather quickly by a string of events. This pattern, clearly visible before the crashes in each of the examples in figure 1, produces a string of observations that appear more highly autocorrelated (if we are sampling the system frequently enough to catch the excursion at all) than we observe in the rest of the fluctuations around the equilibrium. Yet, this autocorrelation comes from a chance trajectory moving quickly *away* from the stable state, not from the pattern of critical slowing down in the return times to the stable state that precedes a saddle-node bifurcation and motivates the early warning signal.

This longer than expected excursion results in a higher than expected variance in that window as well. Both variance and autocorrelation are calculated using a moving window over the time-series, which allows the method to pick out a pattern of change as the window moves along the sequence. If this chance excursion that precedes the crash happens to fill a significant part of the moving window, the resulting pattern will tend to show an increase in autocorrelation or variance. If the chance excursion is relatively rapid compared with the frequency at which the system is observed (spacing of the data) or the width of the moving window, the excursion may not significantly alter the general pattern. In this way, some of the events in which a crash is observed will appear to present these statistical patterns of increased variance or autocorrelation without being harbingers of approaching critical transitions.

### (b) The truncation of observations

If we had a complete knowledge of the system dynamics, then we could eliminate the bias we observe here, because the bias arises from the transient branch of the trajectory that crosses the threshold; if the system were truncated at the minimum of the potential, then the effect we emphasize here would not appear. But it is not possible to truncate the system in this way in any practical application. The precise location of the minimum of the potential (the location of the deterministic equilibrium) is unknown. Moreover, under the hypothesis that the system is approaching a critical transition, the location of the minimum of the potential moves, so it cannot easily be estimated from previous observations (see figure 1*c*, where the equilibrium point moves in the direction of the transition). Thus, it is neither practical nor desirable to suggest that historical time series can be used by following a simple truncation rule that avoids the branch of a trajectory crossing the threshold to another basin of attraction. Exactly where a particular study will choose to truncate such a trajectory will necessarily be arbitrary without an underlying model of the process. Frequently this is done by removing the very steep, monotonic branch of the trajectory expected once the system crosses the unstable threshold. Such an approach corresponds with our choice of termination and produces the bias we discuss here.

The examples of figure 1, though only single replicates, may be useful in illustrating these issues. Figure 1*c*, top panel, shows a sample trajectory of a system with a parameter shift, while figure 1*b* shows a trajectory without a shift. Both trajectories show higher autocorrelation and higher variance near the end of the time series (time increases on the *y*-axis in figure 1). The part of the time series following the critical transition shows a fast and monotonic trajectory to the unstable trajectory, and would usually be excluded by an analysis for warning signals in advance of the transition. No such clear pattern exists prior to the transition in figure 1*b*. An alternative proposal to terminate the trajectory in (*b*) earlier would also risk decreasing the signal seen in (*c*), and would be inconsistent with the application of warning signals in the forecasting context, where there would be no such truncation.

### (c) Comparing to the model-based method

In our numerical experiment, the model-based estimate of early warning signals appears more robust than the summary statistics, producing the same estimates on the conditionally selected replicates as on a random sample of the replicates. This is a consequence of the more rigid specifications that come with a model-based approach: the pattern expected is not just any increase in variance or autocorrelation, but must be one that matches its approximation of the saddle-node bifurcation. This observation highlights the difference between the pattern driving the false-positive trends in increasing variance and increasing autocorrelation and the pattern anticipated in the saddle-node model. This should not, however, be taken as evidence that the model-based approach is immune to the bias of the prosecutor's fallacy.

### (d) Importance of experimental approaches

The problem we highlight ultimately stems from the difficulty of having only a single realization with which to examine a complex problem. The only way to deal with this problem is through replication, as can be done in an experimental system in laboratory manipulations such as Drake & Griffen [12], Veraart *et al.* [13] and Dai *et al.* [14] and at the scale of whole lake ecosystems in Carpenter [3]. Experimental procedures avoid the hazard of the prosecutor's fallacy by generating a complete sample of replicates, rather than by selecting a subset of cases from some larger historical sample.

## Acknowledgements

This research was supported by funding from NSF grant EF 0742674 to A.H. and a Computational Sciences Graduate Fellowship from the Department of Energy grant DE-FG02-97ER25308 and NERSC Supercomputing grant DE-AC02-05CH11231 to C.B. The authors thank M. Baskett, T. A. Perkins and N. Ross for helpful comments on earlier drafts of the manuscript, and also P. Ditlevsen and an anonymous reviewer for their comments.

## Footnotes

1. A pair of observations $(x_i, y_i)$ and $(x_j, y_j)$ are concordant if $x_i > x_j$ and $y_i > y_j$, or $x_i < x_j$ and $y_i < y_j$, and discordant otherwise; equalities excepted.

- Received September 5, 2012.
- Accepted September 20, 2012.

- This journal is © 2012 The Royal Society