## Abstract

We estimate the probable number of flowering plants. First, we apply a model that explicitly incorporates taxonomic effort over time to estimate the number of as-yet-unknown species. Second, we ask taxonomic experts their opinions on how many species are likely to be missing, on a family-by-family basis. The results are broadly comparable. We show that the current number of species should grow by between 10 and 20 per cent. There are, however, interesting discrepancies between expert and model estimates for some families, suggesting that our model does not always completely capture patterns of taxonomic activity. The as-yet-unknown species are probably similar to those taxonomists have described recently—overwhelmingly rare and local, and disproportionately in biodiversity hotspots, where there are high levels of habitat destruction.

## 1. Introduction

How many species there are in a taxon is an intrinsically interesting question (May 1988, 1990; Prance *et al*. 2000; Pimm 2001; Dirzo & Raven 2003). It also has important implications for conservation. Recently discovered species are in biodiversity hotspots (Myers *et al*. 2000)—places with high levels of habitat destruction. As-yet-unknown species are likely to be in the same places and so in danger of extinction, if indeed they are found before they go extinct. Estimating how many such species there are is an essential step in setting conservation priorities.

There are two questions in estimating a taxon's total number of species. Surprisingly, the first is how many unique species taxonomists have already described. There are considerable uncertainties in the estimates of such species. Only when these are resolved can one ask the second question of how many more species there are that are presently unknown.

The first question is one of synonymy—taxonomists give different names to the same species inadvertently. There have been several recent estimates of the currently known number of unique species of plants (Prance *et al*. 2000; Alroy 2002; Bramwell 2002; Paton *et al*. 2008), with the highest estimate twice the lowest one. Paton *et al*. (2008) found a consistent percentage of synonyms within each family and, taking that rate of synonymy into account, estimated 352 282 unique flowering plant names.

We use the World Checklist of Selected Plant Families (WCSP 2008), a unique and continuously updated synonymized world list of plants that the Royal Botanic Gardens, Kew supplied. It has resolved problems of synonyms, but for only some plant families and around 110 000 species of seed plants. We also use GrassBase, a similar list for the roughly 10 000 species of grasses (Clayton *et al*. 2009).

We ask the second question for just the families in these synonymized checklists: how should one estimate the number of species remaining to be discovered? Previous estimates used scaling laws in food webs, abundance, body size, rarity and other methods to predict the total number of species in various taxa (May 1988, 1990, 1992). More recent attempts extrapolate the number of species described over time in various ways, expecting the number of new species per time interval in a taxon to decline as the pool of unknown species diminishes (Solow & Smith 2005; Wilson & Costello 2005). Generally, it does not. In one study, New World grasses showed a consistent increase in the number of new species over time (Bebber *et al*. 2007). We shall show that this pattern is indeed a common one.

We find previous attempts wanting because none includes the number of taxonomists involved in describing species. The number of plant taxonomists active in any period (which we will define) has increased steadily over the 250 years of taxonomic history, a trend probably true of other taxa too. Not surprisingly, the raw number of species described over time has increased as well. In fisheries statistics, one scales raw fish catches by the effort taken to acquire them to obtain ‘catch per unit effort’ as a measure of stock size. By analogy, we model the rate at which taxonomists ‘catch’ previously unknown species.

Our model has two factors. First, the greater the effort—the number of taxonomists involved in describing species—the more species they will describe in a given interval, other things being equal. We define ‘taxonomists’ simply as those who describe new species. Taxonomic effort is a powerful predictor of the number of species described.

Second, taxonomists have probably increased the efficiency of their efforts since the mid-1700s. That was when Linnaeus introduced the system of binomial nomenclature and founded modern taxonomic practice by providing as complete an account of all known species as he could. By ‘taxonomic efficiency’ we mean simply an increase in the number of species described per taxonomist, adjusted for the continually diminishing pool of as-yet-unknown species. Not all the taxonomists we polled (see below) thought taxonomic efficiency had increased. Were efficiency to have remained constant, the number of species described per taxonomist would decline continuously over time as the supply of undescribed species dwindled. We will show that for many taxa there is an increase in the number of species per taxonomist, typically for a century or so.

Finally, there are other confounding issues, also inspired by fishing analogies, to which we shall return.

## 2. An approach using taxonomic effort

The WCSP, together with GrassBase, presents synonymized checklists of monocots, a monophyletic group that includes approximately 20 per cent of all known flowering plants. These lists give a total count of 69 323 species of monocots. The WCSP checklist of the remaining flowering plants is less complete. We consider a total of 49 481 species that constitute less than a fifth of these non-monocot families.

For each 5-year interval, we calculate the number of unique species discovered and the number of taxonomists working. We expect the number of species described in interval *i*, *S*_{i}, to depend on the number of taxonomists, *T*_{i}, actively describing species during that period,

$$S_i \propto T_i. \tag{2.1}$$
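The per-interval tallies described above could be produced along the following lines. This is only a minimal sketch: the record layout `(year, species_name, author)`, the function name `bin_by_interval` and the toy records are assumptions made for illustration, and counting a taxonomist as "working" in an interval whenever they authored at least one species description in it is one plausible reading of the definition, not necessarily the authors' exact procedure.

```python
from collections import defaultdict

def bin_by_interval(records, start=1760, width=5):
    """Count unique species and distinct describing authors per interval.

    records: iterable of (year, species_name, author) tuples (hypothetical layout).
    Returns {interval_start_year: (n_species, n_taxonomists)}.
    """
    species = defaultdict(set)
    authors = defaultdict(set)
    for year, name, author in records:
        if year < start:
            continue  # records before the start date are ignored in this sketch
        key = start + ((year - start) // width) * width
        species[key].add(name)
        authors[key].add(author)
    return {k: (len(species[k]), len(authors[k])) for k in sorted(species)}

# Toy records, invented purely to exercise the function.
records = [
    (1761, "species_1", "author_A"),
    (1763, "species_2", "author_A"),
    (1764, "species_3", "author_B"),
    (1766, "species_4", "author_B"),
]
counts = bin_by_interval(records)
```

Here the first three records fall in the 1760 interval (three species, two authors) and the last falls in the 1765 interval.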

Our model consists of two elements. The first is the remaining number of species to be described, *S*_{R}. It is the total number of species, *S*_{T}, minus the cumulative number of species already described, ∑*S*_{i}, up to the given year, *t*:

$$S_R = S_T - \sum_{1760}^{t} S_i. \tag{2.2}$$

We chose 1760 as the start date to avoid the undue influence of Linnaeus's seminal work *Species plantarum* (Linné 1753).

The second element is taxonomic efficiency, *E*. We assume that taxonomists are more effective at finding and describing species now than in the past. For simplicity, we assume that efficiency increases linearly over time:

$$E = a + bY_i, \tag{2.3}$$

where *Y*_{i} is the year of interval *i* and *a* and *b* are estimated parameters. Efficiency need not increase, in which case *b* would be zero. Other things being equal, *S*_{i}/*T*_{i} will decrease as the number of species still to be discovered declines, and it will increase over time as efficiency increases, so the exact form will depend on the product of efficiency and species remaining,

$$\frac{S_i}{T_i} = (a + bY_i)\left(S_T - \sum S_i\right). \tag{2.4}$$

From this it follows that

$$S_i = T_i\,(a + bY_i)\left(S_T - \sum S_i\right). \tag{2.5}$$

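This is an intrinsically nonlinear statistical model, because there are four independent variables in the complete expression,

$$S_i = S_T\beta_1 T_i + \beta_2 S_T T_i Y_i - \beta_1 T_i \sum S_i - \beta_2 T_i Y_i \sum S_i + \varepsilon_i, \tag{2.6}$$

but only three parameters to be estimated: *S*_{T}, *β*_{1} and *β*_{2} (which correspond to *a* and *b* above). The *ɛ*_{i} are the residuals.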

The number of species described per period tends to be ‘spiky’, indicating the undue influence of monographs that describe many species in the year they appear followed by intervals when taxonomists described relatively fewer species. For obvious reasons, as the number of taxonomists increases, the influence of individual monographs declines and the relationship becomes smoother. To normalize the residuals, we took the logarithms of observed (*S*_{i}) and predicted (*S*_{T} *β*_{1} *T*_{i} + *β*_{2} *S*_{T} *T*_{i} *Y*_{i}−*β*_{1} *T*_{i} ∑*S*_{i}−*β*_{2} *T*_{i}*Y*_{i}∑*S*_{i}) numbers of species, and minimized the sums of squares of their differences. We used a grid search followed by a steepest-descent method to find values of the three parameters that minimized this sum of squares.
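The fitting criterion can be illustrated with a short sketch. The predicted expression above factors as *T*_{i}(*β*_{1} + *β*_{2}*Y*_{i})(*S*_{T} − ∑*S*_{i}), which is what `predict` computes. Everything else here is an assumption for illustration: the function names, the tuple layout of `data`, measuring *Y*_{i} in years since the first interval, taking ∑*S*_{i} as the total described before interval *i*, and the synthetic data. Only the grid-search stage is shown; the steepest-descent refinement is omitted.

```python
import math
from itertools import product

def predict(S_T, b1, b2, T, Y, cum):
    """Predicted species in one interval: T * (b1 + b2*Y) * (S_T - cum)."""
    return T * (b1 + b2 * Y) * (S_T - cum)

def log_ss(params, data):
    """Sum of squared differences between log observed and log predicted counts.

    data: list of (S_i, T_i, Y_i, cum_i) rows, where cum_i is the cumulative
    number of species described before interval i (an assumed convention).
    """
    S_T, b1, b2 = params
    total = 0.0
    for S, T, Y, cum in data:
        pred = predict(S_T, b1, b2, T, Y, cum)
        if pred <= 0:
            return math.inf  # logarithm undefined, so reject these parameters
        total += (math.log(S) - math.log(pred)) ** 2
    return total

def grid_search(data, S_T_grid, b1_grid, b2_grid):
    """Return the (S_T, b1, b2) grid point with the smallest log-scale sum of squares."""
    return min(product(S_T_grid, b1_grid, b2_grid), key=lambda p: log_ss(p, data))

def simulate(S_T, b1, b2, taxonomists, cum0=40.0, width=5):
    """Generate noise-free synthetic rows from known parameters (toy data only)."""
    data, cum = [], cum0
    for i, T in enumerate(taxonomists):
        Y = i * width  # years since the first interval (assumed time origin)
        S = predict(S_T, b1, b2, T, Y, cum)
        data.append((S, T, Y, cum))
        cum += S
    return data

data = simulate(1000, 0.01, 0.0001, taxonomists=[2, 3, 4, 5, 6, 8])
best = grid_search(data, [500, 1000, 2000], [0.005, 0.01, 0.02], [0.0, 0.0001, 0.0002])
```

Because the synthetic data are generated without noise from a point on the grid, the search recovers those parameters exactly. With real counts the grid point would only be a starting value for further refinement.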

This logarithmic transformation creates large residuals when the numbers of species are very small, as they were in the mid-1700s. If at least 40 species had not been described by 1760, we started in the first 5-year period where the cumulative number of known species was 40 or more.
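The start-date rule can be written as a small helper. This is a sketch under stated assumptions: the function name `first_fit_interval` and the list-based inputs are hypothetical, intervals are assumed to be supplied in chronological order and may include periods before 1760, and the cumulative count is assumed to include the species described in the interval being tested.

```python
def first_fit_interval(interval_starts, species_counts, threshold=40, default_start=1760):
    """Return the first interval to include in the fit.

    If the cumulative count already reaches `threshold` before `default_start`,
    the fit starts at `default_start`; otherwise it starts at the first interval
    in which the cumulative count reaches `threshold`. Returns None if the
    threshold is never reached.
    """
    cum = 0
    for start, n_species in zip(interval_starts, species_counts):
        cum += n_species
        if cum >= threshold:
            return max(start, default_start)
    return None

# Toy counts: the cumulative total passes 40 only in the 1765 interval.
late_start = first_fit_interval([1755, 1760, 1765, 1770], [10, 5, 30, 20])
# Toy counts: more than 40 species are already known before 1760.
default = first_fit_interval([1755, 1760], [50, 5])
```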

Our model does not permit estimates of confidence intervals based on parametric statistics. We instead gauge the uncertainty of our estimates in two ways. First, we used a standard jack-knife procedure, iteratively removing data from one 5-year interval at a time and successively returning the previously removed data. This procedure provided 47–50 different predicted total species estimates, depending on the taxon and the year in which the cumulative number of species first reached 40. We report their minima and maxima. Second, we re-ran the entire analysis using 10-year intervals, obtaining similar results to those reported here.
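The leave-one-interval-out procedure can be sketched generically. The names `jackknife_range` and `fit` are hypothetical, and the stand-in `fit` used in the usage lines is a simple mean, chosen only so that the example runs on its own; in practice it would be whatever routine returns the estimated total number of species from the per-interval data.

```python
def jackknife_range(data, fit):
    """Refit with each interval left out in turn; return (min, max) of the estimates.

    data: list of per-interval rows.
    fit:  callable mapping a list of rows to one estimate (hypothetical signature).
    """
    estimates = [fit(data[:i] + data[i + 1:]) for i in range(len(data))]
    return min(estimates), max(estimates)

# Stand-in estimator (a plain mean), NOT the species model.
toy_fit = lambda rows: sum(rows) / len(rows)
lo, hi = jackknife_range([10, 20, 30, 40], toy_fit)
```

With *n* intervals this yields *n* estimates, in line with the 47–50 estimates per taxon reported above.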

## 3. Results

### (a) Overall estimates of diversity

For monocots (figure 1*a*), there is a broad increase in the number of species described per interval over time. The scale is logarithmic. The decline since 2005 represents incomplete data. Clearly, any method based simply on the number of species would conclude that there is no diminution of the pool of as-yet-unknown species. Figure 1*a* also shows the increasing number of taxonomists active in any period—essentially an exponential increase (linear on the figure's scale) since about 1800. There are dips in both numbers from the 1920s until the 1960s. Figure 1*b*,*d* shows the number of species described per taxonomist plotted on an arithmetic scale. These decline continuously over time.

For selected non-monocots, the number of species described per period increases until about 1850 and then remains roughly constant (figure 1*c*). The number of taxonomists again increases roughly exponentially. The number of species per taxonomist increases for about a century then declines steadily.

We estimate there should be an increase of 17 per cent in the number of species of monocots (range 13–18% using the jack-knife procedure; table 1). For the selected non-monocots in the database, the number of species should increase by 13 per cent (range 11–14% using the jack-knife procedure; table 1). These estimates broadly compare with Prance *et al*. (2000), who independently arrived at an estimate of 20 per cent.

### (b) Family-by-family results

We analysed individually all taxonomically complete families containing more than approximately 500 species. As an example, for orchids (figure 2*a*,*b*), the number of species per taxonomist increases very slightly then clearly decreases over time. The ‘spike’ represents the work of Rudolf Schlechter who, at his peak, described over 400 species per year between 1911 and 1913 (Schlechter 1911–1914; WCSP 2008).

For irises (figure 2*c*,*d*) in the late 1700s, large numbers of showy South African species were discovered and brought to Europe. Since 1800, the number of species per taxonomist has *increased* slowly and so our model does not provide a sensible estimate of the number of unknown species.

Table 1 shows the results for 17 taxonomically complete families of monocots presented in order of decreasing numbers of species. These families contain more than 93 per cent of all monocot species. Between 11 (Orchidaceae; range 9–12%) and 68 per cent (Eriocaulaceae; range 52–204%) more species remain to be discovered in each family.

We label as ‘failing to converge’ those families for which our estimate is more than three times the number of known species. Four families did not provide sensible estimates.

There are 15 families in the WCSP database other than monocots that have more than 500 species (table 1), constituting 96 per cent of the species in the dataset we used. Ten of 15 families provided sensible estimates. The six families with the greatest numbers of species constitute 75 per cent of the species we model, and for them we predict increases from 20 per cent (Euphorbiaceae; range 16–24%) to more than twice the presently known number (Phyllanthaceae). These six families suggest a much higher number of unknown species than the 13 per cent we estimate for the group as a whole. That a subset of families provides different overall estimates than all families combined may seem contradictory, yet it reflects increasing specialization by taxonomists over time (see the electronic supplementary material).

### (c) How do our results compare with expert opinion?

In our second approach, we polled botanical colleagues for their estimates of how many species would eventually be described. We obtained estimates for 18 families this way (table 1). Their overall average—a 15 per cent increase in the present number of species—fits well with our model estimates. For three families, experts used a slightly different number of known species than in the catalogues we used above. For Poaceae, the expert provided a number of known species differing substantially from our tally.

For 11 of 18 families, expert opinion broadly matches the results of our quantitative modelling (table 1). In contrast, for three families (Iridaceae, Apocynaceae and Chrysobalanaceae) where our model failed to provide sensible estimates, experts suggested that few species remain unknown (4%, 14% and 13%, respectively). How can we reconcile these opinions of few remaining unknown species with data showing either no decreases or sometimes even slight *increases* over time in the number of species described per taxonomist? By analogy to fishing catch-per-unit-effort statistics, some families might have near-constant species-per-taxonomist ratios for decades, suggesting a large supply of unknown species, but then decline rapidly and unexpectedly as the ‘stock’ of such species is quickly exhausted.

Goldblatt justified his expectation that Iridaceae will be complete in about 5 years despite the generally *increasing* rate of species described per taxonomist over time (P. Goldblatt 2009, personal communication). The family is horticulturally desirable and has been deliberately targeted thoroughly in its known centres of diversity. Relatively poorly known areas, such as the wet tropics, hold few species. His work has been to revise genus after genus. He records that he is close to the end of genera that could be usefully revised and writes that ‘additions will just come to an abrupt end in the next 3–5 years.’ We will explore more complex models incorporating the taxonomic completion of subsets of plant families elsewhere.

## 4. Discussion

To summarize, the number of presently unknown plant species is thought to be 10 to 20 per cent of the number of known plant species. Approximately 13 per cent of the species in these synonymized data have been described since 1990. Of those, approximately 90 per cent are known from only one of the 300 or so regions into which the WCSP divides the world. Certainly, time may uncover other locations for these species, but that trend is balanced by the fact that, if the species were widespread, taxonomists would probably have found them earlier (Collen *et al*. 2004).

Overwhelmingly, the locations of these recent discoveries are critically imperilled, as are the species themselves (Mabberley 2009 provides an exception). Of the species found since 1990 that occur in only one region, almost 80 per cent inhabit biodiversity hotspots (Myers *et al*. 2000). These areas have many endemic species, by definition. Our results suggest that their numbers will increase further. Also by definition, these areas have exceptionally high levels of habitat loss. Simply, unknown species are nearly all likely to be rare and in rapidly shrinking habitats, and hence likely to be deemed ‘threatened’ when taxonomists do describe them.

Brummitt *et al*. (2008) suggest that 20 per cent of known plant species are threatened. If we take this estimate and add our result that there are 10 to 20 per cent more species, as yet unknown, that are also likely to be threatened, then 27 to 33 per cent of all plant species are probably threatened. These estimates are based on immediate threat, and do not consider further development of destructive factors, including climate disruption (Pimm 2009), during the remainder of this century.
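The arithmetic behind the 27 to 33 per cent range can be checked directly. The sketch below assumes, as the paragraph does, that every as-yet-unknown species is threatened, and it expresses all quantities relative to the number of known species. The function name `threatened_share` is hypothetical.

```python
def threatened_share(known_threatened=0.20, unknown_extra=0.10):
    """Share of all species (known plus unknown) that are threatened.

    known_threatened: fraction of known species that are threatened.
    unknown_extra:    unknown species as a fraction of known species,
                      all assumed to be threatened.
    """
    return (known_threatened + unknown_extra) / (1.0 + unknown_extra)

low = threatened_share(unknown_extra=0.10)   # (0.20 + 0.10) / 1.10
high = threatened_share(unknown_extra=0.20)  # (0.20 + 0.20) / 1.20
print(f"{low:.0%} to {high:.0%}")            # prints "27% to 33%"
```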

## Acknowledgements

The authors thank Peter Raven for suggesting the questions this paper asks, extensive comments on the manuscript, strongly held views on the primacy of expert opinions and assistance in providing them. Royal Botanic Gardens, Kew provided access to and assistance with the WCSP. R. Govaerts and A. Paton allowed access to the World Checklist of Selected Plant Families and provided useful discussions on the data used in this paper. Alex Davies extracted the data and D. Simpson provided helpful discussions on GrassBase. We thank taxonomic experts P. J. Cribb, K. Wurdack, R. Soreng, B. Simon, D. Simpson, W. Thomas, P. Goldblatt, R. B. Faden, H. Kennedy, P. Berry, P. Manos, G. Prance, A. Davis, T. Pennington, A. Paton, S. Mayo, A. Henderson, J. Kress, D. Goyder and M. Sands for their expert opinions. J. Lucas and G. Russell provided statistical advice.

## Footnotes

- Received May 12, 2010.
- Accepted June 17, 2010.

- © 2010 The Royal Society

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.