Mass media and popular science journals commonly report that new fossil discoveries have ‘rewritten evolutionary history’. Is this merely journalistic hyperbole, or is our sampling of systematic diversity so limited that attempts to derive evolutionary history from these datasets are premature? We use two exemplars, catarrhine primates (Old World monkeys and apes) and non-avian dinosaurs, to investigate how the maturity of datasets can be assessed. Both groups have been intensively studied over the past 200 years and so should represent pinnacles in our knowledge of vertebrate systematic diversity. We test the maturity of these datasets by assessing the completeness of their fossil records, their susceptibility to changes in macroevolutionary hypotheses and the balance of their phylogenies through study time. Catarrhines have shown prolonged stability: discoveries of new species have been evenly distributed across the phylogeny and have therefore had little impact on our understanding of their fossil record, diversification and evolution. The reverse is true for dinosaurs. The addition of new species has been non-random, and consequently their fossil record, their tree shape and our understanding of their diversification are changing rapidly. The conclusions derived from these analyses are relevant more generally: the maturity of systematic datasets can and should be assessed before they are exploited to derive grand macroevolutionary hypotheses.
Discoveries of new fossil species (and, less often, living species) commonly elicit reports that evolutionary history will have to be rewritten in their light. Perhaps this is mere journalistic hyperbole, or perhaps it reflects a more significant truth: evolutionary hypotheses are highly sensitive to the species they encompass, and the addition of new taxa can lead to fundamental changes in the evolutionary narrative. We explore how the discovery of new species affects (i) our perception of the completeness of the fossil record, (ii) the distribution of taxa across the phylogeny and (iii) the perception of evolutionary history derived from these data.
Sampling theory predicts that the majority of new species are discovered rapidly and that the rate of discovery slows to an asymptote as sampling matures. Three expectations follow. First, as species are discovered, our knowledge of the fossil record of higher taxonomic groups should improve to the point where new discoveries merely fill previously known gaps rather than identifying entirely new (and previously unknown) clades. Second, the phylogenetic distribution of newly discovered taxa should be random, so existing phylogenetic patterns should again be robust to the discovery of new taxa. Third, and in consequence, macroevolutionary hypotheses based on mature datasets should remain robust to the continued discovery of species. But how can we identify when a dataset has reached the level of taxonomic and stratigraphic maturity at which the phylogenetic and evolutionary hypotheses based upon it remain robust to further discoveries?
We employ measures of tree topology, fossil record quality and predictions of evolutionary phenomena to assess stability in the face of expanding taxonomic datasets. Our aim is to determine whether these measures can serve as assays for the maturity of sampling of the taxonomic and stratigraphic history of evolutionary lineages. We exploit two exemplary datasets—catarrhine primates and non-avian dinosaurs—to explore how our assays of dataset maturity perform in relation to the history of discovery of all currently known species in these clades. Both clades have attracted intense interest from zoologists and palaeobiologists over the past 200 years, forming the basis of numerous macroevolutionary studies [1–6], and they should in theory represent pinnacles of knowledge to which other datasets can be compared.
2. Material and methods
A catarrhine phylogeny of 153 extant and 131 fossil species was constructed by augmenting the phylogeny presented in  with additional taxa (figure 1; electronic supplementary material, figure S1). This method has been applied in part elsewhere [8–10], and justification for each node within the phylogeny is presented in the electronic supplementary material. The phylogeny represents the level of knowledge in 2007. Taxa were then pruned backwards through time, in reverse order of their date of naming, at 10-year intervals. This produced a further 25 phylogenies, one for each decade from 1760 to 2000. Twenty-six additional phylogenies were produced containing only those taxa with a known fossil record.
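The decadal pruning step can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code. It assumes a hypothetical encoding in which a tree is a nested pair of tuples with species names as leaves. It also assumes a dictionary `naming_year` that maps each species to its year of description. The names `prune` and `decadal_trees` are invented for this sketch. Retained taxa keep their original relationships. An internal node that loses one daughter collapses onto the other, and a clade with no retained taxa disappears. The same `prune` function, given a different predicate, yields the fossil-only trees.

```python
from typing import Callable, Dict, Optional, Tuple, Union

# Hypothetical encoding: a leaf is a species name (str); an internal node
# is a 2-tuple of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def prune(tree: Tree, keep: Callable[[str], bool]) -> Optional[Tree]:
    """Drop every leaf for which keep(name) is False.

    Relationships among retained taxa are unchanged: a node left with a
    single daughter collapses onto it, and an empty clade returns None.
    """
    if isinstance(tree, str):
        return tree if keep(tree) else None
    kept = [sub for sub in (prune(child, keep) for child in tree) if sub is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]
    return (kept[0], kept[1])


def decadal_trees(tree: Tree, naming_year: Dict[str, int],
                  start: int, stop: int, step: int = 10) -> Dict[int, Optional[Tree]]:
    """Tree as known at the end of each year start, start + step, ..., stop."""
    return {
        year: prune(tree, lambda sp, y=year: naming_year[sp] <= y)
        for year in range(start, stop + 1, step)
    }


# Toy example with invented naming years (not real catarrhine data).
toy = (("A", "B"), ("C", "D"))
years = {"A": 1766, "B": 1820, "C": 1766, "D": 1990}
series = decadal_trees(toy, years, 1760, 2000)
print(len(series))    # 25 decadal snapshots, 1760 to 2000
print(series[1800])   # ('A', 'C')

# Hypothetical fossil-only subset: pretend B and D are the fossil taxa.
print(prune(toy, lambda sp: sp in {"B", "D"}))   # ('B', 'D')
```

Because the pruned trees are derived from a single fixed topology, any change in the indices computed from them reflects which taxa were known, not a rearrangement of relationships.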
We used the supertree of non-avian dinosaurs presented in  and pruned it of species in 10-year time intervals, based on the species' date of description, resulting in 18 phylogenies representing decadal steps from 1840 to 2007. All taxa are extinct.
Each set of phylogenies was subjected to analyses of the completeness of their fossil records, tree balance and origination rates, to determine how perceptions of these phenomena have changed in light of increased sampling of the fossil record. Within each set of pruned phylogenies, the relationships among the component taxa are held constant. This prevents us from addressing how perceptions of phylogeny have changed in response to the discovery of new species. That question is itself a major goal of palaeontology, but it cannot be tested in this study. However, holding the phylogeny constant isolates the effect of fossil discovery from that of the rearrangement of known species.
(b) Changes in the quality of the fossil record through study time
The completeness of the fossil record can be assessed from the congruence between the stratigraphic order in which species are encountered and the order in which they are arranged within a phylogenetic tree. Sister lineages diverge contemporaneously from their common ancestor, so they should appear at the same stratigraphic level if the fossil record is complete. When the first occurrences of sister lineages do not match, the interval between them implies an unsampled portion of the later-appearing lineage, known as a ghost lineage. The completeness of the fossil record across a tree can be expressed as the average ghost lineage. Alternatively, it can be expressed as the preservation rate, calculated by dividing the number of taxa appearing after the oldest taxon by the total ghost range . A Poisson distribution, based on the 2007 dataset, was then placed around the observed preservation rate to provide a 95 per cent confidence interval. Points lying below this confidence interval indicate time periods in which our knowledge was statistically worse than it is today. Points lying above it indicate periods in which our knowledge was statistically better. We also calculated the Relative Completeness Index (RCI; ), which compares the total implied ghost lineage of a group with the known stratigraphic ranges of its fossils. As more fossils are discovered, we would expect the ratio of ghost lineage to known range to decrease, and hence the RCI to increase.
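The ghost-lineage quantities above can be illustrated with a small Python sketch. It is not the authors' implementation, and it rests on several assumptions:

- Each leaf in a hypothetical tree encoding carries invented first- and last-appearance ages, in Ma.
- Each divergence is dated to the oldest first appearance in either daughter clade, so every node contributes one minimum implied gap.
- The RCI takes its usual form, 100(1 − ΣMIG/ΣSRL), where MIG is the minimum implied gap and SRL the observed (simple) range length.
- The 95 per cent band treats the number of later-appearing taxa in the reference tree as a Poisson count and divides its quantiles by the reference ghost range. This is only one possible reading of the procedure described.

All function names are hypothetical.

```python
import math
from typing import List, Tuple, Union

# Hypothetical encoding: a leaf is (name, fad_ma, lad_ma), the first and last
# appearance ages in Ma (larger = older); an internal node is a 2-tuple.
Leaf = Tuple[str, float, float]
Tree = Union[Leaf, Tuple["Tree", "Tree"]]


def is_leaf(node) -> bool:
    return isinstance(node[0], str)


def leaves(node) -> List[Leaf]:
    return [node] if is_leaf(node) else leaves(node[0]) + leaves(node[1])


def oldest_fad(node) -> float:
    """First appearance of a clade: the oldest FAD among its members."""
    return max(fad for _, fad, _ in leaves(node))


def total_ghost_range(node) -> float:
    """Sum of minimum implied gaps (Myr) over all sister-lineage pairs.

    Sisters diverge together, so the later-appearing sister has a ghost
    lineage reaching back to the first appearance of the earlier one.
    """
    if is_leaf(node):
        return 0.0
    left, right = node
    gap = abs(oldest_fad(left) - oldest_fad(right))
    return gap + total_ghost_range(left) + total_ghost_range(right)


def preservation_rate(tree) -> float:
    """Taxa appearing after the oldest taxon, per Myr of total ghost range."""
    return (len(leaves(tree)) - 1) / total_ghost_range(tree)


def rci(tree) -> float:
    """Relative Completeness Index, 100 * (1 - sum(MIG) / sum(SRL))."""
    srl = sum(fad - lad for _, fad, lad in leaves(tree))
    return 100.0 * (1.0 - total_ghost_range(tree) / srl)


def poisson_quantile(lam: float, p: float) -> int:
    """Smallest k with P(X <= k) >= p for X ~ Poisson(lam), lam > 0."""
    k, cdf = 0, 0.0
    while True:
        cdf += math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
        if cdf >= p:
            return k
        k += 1


def rate_band(n_taxa: int, ghost_range: float, level: float = 0.95):
    """Band around a reference preservation rate, treating the count of
    later-appearing taxa (n_taxa - 1) as Poisson (an assumption here)."""
    lam = n_taxa - 1
    tail = (1.0 - level) / 2.0
    return (poisson_quantile(lam, tail) / ghost_range,
            poisson_quantile(lam, 1.0 - tail) / ghost_range)


# Toy tree with invented ages: ((A, B), C).
toy = ((("A", 30.0, 25.0), ("B", 20.0, 10.0)), ("C", 35.0, 33.0))
print(total_ghost_range(toy))   # 15.0 (10 Myr for B, 5 Myr for the A+B clade)
print(preservation_rate(toy))   # 2 / 15, about 0.133 taxa per Myr
print(round(rci(toy), 2))       # 11.76
```

A total ghost range of zero (perfect stratigraphic congruence) would leave the preservation rate undefined, and a fuller implementation would need to handle that case.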
(c) Changes in phylogenetic tree shape
Changes in tree shape can be measured using a corrected version  of Colless's index of phylogenetic tree imbalance (Ic) . This index can discriminate between two patterns of discovery. A random distribution of new taxa across the tree produces little variation in Ic through study time. Biased discovery of new taxa, inflating only parts of the tree, produces high variation in Ic through study time. Ic is strongly affected by tree size , so a simulation model was run to provide an expected mean value for randomly generated trees of the same size. To give a sample size-adjusted Ic value (Ic*), the mean expected Ic was subtracted from the observed Ic, so that positive values indicate trees more imbalanced than expected. These expected values were generated under an ERM-TI model  and so are suitable for use with fossil taxa.
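The logic of Ic* can be sketched in Python. This is an illustration under stated assumptions, not the procedure used here. The raw Colless index sums |n_left − n_right| over internal nodes. It is then scaled by its maximum, (n − 1)(n − 2)/2, which a fully pectinate tree of n tips attains. The random trees are drawn under the plain equal-rates Markov (Yule) model as a stand-in, because the ERM-TI variant is not reproduced. Ic* is taken as the observed minus the expected value. The tree encoding and function names are hypothetical.

```python
import random
from typing import Tuple, Union

# Hypothetical encoding: leaves are names (str); internal nodes are 2-tuples.
Tree = Union[str, Tuple["Tree", "Tree"]]


def colless(tree: Tree) -> Tuple[int, int]:
    """Return (number of tips, raw Colless sum of |n_left - n_right|)."""
    if isinstance(tree, str):
        return 1, 0
    (n_l, i_l), (n_r, i_r) = colless(tree[0]), colless(tree[1])
    return n_l + n_r, i_l + i_r + abs(n_l - n_r)


def corrected_colless(raw: int, n: int) -> float:
    """Scale by the maximum, (n-1)(n-2)/2, reached by a fully pectinate tree."""
    if n < 3:
        raise ValueError("need at least three tips")
    return 2.0 * raw / ((n - 1) * (n - 2))


def simulated_colless(n: int, rng: random.Random) -> int:
    """Raw Colless sum for one random tree under equal-rates Markov (Yule)
    splitting: a clade of m tips divides into k and m - k tips, with k
    uniform on 1..m-1."""
    total, stack = 0, [n]
    while stack:
        m = stack.pop()
        if m < 2:
            continue
        k = rng.randint(1, m - 1)
        total += abs(2 * k - m)
        stack.extend((k, m - k))
    return total


def ic_star(tree: Tree, reps: int = 1000, seed: int = 1) -> float:
    """Observed corrected Colless index minus its mean over simulated trees."""
    n, raw = colless(tree)
    rng = random.Random(seed)
    expected = sum(corrected_colless(simulated_colless(n, rng), n)
                   for _ in range(reps)) / reps
    return corrected_colless(raw, n) - expected


pectinate = ((("A", "B"), "C"), "D")
balanced = (("A", "B"), ("C", "D"))
print(colless(pectinate), colless(balanced))        # (4, 3) (4, 0)
print(ic_star(pectinate) > 0 > ic_star(balanced))   # True
```

Applied to each decadal tree in turn, a series of Ic* values that stays close to zero suggests that discoveries are falling randomly across the tree. A series that drifts steadily suggests that whole clades are being added.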
(d) Identification of origination rate shifts
SymmeTREE  was used to identify origination rate shifts using the Δ2 shift statistic. SymmeTREE is conventionally used to detect shifts in diversification (speciation minus extinction) rate. Because fossil taxa are included here, however, extinct lineages remain on the tree and the effect of extinction is excluded. These shifts are therefore more accurately described as origination rate shifts. They are used as an exemplar of inferred macroevolutionary events because of their widespread use [4,6,20–22].
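SymmeTREE's Δ2 statistic is more involved and is not reproduced here. A simpler, clearly different calculation illustrates the underlying principle that tree topology alone can flag unexpectedly uneven sister clades. The sketch computes the equal-rates Markov (ERM) probability that a split of n tips is at least as uneven as the one observed. This is the quantity behind the Slowinski–Guyer test. Under ERM the size of one daughter clade is uniform on 1 to n − 1. The tree encoding and function names are hypothetical.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: leaves are names (str); internal nodes are 2-tuples.
Tree = Union[str, Tuple["Tree", "Tree"]]


def tips(tree: Tree) -> int:
    return 1 if isinstance(tree, str) else tips(tree[0]) + tips(tree[1])


def erm_split_probability(a: int, b: int) -> float:
    """P(a split of n = a + b tips is at least this uneven) under ERM.

    One daughter's size is uniform on 1..n-1, so the chance that the smaller
    daughter has s or fewer tips is 2s / (n - 1), capped at 1 for an even split.
    """
    n, s = a + b, min(a, b)
    return min(1.0, 2.0 * s / (n - 1))


def uneven_splits(tree: Tree, alpha: float = 0.05) -> List[Tuple[int, int, float]]:
    """(tips in each daughter, probability) for every node with p < alpha."""
    if isinstance(tree, str):
        return []
    a, b = tips(tree[0]), tips(tree[1])
    p = erm_split_probability(a, b)
    here = [(a, b, p)] if p < alpha else []
    return here + uneven_splits(tree[0], alpha) + uneven_splits(tree[1], alpha)


print(erm_split_probability(1, 9))   # 2 / 9, about 0.222
print(erm_split_probability(5, 5))   # 1.0
```

Unlike a shift statistic, this per-node probability ignores how the imbalance is distributed among nested clades. It should be read only as an illustration of why adding taxa to one part of a tree changes which nodes appear anomalous.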
(a) The continued discovery of taxa
As our level of knowledge about a group improves, it is anticipated that the number of taxa discovered through study time will decrease from a high initial rate, levelling off into an asymptote as all taxa are discovered. This is the underlying principle of species accumulation curves long used in ecology  (and more recently in palaeontology [24–26]), and these were used to assess how the rate of taxon discovery has changed through research time.
For catarrhines, the rate of discovery has remained relatively constant, whereas for dinosaurs the curve is logarithmic. Large differences nevertheless exist between subsets of the data (electronic supplementary material, figure S2). Sampling of extant catarrhines can be considered essentially complete: only seven living species have been discovered in the last 30 years, compared with 46 new fossil species over the same period. By contrast, the number of known dinosaur species is still increasing rapidly and shows no sign of levelling off . However, these data are strongly affected by the number of workers, sedimentary basins and geographic localities. The majority of new species are being identified from previously under-explored regions and geological formations , so the curve may eventually reach saturation.
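A species accumulation curve at decadal resolution can be tabulated directly from the naming year of each species. The Python sketch below is a minimal illustration with invented dates. The function names are hypothetical. The curve counts species named by the end of each sampled year. The per-interval differences show whether discovery is slowing toward an asymptote, as expected of a maturing dataset, or holding steady or accelerating.

```python
import bisect
from typing import Dict, Iterable, List, Tuple


def accumulation_curve(naming_years: Iterable[int], start: int, stop: int,
                       step: int = 10) -> Dict[int, int]:
    """Cumulative number of species named by the end of each sampled year."""
    years = sorted(naming_years)
    # bisect_right counts how many naming years are <= the sampled year.
    return {y: bisect.bisect_right(years, y) for y in range(start, stop + 1, step)}


def discoveries_per_interval(curve: Dict[int, int]) -> List[Tuple[int, int]]:
    """New species added in each interval (difference between snapshots)."""
    marks = sorted(curve)
    return [(later, curve[later] - curve[earlier])
            for earlier, later in zip(marks, marks[1:])]


# Invented naming years, for illustration only.
toy_years = [1766, 1766, 1820, 1990, 2004]
curve = accumulation_curve(toy_years, 1760, 2000)
print(curve[1770], curve[2000])                              # 2 4
print(sum(n for _, n in discoveries_per_interval(curve)))   # 4
```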
(b) Changes in the quality of the fossil record through time
If our knowledge of the fossil record improves through research time, it is expected that new discoveries will fill gaps between the known taxa. Alternatively, if our knowledge is poor, then each new discovery might generate gaps, as new taxa are likely to lie outside known clades or stratigraphic ranges. Thus, we would expect to see the relative preservation rate increase as our knowledge increases.
From 1850, numerous fossil taxa were discovered in different catarrhine clades, and so the perceived quality of the catarrhine fossil record decreases steadily until 1920 (figure 2). The largest single decrease occurs in 1910, when the discovery of Propliopithecus haeckeli pushed the base of the phylogeny back from 15 Ma to 35 Ma (using the current conception of geological time as a standard). Subsequent discoveries provided diminishing extensions to the known range of catarrhines. For example, the discovery of Catopithecus browni in 1989 extended the base of the phylogeny only modestly, to 36.5 Ma. Since 1920 the observed preservation rate has increased steadily as new discoveries continued to fill known gaps, peaking in 1990; it has since decreased only slightly (5%).
For non-avian dinosaurs, a steady increase in fossil record quality is observed for almost all of their study period (figure 2). This is because one of the youngest dinosaurs, Troodon formosus (65 Ma), and one of the oldest, Euskelosaurus browni (228 Ma), were discovered very early in dinosaur research, so later discoveries fell largely within an already established stratigraphic range. Up to 1990, new discoveries filled previously known gaps, as expected. From 1990, a sharp decrease of 18 per cent is observed in the preservation rate, much greater than that observed for catarrhines. This decrease does not reflect any change in the oldest or youngest dinosaurs known. Rather, it represents an increase in the number of generic and family-level lineages (many of them poorly known), and substantial stratigraphic range extensions for certain lineages within the tree.
The RCI (electronic supplementary material, figure S3) mirrors the pattern observed for the average ghost lineage and is driven by the discovery of the same taxa. There is a decrease in the quality of the catarrhine fossil record up to 1920 and an improvement until 1990. Again, the decrease in the RCI of the dinosaurian record after 1990 (25.5%) is considerably larger than that shown by the catarrhine record (2.5%).
It might be expected that the quality of the fossil record and our understanding of the tree of life would improve as new taxa are discovered. The apparent decrease in estimated fossil record completeness is therefore unexpected for both groups, and the substantial decrease for non-avian dinosaurs is worrying. For dinosaurs, the perceived improvement over much of their study history reflected improved knowledge of lineages indigenous to Europe and North America. The relatively recent exploration of fossiliferous localities in China and South America has led to a decrease of 18 per cent in the average observed preservation rate per taxon. This decrease is not equally distributed across the major subclades of dinosaurs (electronic supplementary material, figure S4). Ornithischians show a continuing increase in their preservation rate. From 2000, by contrast, the preservation rate for theropods decreases by 13 per cent, although this lies within the 95% CI. The largest significant decrease is in sauropodomorphs, whose perceived preservation rate fell by 28 per cent between 1990 and 2000. This fall was caused by the discovery of two early representatives of their respective clades (Bellusaurus sui in 1990 and Phuwiangosaurus sirindhornae in 1994). That two fossil discoveries could have such a profound effect upon perceived preservation rate reflects both the sensitivity of this measure and the paucity of sampling of certain dinosaurian clades.
(c) Changes in phylogenetic tree shape through study time
One way to assess the importance of new discoveries is to determine whether they are randomly spread across a phylogeny or clustered together to form new clades. If taxa were distributed randomly, the phylogeny would maintain its shape through time, and the discoveries would lie at low taxonomic levels, nested within previously known clades. If, however, newly discovered taxa cluster to form entirely new clades at higher taxonomic levels, the overall shape of the phylogeny will probably change significantly through study time. Pre-existing macroevolutionary hypotheses would then change with it.
Ic* varies through study time very differently in dinosaurs and catarrhines (figure 3). The Ic* for catarrhines fluctuates around the expected value based on the ERM-TI model. The value for dinosaurs, by contrast, peaks very early and then decreases towards the expected value; this decrease is steady and shows no sign of abating. The slight changes in Ic* observed in catarrhines indicate that the overall shape of their phylogeny has changed little over the past 200 years. This suggests that all key subclades were identified early in research time and that new taxa have been randomly distributed across the whole phylogeny. Future discoveries are likely to be distributed in a similar manner. The dinosaur phylogeny, in contrast, is rapidly changing shape and becoming more balanced, implying that previously unknown clades are being identified. We may therefore expect new discoveries to continue to have a similarly strong effect. Again, large differences are observed between the subclades, with theropods showing the highest degree of stability and sauropodomorphs the least (electronic supplementary material, figure S5).
(d) The detection of origination rate shifts using tree topology
The catarrhine phylogeny as it is today shows three significant (p < 0.05) and five substantial (p < 0.1) origination rate shifts, while all shifts detected through the study period (1760–2007) are shown in the electronic supplementary material, figure S6.
Seventeen significant and 10 substantial origination rate shifts were detected within the dinosaur phylogeny (electronic supplementary material, figure S6). One difference between the dinosaur and catarrhine data is that many of the shifts in the dinosaur phylogeny were detected considerably later in research time than those identified in catarrhines. In non-avian dinosaurs no shifts were identified before 1900. By 1990 only 41 per cent of currently known shifts were identifiable, rising to 81 per cent by 2000. Some of these shifts may prove transient, like some of those seen during the history of catarrhine research. Many more may emerge in the future as additional taxa are discovered.
Origination rate shifts were identified from the Δ2 value at individual nodes within the phylogeny. The mean Δ2 value was calculated for each decadal phylogeny, providing a picture of change through study time (figure 4). The mean Δ2 for the catarrhine data has fluctuated around 0.075. The mean Δ2 for dinosaurs shows no sign of stabilising and has continued to increase to above 0.2. This highlights the continuing high rate of discovery of new dinosaur species and the effect that discovery has on our perception of origination rate shifts.
Our study outlines a variety of tests that can be readily conducted to assess the robustness of phylogenetic trees to the continued discovery of taxa. This approach allows researchers to distinguish between real biological phenomena and artefacts caused by incomplete taxonomic sampling.
Of the indices used, the preservation rate, Ic* and mean Δ2 values are the most appropriate for future use, as the other indices are refinements of and variations on these three. Assessing the preservation rate makes it possible to track how perception of the quality of the fossil record has changed through study time. A steadily increasing preservation rate reflects a fossil record that increasingly approximates evolutionary history. A decreasing preservation rate reflects a poorly sampled fossil record in which new discoveries will have a large impact upon perceived evolutionary history. Changes in Ic* show how new discoveries have fitted into the overall phylogenetic tree. They reveal whether discoveries are having a strong effect by identifying new clades or are distributed randomly. Mean Δ2 values indicate how origination rate shifts may change through study time with the future addition of taxa.
For catarrhines, the overall rate of taxon accumulation has remained relatively constant through time: the decrease in the rate of discovery of extant taxa has been matched by an increase in the rate of discovery of fossil taxa. The catarrhine phylogeny has changed shape through study time, but only slowly, indicating that discoveries have been spread randomly across it. In the future we can expect discoveries to be similarly distributed across the phylogeny, and they are unlikely to have a substantial impact upon our perception of catarrhine evolutionary history. By 1920, a large number of fossil species had been discovered right across the phylogeny. Since then, further discoveries have filled gaps within the phylogeny rather than creating new clades (figure 3). It could be argued that the large number of extant taxa discovered early in study time diluted the effect of new fossil discoveries. This can be rejected, however. Both the preservation rate (figure 2) and the RCI (electronic supplementary material, figure S3) were based solely on fossil taxa. The analysis of tree shape (figure 3) also showed that newly discovered fossil taxa were distributed randomly and had little effect on the stability of the phylogeny.
The current rate of dinosaur discovery shows no sign of abating, and it is therefore pertinent to ask what effect these discoveries have on our perception of the evolutionary history of dinosaurs. If discoveries have a great effect, there is little justification for using the dinosaur fossil record as an exemplar for nomothetic macroevolutionary studies. The perceived completeness of the dinosaurian fossil record improved greatly over most of the past 150 years. Unfortunately, it is now apparently in a state of turmoil, with decreases in quality within some subclades. The three main clades of dinosaurs exhibit differing levels of fossil record completeness and tree stability. This indicates that some lineages are insufficiently known to support firm conclusions concerning their macroevolutionary history.
This study also has important implications for the marine vertebrate and invertebrate fossil records, which are considerably better than those of either catarrhine primates or non-avian dinosaurs. Here we have shown how analyses of dataset maturity can be conducted. These analyses are sensitive enough to differentiate between groups with distinct sampling histories. They can therefore identify groups whose records are sufficiently robust to provide a basis for large-scale macroevolutionary studies.
Evidently, hyperbolic claims that discoveries of new fossil catarrhine species (hominids perhaps most especially) rewrite the evolutionary history of the group are unjustified. New discoveries rarely have a major impact on our understanding of catarrhine phylogeny, diversification and evolution. By contrast, our perception of dinosaurian evolutionary history remains in flux as a result of discoveries of new species, especially from hitherto under-sampled geographic regions. This confirms media claims that new species have substantially altered our perception of the evolutionary history of dinosaurs. The current picture, however, is just as likely to be overturned by future discoveries.
Ultimately, our study indicates that the stability of taxonomic datasets should be assessed before embarking on macroevolutionary studies. Otherwise, researchers run the risk of mistaking artefacts of incomplete taxonomic, stratigraphic, ecological or biogeographic sampling for evolutionary phenomena. This may yield fewer headlines, but it should produce knowledge of evolutionary history that stands the test of time.
We are grateful to Martin Pickford, John Kelley, Stephen Frost, Terry Harrison and Tab Rasmussen for advice on the taxonomic positioning of taxa, to Kate Harcourt-Brown for help with the simulation model and to Graeme Lloyd for general comments. This manuscript benefited from the thoughtful comments provided by Pete Wagner and two additional reviewers.
- Received March 26, 2010.
- Accepted August 12, 2010.
- This Journal is © 2010 The Royal Society