The acquisition of hypsodont molars is often regarded as a key innovation in the history of ruminant ungulates. Hypsodont ruminants diversified rapidly during the later Neogene, circa 15–2 Myr ago, and came to dominate the ruminant fossil record in terms of species diversity. Here we show that hypsodont clades had higher speciation and diversification rates than other clades. Hypsodont species had, on average, shorter stratigraphic durations, smaller range size and lower occupancy than non-hypsodont species. Within hypsodont clades, some species were very common and acquired large geographical ranges, whereas others were quite rare and geographically limited. We argue that hypsodont clades diversified in an adaptive radiation-like fashion, with species often splitting cladogenetically while still in the expansive phase of their occupancy history.
Key innovations are often granted a pivotal role in evolution [1–3]. They are thought to spark bursts of diversification and enable entry into new adaptive zones and extension of geographical range [4–8]. Hypsodonty, or increased tooth crown height, has been recognized as such an innovation in the evolution of the mammalian dentition, enabling especially herbivores to process foods that are abrasive or otherwise conducive to increased dental wear [9,10]. Hypsodont ruminants rapidly diversified during the later Neogene, circa 15–2 Myr ago, in a context of climatic cooling, increasing aridity and expansion of open habitats, including grasslands [11–15]. Although geologically younger than other ruminants, hypsodont species make up the majority of the ruminant fauna today [13,15].
The diversification rate (the rate of increase in species number through time) depends on the difference between speciation and extinction rates. A given phenotypic trait could affect diversification through its effect on either factor. As a key innovation, hypsodonty might have promoted speciation via adaptive radiation. Alternatively, it might have lowered extinction rate if hypsodonts were better ecological competitors. Investigating the ecological circumstances under which hypsodont lineages thrived would then indicate the mechanism for their current dominance. For instance, hypsodont species are expected to be larger sized than non-hypsodont taxa. This is because the resource that hypsodonty makes available to herbivores is a physiologically demanding one. The symbiotic digestion of ‘fibrous’ foods that consist mainly of cellulose and other polysaccharides requires a slowed-down passage of the ingesta. Thus, grazing ruminants are more intake-limited than browsing ones, and compensate by increasing their body size in order to lower mass-specific energy requirements [16–18]. The relationship between hypsodonty and population size is more ambiguous. If hypsodonty did confer any significant competitive advantage over species with low-crowned molars, hypsodont species are expected to have had larger geographical range size and/or locality occupancy [4,11]. This would also agree with the finding that hypsodont species were shown to have larger niche breadths , meaning that they may subsist on graze, browse or any combination of both, whereas low-crowned species are restricted to browse. Alternatively, rapid diversification in hypsodont lineages might have promoted secondary dietary specialization, leading to smaller ranges [9,20].
Our aim here is to test whether higher speciation rate, lower extinction rate, or both were the cause of the rapid diversification of hypsodont lineages. For this, we built a 244 species-level tree of ruminants, including 213 extinct and 31 extant Eurasian species. We first estimated speciation and extinction rates in both hypsodont and non-hypsodont species, and their course through time. Then, we measured the evolutionary and ecological correlates of hypsodonty. We used material from the NOW database (see  and the electronic supplementary material) and applied both linear regression models and phylogenetic generalized least-squares (PGLS) regressions of hypsodonty classes (hypsodont, mesodont or brachydont) on species duration in the fossil record (in millions of years, Myr). We also retrieved from the NOW database information on species body size. Based on the occurrence data, we computed occupancy [11,22–25] and geographical range sizes for each species. For PGLS, we applied both the Brownian motion model of evolution, and Pagel's λ transform of the original branch lengths [26,27]. Pagel's λ provides the best fit of the Brownian motion to the data by means of a maximum likelihood approach . For geographical range size, we computed two different estimates. One is the minimum convex polygon (MCP) including all the fossil sites where a species occurs. The other (k local convex hull (k-LOCOH)) is based on the density distribution of the fossil sites [29,30], and is restricted to common species only, as it is not advisable to apply this method to taxa with less than 10 locality occurrences. In the present context, ‘common’ thus refers to taxa with 10 or more locality occurrences. Stratigraphic ranges were computed as the difference between the oldest and the youngest occurrence of the species in the fossil record, after localities with uncertain age were excluded by means of spectral ordering . Species commonness (locality occupancy) was estimated from the fossil record as the relative proportion of sites where a species is present divided by the number of sites included in the species' stratigraphic range.
Our results shed light on the nature and processes whereby key innovations give rise to evolutionary success.
2. Material and methods
(a) Rates of speciation and extinction
The extended discussion of our ruminant phylogeny, including a presentation of the tree itself in Newick format is available in the electronic supplementary material.
Different methods have been developed to compute character-associated diversification rates [7,32–34]. Among them, the binary-state speciation and extinction (BiSSE) method is the only one that computes extinction rate, and does not assume that ancestral character states reconstruction is independent from the character influence on speciation and extinction themselves .
BiSSE computes the probability of a phylogenetic tree and of the observed distribution of a binary character state (here hypsodont or non-hypsodont molars) among the tree tips, given a model of character evolution, speciation and extinction. BiSSE uses six parameters, speciation rate for states 0 (λ0) and 1 (λ1), the corresponding extinction rates μ0 and μ1 and the transition rates from one character state to the other, q01 and q10. Here state 0 corresponds to the non-hypsodont and 1 corresponds to the hypsodont conditions, respectively. The maximum-likelihood estimates of the parameters are computed. It is possible to fix parameters to be equal to each other by setting constraints (e.g. by setting λ0 = λ1) so that alternative models of evolution can be tested and then compared with each other by means of likelihood and Akaike information criterion (AIC) values assessment. When compared with other methods, BiSSE correctly computes transition rates from one character state to the other, thereby avoiding biased estimation of speciation and extinction rates based on incorrect transition frequency estimation .
We tested two alternative models. First, we fitted a ‘free’ model where all the six parameters are estimated. Then, we fitted a second model where λ0 = λ1 and μ0 = μ1. Since the diversification rate is λ − μ, our second model effectively assumes that diversification rates are character independent. In its initial formulation, the BiSSE method  assumed the phylogeny to be dichotomous and complete (i.e. all extant species of a group and their character states are included in the phylogeny). A later development extended the method to incomplete and unresolved phylogenies . This implementation was applied to our data by using the package in R. We first divided the fossil record into 10, 2 Myr long intervals. We then extracted the species present in each interval and then pruned the phylogeny to an interval phylogeny. The interval phylogenies were used to compute the six parameters in the free model, and the four parameters pertaining to the constrained model. The two were then compared with each other by means of ANOVA, for each interval. Our procedure computes speciation, extinction and transition rates per interval and per character state, so that the course of these parameters through time could be assessed.
(b) Species data
We performed our tests on Neogene Old-World ruminants of Eurasia. We used the NOW database (http://www.helsinki.fi/science/now/)  to retrieve information on species occurrence, hypsodonty and body size. Based on these, we computed occupancy (= relative commonness in the fossil record ), geographical range size and duration for each species.
We tested the relationships between hypsodonty and each of the variables both as they are, and in a phylogenetically explicit context by using PGLS regression. Body-size data in the NOW database is based on literature and on measured specimen-specific estimates based on equations in Damuth & Mcfadden . We used the data as it was recorded in the database.
Species duration (= age span) was calculated as the time interval intervening between a species first and last appearances in the fossil record. Fossil localities with unknown or imprecise dating may severely affect the estimation of species duration. To get rid of this problem, we first performed the spectral ordering of the NOW localities considered in this study, including both non-ruminant and ruminant species. Spectral ordering is a mathematical ordination (in time) of fossil localities species lists computed according to their faunal similarity . By means of spectral ordering, localities were given a score, which we regressed against the known age of that subset of localities having an absolute dating. We then removed those localities whose NOW age estimate falls outside the 95% confidence interval of the calculated age (see  for the full explanation of this procedure).
To estimate species occupancy, we first divided our longer than 18 Myr long fossil record into 19 time bins each of 1 Myr long. Then, for each species, we calculated the total number of occurrences, and divided it for the total number of localities present in the time bins where the species occurs. This procedure effectively removes sampling inequality [39,40], which is present in our data (the total number of localities per time bin varies between 5 and 221; the total number of species occurrences varies between 12 and 874).
Hypsodonty is usually calculated as an index of crown height  standardized to some measure of tooth's breadth at the base of the crown. Although meaningful, the hypsodonty index values are not available for a majority of species. In order to maximize the sample size, we used an ordinal index of hypsodonty, classifying brachydont species as 1, mesodont species as 2 and hypsodont species as 3. We used the classification recorded in the NOW database. The criteria for assigning species to these classes are ultimately up to the taxonomic coordinators of the NOW advisory board, but the rule of thumb is based on the ratio of height to length of the second molar (upper or lower, either). Brachydont teeth have a ratio of less than 0.8, mesodont teeth a ratio of 0.8–1.2 and hypsodont teeth a ratio greater than 1.2. A potential problem with this classification is that it is not an absolute scale of reference values, since the degree of hypsodonty is not directly comparable between clades. As such, it is exposed to potential misclassifications and is subjective to some extent, especially so for species with no living relatives. Nonetheless, ruminant molars are structurally very conservative, hence the scope for serious misclassification is probably very limited.
We calculated species range sizes by using two different procedures: the MCP and the recently developed fixed k-LOCOH method . The former method calculates the smallest convex polygon enclosing all the fossil localities where a species is present. We included all of the species with a number of occurrences greater than 2. The MCP of each species was computed by using Esri ArcGis. MCP may provide an underestimation of actual range sizes because of the incompleteness of the fossil record. On the other hand, some overestimation may be hidden in the data if (i) the species range moved over time because of dispersal, or (ii) the MCP includes non-inhabited areas.
The impact of these potential biases to our dataset is uncertain. Therefore, we used k-LOCOH  to obtain a different estimate of the extent of the territory occupied by a given species. k-LOCOH uses the density distribution of the localities where a species occurs. In order to compute k-LOCOH, different local convex hulls are drawn, each one associated with a single locality and its k − 1 nearest neighbours, where k is an appropriate number of neighbouring localities chosen by the ‘minimum spurious hole covering’ method, as suggested in Getz et al. . Then, all the hulls are collated from the smallest to the largest and new polygons (the isopleths) are built by the union of the computed local hulls until encompassing an a priori established percentage of the total number of occurrences. The isopleth including 95 per cent of the localities where a species occurs is considered here to represent the actual range size of the species.
k-LOCOH was computed via the R package  and the related script ‘NNCH’ . k-LOCOH gives the benefit of letting geographically outlying localities out of a species' range computation, thereby diminishing the risk of including areas occupied by range size movement or even by misclassification errors. A potential drawback of the method is that area estimates cannot be produced for rare species, given its computation requires a larger minimum number of localities (here nmin = 10) than the MCP. For both methods, non-terrestrial stretches of area were subtracted by the computed polygons.
We collected a total of 244 data points for hypsodonty, species duration and occupancy, 187 data points for MCP geographical range size, 121 data points for k-LOCOH geographical range size and 127 data points for body mass (electronic supplementary material, appendix S1).
(c) PGLS regressions
Species phenotypic traits are not independent data points in comparative studies. This happens to be the case because of the effect of shared ancestry on trait diversification [43,44]. A phylogenetically informed comparative analysis uses some measure of a priori-defined phenotypic trait covariation based on a phylogenetic tree to account for phylogenetic effects . In PGLS, the residual error in the dependent variable is assumed to be the variance–covariance matrix drawn from the species tree . Diagonal entries in this matrix are the distance in branch lengths (here in Myr) from each species to the tree root. Off-diagonal entries are the sum of branch lengths from the tree root to the most common ancestor for each pair of species. By assuming the most common evolutionary model (Brownian motion), the phenotypic similarity between each pair of species is expected to be proportional to their off-diagonal value (= their covariance). In PGLS regression, the variance–covariance matrix is used to adjust the regression parameters in order to maximize the fit to the Brownian motion model via generalized least squares. Since our independent variable is ordinal, dummy variables were used to fit both linear and PGLS regressions. This was done by using the packages and in R.
We first regressed hypsodonty versus (i) body mass, (ii) geographical range size (both estimates), (iii) occupancy, and (iv) species duration by applying linear models. Then, we replicated regressions in a phylogenetic context via PGLS, by assuming the Brownian motion model of evolution, which assumes a trait to evolve according to a constant-variance random walk. Trait evolution can proceed differently from the Brownian motion. Thus, we repeated each PGLS regression by using a transformed variance–covariance matrix by using Pagel's λ . Pagel's λ multiplies the off-diagonal elements of the variance–covariance matrix to a value λ in order to provide the best fit of the Brownian motion to the data by means of a maximum-likelihood approach . When λ is 0, no phylogenetic signal is present in the data. Conversely, λ = 1 implies Brownian motion. The upper limit of λ is defined by the tree height, because off-diagonal elements of the variance–covariance matrix cannot be larger than the variance .
A key innovation is inherently phylogenetic in nature, because the advantage it confers is highly correlated with phylogeny as far as the key trait is itself heritable. In such a situation, PGLS is conservative because it regresses out the phylogenetic component of the regression, which is the relationship between the key trait itself and the advantage it confers. With this caveat in mind, we also calculated the phylogenetic signal for each of the five variables considered by using the K statistic . K measures the tendency for related species to resemble each other more than expected by chance given the species included in the tree under the Brownian motion model of evolution.
3. Results and discussion
As expected from the known rise of hypsodont forms to dominance , the diversification rate in hypsodont taxa became higher than in non-hypsodont taxa during the later Neogene, the rate in hypsodont ones surpassing the non-hypsodont rate by about 10–12 Myr (table 1). The takeover was mainly dependent on the much higher speciation rate in hypsodont forms, whereas extinction rates are almost comparable between the two character states throughout our greater than 18 Myr long study period (figure 1). The two BiSSE models we tested perform equally well for the five older time bins, whereas the model with free parameters performs much better for the five younger bins, from 10 Ma onwards (table 1). According to the free parameters model, speciation rate associated with hypsodonty is much higher than with non-hypsodonty (i.e. λ1 > λ0) during the last 10 Myr. No corresponding difference in extinction rates can be seen, however (table 1). These results indicate that hypsodonty conferred significantly higher speciation rate, but not higher survival to high-crowned ruminant species.
The relationship between hypsodonty and species duration in the fossil record is negative and significant (p = 0.049; table 2). This still applies when phylogenetic effects are taken into account (pBrownian = 0.263, pPagel = 0.049 (notice that the Pagel transform works better basing on its much lower AIC score; table 3)). Considering extinct species only, duration is significantly different between hypsodonty classes (ANOVA, F2 = 7.486, p = 0.001). Hypsodont species have significantly shorter durations than non-hypsodont (browsers plus grazers) ones (Tukey HSD: hypsodont species duration = 1.05 Myr, p = 1) (mesodont = 1.57 Myr and brachydont = 1.82 Myr, grouped comparison p = 0.336). The negative relationship between hypsodonty and species duration is consistent with the ‘key innovation’ hypothesis, in so far as a key innovation is thought to promote diversification via increased speciation rate [5–7].
If hypsodonty conferred any ecological advantage, hypsodont species might be expected to have had larger range size and higher occupancy than low-crowned species. Instead, we found evidence that the relationships between hypsodonty and both of these variables are in fact negative. When MCP range size estimates are used, linear regression models indicate that hypsodonty is marginally associated with range size (p = 0.078). The slope of the regression is negative (−0.196). The same negative relationship applies with PGLS regression (pBrownian = 0.003, pPagel = 0.007; table 3). Similarly, occupancy is negatively associated with hypsodonty class (slope = −0.005, p = 0.010; pBrownian = 0.480, pPagel = 0.049; again Pagel's transform provides a lower AIC score). This would suggest that hypsodont taxa belong to clades whose species have relatively small geographical ranges (hence higher habitat specialization) . This interpretation agrees with Simpson's original concept of hypsodonty as a key innovation driving adaptive evolution and specialization .
However, when k-LOCOH range size estimates are used, the sign of the relationship between range size and hypsodonty class is reversed. This applies both to raw data (slope = 0.225; p = 0.072), and under PGLS regression (pBrownian < 0.001, pPagel = 0.018; table 3). This is not an artefact of how the ranges are calculated, because the correlation between the two range size estimates is strong and positive (r119 = 0.880, p << 0.001). Furthermore, when we repeated the PGLS regression on MCP data cut to the same species for which we had k-LOCOH range size estimates, the relationship between MCP and hypsodonty becomes significant and positive both under Brownian motion (slope = 0.256, p < 0.001, AIC = 354.09) and by using Pagel's λ transform (slope = 0.177, p < 0.044, AIC = 286.47). Instead, this applies because k-LOCOH ranges cannot be computed for rare species, whose range size is in fact negatively correlated to hypsodonty (figure 1). Indeed, average occupancy of the taxa with k-LOCOH range size estimates (0.0396) is about five times higher than in species without k-LOCOH estimates (0.0076). This points to a substantial dichotomy between rare and common species in their range size/hypsodonty relationship. In keeping with this notion, Jernvall & Fortelius  found that the most common taxa were almost entirely responsible for the increase in hypsodonty during the spread of open habitats some 11–5 Myr ago.
A possible caveat could be that if ranges shift over species existence, longer lived species may have spuriously larger ranges. Since hypsodont species are shorter lived, they may have spuriously shorter ranges. For this caveat to apply, then, range size and species duration should be correlated. Yet, we found this is not the case. For MCP range size estimates, the correlation is not significant (t185 = −1.069, p = 0.286, r = −0.078). With k-LOCOH estimates, the correlation is significant, but negative (t119 = −2.54, p = 0.012, r = −0.226). Thus, long-living species do not have significantly larger geographical ranges.
The regression of hypsodonty on body size gives a significant and positive slope (p = 0.001). Both the sign and significance of this relationship are confirmed under PGLS regression (pBrownian < 0.001, pPagel < 0.001) (tables 2 and 3). This result is expected from the ruminant digestive physiology .
Our results suggest that hypsodonty resulted in increased speciation rate in hypsodont clades. Yet, the evolutionary success accrued by hypsodonty was not ubiquitous within the clades: we estimated the phylogenetic signal by Blomberg et al.'s  K statistic (table 4 and electronic supplementary material, figure S2). Kappa is significant for hypsodonty, body size and duration but not for estimates of occupancy or geographical range size. This means that closely related species inherited body size, hypsodont molars and, to a lesser extent, a propensity for short duration in the fossil record from their ancestors, but neither commonness nor geographical range size. Since range size generally grows during the existence of a species [22–24,47,48], a possible explanation might be that hypsodont lineages have tended to originate daughter species before their full potential range size was occupied, forming clades containing mostly short-lived species of either very large or very small range sizes. This agrees with the notion that rapidly diversifying lineages should split frequently into peripheral isolates with small range size . It would also account for the negative relationship between hypsodonty and both geographical range size and occupancy, and for the opposite signs of the relationships between hypsodonty and range size when common species are compared with rare species (figure 2). A similar explanation was advanced to account for the great difference in species richness between closely related clades of specialist versus generalist species, for example, Vrba's effect hypothesis explaining the high species richness of alcelaphine bovids relative to the low richness of the impala clade .
In a related study, Feranec also found that hypsodonty works as a key innovation by promoting speciation . He contended that the geographical ranges of hypsodont taxa are no larger than those of non-hypsodont taxa, and argued instead that their niche breadths are larger. Feranec's methodology was different and his sample included horses, tapirs, camels and proboscideans besides ruminants, which may account for his different result. Despite the difference, both studies provide evidence that hypsodonty increased diversification rate, as originally proposed by Simpson . In terms of species richness, the net balance shifted in time significantly in favour of hypsodont taxa and most modern ungulate-rich ecosystems are accordingly dominated by hypsodont species.
The big picture that emerges from our study is that hypsodonty is strongly heritable, and that it spawned increased speciation rate. Species within hypsodont clades were large, and very different from each other in terms of commonness and geographical range size. Some were successful and attained large geographical ranges and occupancy, whereas others were geographically restricted. Overall, hypsodont clades clearly had a more dynamic evolutionary history than non-hypsodont clades. As expected for an evolutionary key trait, the acquisition of hypsodonty produced a great number of species, leading to the current dominance of hypsodont over non-hypsodont taxa in the fossil record.
Alan Gentry kindly helped us in correcting the taxonomy of bovids in the tree and gave us important insights as for the phylogenetic position of some clades. David Polly and one anonymous reviewer kindly helped us to make this paper more robust and our conclusions more sound. We are grateful to NOW contributors for their continuous support and work. This study grew out of a study visit by P.R. to Helsinki, supported by a grant (to M.F.) from the Academy of Finland.
- Received February 7, 2011.
- Accepted March 15, 2011.
- This Journal is © 2011 The Royal Society