Royal Society Publishing

Evolutionary rates of mitochondrial genomes correspond to diversification rates and to contemporary species richness in birds and reptiles

Soo Hyung Eo, J. Andrew DeWoody

Abstract

Rates of biological diversification should ultimately correspond to rates of genome evolution. Recent studies have compared diversification rates with phylogenetic branch lengths, but incomplete phylogenies hamper such analyses for many taxa. Herein, we use pairwise comparisons of confamilial sauropsid (bird and reptile) mitochondrial DNA (mtDNA) genome sequences to estimate substitution rates. These molecular evolutionary rates are considered in light of the age and species richness of each taxonomic family, using a random-walk speciation–extinction process to estimate rates of diversification. We find the molecular clock ticks at disparate rates in different families and at different genes. For example, evolutionary rates are relatively fast in snakes and lizards, intermediate in crocodilians and slow in turtles and birds. There was also rate variation across genes, where non-synonymous substitution rates were fastest at ATP8 and slowest at CO3. Family-by-gene interactions were significant, indicating that local clocks vary substantially among sauropsids. Most importantly, we find evidence that mitochondrial genome evolutionary rates are positively correlated with speciation rates and with contemporary species richness. Nuclear sequences are poorly represented among reptiles, but the correlation between rates of molecular evolution and species diversification also extends to 18 avian nuclear genes we tested. Thus, the nuclear data buttress our mtDNA findings.

1. Introduction

Although ecological or adaptive divergence can lead to diversification (Rieseberg & Wendel 2004), genome evolution at the molecular level is usually considered the starting point of speciation and thus a major driving force underlying species diversification (Coyne & Orr 2004; Martin & McKay 2004; Eo et al. 2008). One of the most basic predictions in evolutionary biology is that the rate of diversification along a particular branch of the tree of life is some function of the rate of genome evolution on that branch. Increased rates of molecular evolution lead to increased genetic variation within a species, and this variation may eventually be sundered to the point where new species are produced because evolving reproductive barriers reduce gene flow. If this process is iterative over evolutionary timescales, species diversity on a branch with fast rates of genome evolution should be greater than on a branch with slower rates of genome evolution, assuming all else is equal (including extinction rates). However, biologists still lack direct evidence to support the fundamental prediction that rates of genome evolution impinge upon diversification rates. A few recent studies have tested for correlations between species richness (or the number of speciation events) and the branch lengths on phylogenetic trees (Barraclough & Savolainen 2001; Webster et al. 2003; Pagel et al. 2006), but such analyses may be encumbered by incomplete phylogenies in some taxa and by difficulties associated with the quantification of diversification rates.

Relative rates of speciation and extinction can be derived based on explicit information about the birth and death of lineages through a particular time interval (Ricklefs 2007). This relative ratio of speciation to extinction is the basis for some recent methods of diversification rate estimation (Magallón & Sanderson 2001; Bokma 2003). These methods assume a random-walk process of speciation and extinction similar to a stochastic birth–death process in population biology (Bailey 1964). In practice, these methods rely on the estimated age of a clade and the number of extant species it contains. For example, Ricklefs (2006) used nonlinear regression to estimate 0.53 speciation events, and 0.49 extinction events, per million years in passerine birds.

Not only do rates of biotic diversification differ dramatically across phylogenetic lineages, but so too do rates of molecular evolution (Bromham & Penny 2003; Kumar 2005). For example, rodents are evolving rapidly whereas apes (including humans) are evolving slowly compared with other mammalian lineages (Kumar 2005). The avian rate of molecular evolution is thought to be slow relative to most mammals (Mindell et al. 1996), and relative to other reptiles, snakes are evolving rapidly and turtles slowly (Avise et al. 1992; Hughes & Mouchiroud 2001). Most studies, however, have sampled only a few species and/or a small number of genes, and rates of molecular evolution can vary dramatically across genes as well as lineages. For example, globin pseudogenes evolve considerably faster than true genes (Li et al. 1981) and substitution rates are elevated in primate pituitary growth hormone genes compared with other mammals (Wallis 1994). Thus, in order to draw robust conclusions about the link between genome evolution and biotic diversification, diverse phylogenetic lineages and a large sample of genes should be evaluated to identify significant patterns (Kumar & Subramanian 2002). The analysis of the complete mitochondrial genome could be a valuable tool for this purpose, as it consists of 37 genes that are matrilineally inherited together. We do not mean to suggest that natural selection on mitochondrial DNA (mtDNA) substitution rates directly and strongly impacts biotic diversification. However, mtDNA substitution rates are elevated in some lineages where chromosomal rearrangements are rampant (Triant & DeWoody 2006), and both nuclear and mtDNA substitution rates are negatively correlated with mammalian body mass (Welch et al. 2008). Thus, it seems both reasonable and likely that mtDNA substitution rates in a lineage are positively correlated with nuclear DNA substitution rates in the same lineage. If so, mtDNA genome sequences could serve as a proxy for relative rates of nucleotide substitution in nuclear genomes where comparative sequence data are scarce.

Avian and reptilian mitochondrial genomes contain 13 protein-coding genes, 22 tRNAs, two rRNA subunits and the control region. A reasonable number of genes, conservative gene order, manageable genome size (16–18 kb), and heterogeneity in substitution rates make the mitochondrial genome an excellent model system for studying molecular evolution, genomic structure and function, phylogenetics and biodiversity (Wolstenholme 1992). Herein, we have analysed complete sauropsid mitochondrial genome sequences to: (i) characterize rates and patterns of molecular evolution across avian and reptilian lineages, (ii) partition this rate variation into gene, lineage and interaction effects, (iii) explore the association among the molecular evolutionary rates, diversification rates and contemporary species richness, and (iv) validate these relationships using avian nuclear genes.

2. Material and methods

(a) mtDNA substitution rates

As of 30 March 2009, there were 194 complete mtDNA genome sequences of avian or reptilian species, with 136 species (55 avian, 81 reptilian) belonging to 33 families (15 avian, 18 reptilian) that had sequences available from at least two confamilial species. We assembled comparative sequence alignments for each of 28 families, excluding five families whose sequenced species were all congeneric, because our analyses focused on comparisons among families and the inclusion of purely congeneric comparisons could underestimate family-specific evolutionary distances or substitution rates (electronic supplementary material, table S1).

Various mtDNA evolutionary distances were estimated using MEGA4 (Tamura et al. 2007). We first computed the mean number of nucleotide differences per site by pairwise comparison between two confamilial sequences, using the Tamura–Nei (TN) method to correct for multiple hits. For all 13 protein-coding genes, numbers of synonymous and non-synonymous substitutions per site were calculated by comparing sequences codon-by-codon. We also estimated substitutions at fourfold degenerate sites as indicators of neutral evolution. To identify rate heterogeneity across families and across genes, we compared family-specific and gene-specific substitution rates. We estimated ages of reptilian families from the molecular (Hedges et al. 2006) and the palaeontological data (Olmo 2005) to minimize possible errors in divergence time estimates. However, we determined avian ages using molecular data only (Hedges et al. 2006) because the avian fossil record is limited and may underestimate clade ages (Pereira & Baker 2006). The gene-specific substitution rates were estimated as the mean substitution rates over all families, for each gene. We partitioned substitution rate variation into effects of family, gene and family-by-gene interaction and tested each effect using a random-effects analysis of variance model. The variance components were estimated using the maximum-likelihood method.
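As a rough illustration of the fourfold-degenerate measure, the following Python sketch counts third-position differences at codons that are fourfold degenerate in both of two aligned, in-frame coding sequences. It is not the MEGA4 procedure: it returns an uncorrected proportion of differences with no multiple-hit correction, and the function name and toy sequences are hypothetical. The eight codon families listed are those that are fourfold degenerate under the vertebrate mitochondrial genetic code.

```python
# Codon prefixes (first two bases) whose third position is fourfold degenerate
# under the vertebrate mitochondrial code: Leu, Val, Ser, Pro, Thr, Ala, Arg, Gly.
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def fourfold_p_distance(seq_a, seq_b):
    """Return (proportion of differing sites, number of sites compared).

    Only codons that share the same fourfold-degenerate prefix in both
    sequences and have unambiguous third bases are compared.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame and of equal length")
    sites = 0
    diffs = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        same_family = codon_a[:2] == codon_b[:2] and codon_a[:2] in FOURFOLD_PREFIXES
        unambiguous = codon_a[2] in "ACGT" and codon_b[2] in "ACGT"
        if same_family and unambiguous:
            sites += 1
            if codon_a[2] != codon_b[2]:
                diffs += 1
    if sites == 0:
        raise ValueError("no shared fourfold degenerate sites")
    return diffs / sites, sites


# Hypothetical toy alignment of four codons (not real data).
toy_a = "CTAGGTATGACC"
toy_b = "CTGGGTATAACA"
p, n_sites = fourfold_p_distance(toy_a, toy_b)
```

In the toy alignment, three codons qualify (CTN, GGN and ACN) and two of them differ at the third position; the ATG/ATA codon is skipped because ATN is not a fourfold family.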

(b) Estimation of diversification rates and species richness

The family-specific diversification rate can be estimated from the size and age of each family if we assume that the ratio of extinction rate to speciation rate is constant across families in a random-walk speciation–extinction process (Ricklefs 2006). Using a fixed ratio (κ = µ/λ) of extinction rate (µ) to speciation rate (λ), we computed the speciation rate of each family using the formula λ = ln[N(1 − κ) + κ]/[(1 − κ)t], and the diversification rate as δ = λ − µ, where N is the number of species and t is the age of a given family (Ricklefs 2006). We set κ = 0.99 after considering estimates based on the large datasets of Bokma (2003) and Ricklefs (2006), but we also used a range of κ (0–0.95 in intervals of 0.05, as well as 0.995) because confidence limits on the estimate of κ can be broad owing to stochastic variation in the speciation–extinction process (Ricklefs 2007). Species richness, determined as the number of species in a given family, was taken from Dickinson (2003) for birds and from the Reptile Database (Uetz et al. 2009) for reptiles. Because there are some taxonomic inconsistencies among the various databases for genome sequences, family ages and species richness, we relied primarily on the species richness databases (Dickinson 2003; Uetz et al. 2009).
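The formula above can be sketched in a few lines of Python. This is a minimal illustration under the stated assumption of a fixed κ, not the authors' own code; the function names and the example family (100 species, 50 million years old) are hypothetical.

```python
import math


def speciation_rate(n_species, age_my, kappa):
    """lambda = ln[N(1 - kappa) + kappa] / [(1 - kappa) t], per million years."""
    if not 0.0 <= kappa < 1.0:
        raise ValueError("kappa must satisfy 0 <= kappa < 1")
    if n_species < 1 or age_my <= 0:
        raise ValueError("need at least one species and a positive age")
    return math.log(n_species * (1.0 - kappa) + kappa) / ((1.0 - kappa) * age_my)


def family_rates(n_species, age_my, kappa=0.99):
    """Return (speciation rate, extinction rate, diversification rate)."""
    lam = speciation_rate(n_species, age_my, kappa)
    mu = kappa * lam      # extinction rate implied by the fixed ratio kappa
    delta = lam - mu      # diversification rate
    return lam, mu, delta


# Hypothetical family: 100 extant species, 50 million years old.
lam, mu, delta = family_rates(100, 50.0)
```

With κ = 0 the expression reduces to the pure-birth estimate ln(N)/t, which makes a convenient sanity check on any implementation.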

(c) Relationships among substitution rates, diversification rates and species richness

Although we assume no differences in the ratio of extinction/speciation rates across families, our estimates of net speciation rates (diversification rates) vary and reflect variation in the extent of lineage splitting among families (i.e. cladogenesis). We assessed whether the diversification rate is associated with the substitution rates across the mtDNA genome, which would indicate a link between rates of molecular evolution and speciation. To determine whether different fixed ratios of extinction/speciation rates would affect the resulting rates and relationships between diversification rates and substitution rates, we compared diversification rates of each family and coefficients of determination (r²) of the relationships for a range of κ (see above). We also compared species richness with the substitution rates. Correlations between the substitution rate and the diversification rate, and between substitution rate and species richness, were assessed using independent contrasts to accommodate shared ancestry (Harvey & Pagel 1991), as well as by using the raw data. To define the family-level independent contrasts, we used phylogenies in GenBank's Organelle Genome Resources and in Hackett et al. (2008).
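The raw-data version of these correlations can be sketched as a Pearson correlation between log-transformed substitution rates and a per-family response. The sketch below is an assumption-laden illustration only: it does not implement phylogenetically independent contrasts, the function names are hypothetical, and the toy values are invented for demonstration rather than taken from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two lists of equal length with at least 3 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def log_rate_correlation(rates, responses):
    """Correlate log10(substitution rate) with a per-family response variable."""
    log_rates = [math.log10(rate) for rate in rates]
    return pearson_r(log_rates, responses)


# Invented toy values (substitutions per site per year; events per million years).
toy_rates = [1e-9, 2e-9, 4e-9, 8e-9]
toy_delta = [0.01, 0.03, 0.04, 0.08]
r = log_rate_correlation(toy_rates, toy_delta)
r_squared = r ** 2  # coefficient of determination
```

For the species-richness comparison, the response would also be log-transformed before being passed in.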

(d) Validation using avian nuclear genes

To extend our mtDNA inferences, we collected avian sequences from 18 nuclear exons or introns that Ericson et al. (2006) or Hackett et al. (2008) used to study the diversification of modern birds (electronic supplementary material, tables S2 and S3). This nuclear dataset (22 kbp total) is phylogenetically well-represented (229 species from 52 families). We then repeated many of the tests described earlier.

3. Results

(a) Rates and patterns of sauropsid mtDNA genome evolution

We compiled a database composed of 125 complete mtDNA genome sequences (mean length of 17 171 ± 1 068 bp), representing 49 avian and 76 reptilian species.

Family-specific substitution rates were determined by pairwise comparisons of confamilial species (mean = 12 comparisons per family). Complete mtDNA genomes exhibited a mean TN rate of 3.35 × 10⁻⁹ substitutions per site per year. Snake and lizard families generally exhibited faster substitution rates (5.29 × 10⁻⁹), whereas turtle (2.01 × 10⁻⁹) and bird (2.56 × 10⁻⁹) genomes evolved more slowly. However, rate estimates varied considerably among families, from as rapid as 8.46 × 10⁻⁹ in the Amphisbaenidae to as slow as 0.86 × 10⁻⁹ in the Rheidae. All four rate measures, namely TN and the rates at fourfold degenerate sites (fourfold), synonymous sites (Ks) and non-synonymous sites (Ka) averaged across all coding genes, showed generally concordant patterns: relatively fast substitution rates in snakes and lizards compared with slower rates in turtles and birds (figure 1).

Figure 1.

Family-level phylogenetic relationships and family-specific substitution rates of complete mtDNA genomes in birds and non-avian reptiles (turtles; squamates; crocodiles). Mean evolutionary rates of each family were obtained by dividing family-specific evolutionary distances by the divergence time of each family, giving (a) the Tamura–Nei substitution rate (TN); (b) the substitution rate at fourfold degenerate sites (fourfold); (c) synonymous substitution rates (Ks); and (d) non-synonymous (Ka) substitution rates. Error bars represent standard deviations from data of multiple species pairs.

The gene-specific non-synonymous substitution rates across families ranged from 0.70 × 10⁻⁹ at CO3 to 3.16 × 10⁻⁹ at ATP8 (figure 2). There was little variance in the synonymous substitution rates across genes (electronic supplementary material, figure S1 and appendix S2), and the mean rate was 10.39 × 10⁻⁹. For all measures of evolutionary rates, family and family-by-gene terms were statistically significant, with the family effect being the strongest whereas the gene effect was weak (electronic supplementary material, appendix S2 and table S4).

Figure 2.

Mitochondrial genome non-synonymous (Ka) substitution rates for 13 protein-coding mtDNA genes in birds and reptiles.

(b) Relationships among substitution rates, diversification rates and species richness

We estimated the speciation rate (λ) and diversification rate (δ) of each family by assuming a uniform ratio (κ = µ/λ) of extinction rate (µ) to speciation rate across families (§2). When applying κ = 0.99 to our entire dataset of 28 avian and reptilian families, the estimated speciation rate per million years ranged from λ = 0.01 (δ = 0.0001) for the family Rheidae to λ = 7.32 (δ = 0.0731) for the family Colubridae. Across all families of birds and reptiles, mean λ = 1.40 ± 1.60 (mean λB = 1.04 ± 1.30 for birds; mean λR = 1.70 ± 1.80 for reptiles).
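Because µ = κλ, the diversification rate follows directly from the speciation rate. The following is a worked restatement of the formula in §2 using the values reported above for κ = 0.99, not an additional result:

```latex
\delta = \lambda - \mu = (1-\kappa)\,\lambda
\quad\Longrightarrow\quad
\delta_{\mathrm{Rheidae}} \approx 0.01 \times 0.01 = 0.0001,
\qquad
\delta_{\mathrm{Colubridae}} \approx 0.01 \times 7.32 \approx 0.073 .
```

The small discrepancy with the reported δ = 0.0731 for the Colubridae is consistent with rounding of λ to two decimal places.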

We conducted linear regression analyses to determine whether diversification rates are associated with molecular evolutionary rates (figure 3a). The log-transformed TN mtDNA genome substitution rates from each family, corrected for divergence times, were highly correlated with the estimated diversification rates (r = 0.76, p < 0.001). We used TN as a measure of the evolutionary rates in part because it facilitates comparisons within and among other large datasets (Kumar & Subramanian 2002). Furthermore, these positive TN correlations were consistent across three other measures of evolutionary rates (fourfold, r = 0.76, p < 0.001; Ks, r = 0.76, p < 0.001; Ka, r = 0.65, p < 0.001), and also when we separated our dataset into avian and reptilian families (birds, 0.73 < r < 0.89, p < 0.001 for all measures; reptiles, 0.66 < r < 0.70, p < 0.006 for all measures). We found constant coefficients of determination (r²) from each regression of diversification rates on the substitution rates regardless of κ. Further, these significant relationships were supported even when we used only molecular data or only fossil data to correct for reptilian divergence times (electronic supplementary material, table S5). We also directly tested for associations between molecular substitution rates and contemporary species richness. Again, all four measures of log-transformed mtDNA substitution rates were positively correlated with log-transformed species richness (TN, r = 0.66, p < 0.001; fourfold, r = 0.69, p < 0.001; Ks, r = 0.69, p < 0.001; Ka, r = 0.62, p < 0.001; figure 3b). The relationships were also significant with all four rate measures when we separately analysed birds and reptiles (birds, 0.69 < r < 0.76, p < 0.05; reptiles, 0.58 < r < 0.67, p < 0.05). Results of the phylogenetically independent contrasts were qualitatively similar to the results from the analysis of raw species data (electronic supplementary material, table S6). Collectively, these findings lend support to our original prediction that increased rates of molecular evolution lead to a more rapid accumulation of species diversity.

Figure 3.

Correlations of mtDNA non-synonymous (Ka) substitution rates with (a) diversification rates (δ) and (b) species richness in sauropsids (birds and reptiles), assuming that the ratio of extinction to speciation rate is 0.99. Solid (birds: δ, r = 0.73, p < 0.01; species richness, r = 0.76, p < 0.01) and dotted (reptiles: δ, r = 0.66, p < 0.01; species richness, r = 0.58, p < 0.05) lines are trend lines. Open circles, birds; filled circles, reptiles.

The positive relationships between substitution rates, diversification rates and contemporary species richness were also apparent in our survey of avian nuclear genes. Substitution rates (TN) for all 18 nuclear genes that we tested were significantly correlated with avian diversification rates (0.48 < r < 0.86, p < 0.01 for all tests; electronic supplementary material, table S7). TN substitution rates were significantly associated with species richness at 12 of 18 genes (67%). Similar results were obtained for Ks, Ka and fourfold measures of substitution rate (electronic supplementary material, table S7).

4. Discussion

Our results are consistent with published data in that they support the general idea of lineage-specific molecular clocks among birds and reptiles. Previous studies have also suggested that evolutionary rates vary among lineages and genes (Martin & Palumbi 1993; Hughes & Mouchiroud 2001; Kumar & Subramanian 2002). Here we show that, as gauged using entire mtDNA genome sequences, snakes and lizards are generally characterized by rapid substitution rates, whereas rates are generally slower in turtles and birds.

In principle, the pattern of ‘fast’ snakes and lizards compared with ‘slow’ turtles and birds could be an idiosyncratic artefact of the species or genes we analysed, but we accounted for this to the extent possible by analysing every protein-coding gene in the mtDNA genome. There is considerable rate heterogeneity in the mtDNA dataset, as exemplified by about a 10-fold increase in the TN substitution rate among Amphisbaenidae, Sylviidae or Colubridae as compared with Rheidae (figure 1). It remains to be seen whether this rate heterogeneity extends to sauropsid nuclear genes, as recent work in mammals indicates that rate differences within the rodents and the primates are similar in magnitude to those between groups (Kumar & Subramanian 2002). Unfortunately, there are still too few sequences available for a robust comparison of rate heterogeneity among sauropsid mtDNA and nuclear genes.

Our most important findings are the positive correlations between mtDNA substitution rates and diversification rates, and between mtDNA substitution rates and contemporary species richness. These correlations suggest there is a direct link between evolutionary rates at the molecular level and biological diversification via speciation and extinction. Barraclough & Savolainen (2001) first revealed the correlation between substitution rates and species richness in flowering plants using relative estimates of substitution rate variation among sister-family pairs, whereas Webster et al. (2003) investigated this relationship using the number of nodes in a phylogenetic tree as a proxy for the speciation rate. In contrast, our approach was to consider absolute mtDNA genomic rate variation within families. We extend earlier studies by providing integrated evidence to support the idea that rapid substitution rates increase speciation rates, which then result in a net increase in contemporary species richness. We associated these correlations by estimating the speciation (and diversification) rates of each family and found they are linked.

As an example, the speciation rate of the avian family Rheidae is 0.01 per million years if we assume that 1 per cent of newly formed species survive (i.e. 99% go extinct). The Rheidae has only two extant species and its substitution rate was one of the slowest among the families we analysed, whereas the Colubridae has more than 1000 extant species and its substitution rate was one of the fastest analysed. This means that 0.01 speciation events occur every million years in the Rheidae compared with 7.32 in the Colubridae (a roughly 700-fold difference). Note that the value of κ we used was estimated not from our own data but from other large datasets (Bokma 2003; Ricklefs 2006) in order to reduce bias. Of course, these absolute speciation rates should still be interpreted with caution because the assumption of a fixed κ results in a broad confidence interval (Magallón & Sanderson 2001; Bokma 2003; Ricklefs 2006). For instance, the estimated speciation rate in the Colubridae varies dramatically (0.18, 0.33 and 11.50 speciation events per million years) when we apply κ = 0, 0.5 and 0.995, respectively. Furthermore, we assumed homogeneity for the sake of simplicity although κ is probably heterogeneous among lineages in nature. Finally, the extinction rate is a factor that may be independent of the inherent speciation rate as extinction risks are affected by intrinsic as well as extrinsic factors (see references in Fisher & Owens 2004). Despite these caveats, the coefficients of determination (r²) in the association between substitution rates and diversification rates (or species richness) do not vary with respect to κ, suggesting the associations are strong regardless of the relative ratio of extinction and speciation.
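The sensitivity of the absolute rates to κ can be explored with a short sweep. The sketch below applies the formula from §2 to a hypothetical clade (1000 species, 40 million years old); these values are illustrative assumptions, not the Colubridae inputs, so the printed rates will not reproduce the figures quoted above.

```python
import math


def rates_for_kappa(n_species, age_my, kappa):
    """Return (lambda, delta) for a fixed extinction/speciation ratio kappa in [0, 1)."""
    lam = math.log(n_species * (1.0 - kappa) + kappa) / ((1.0 - kappa) * age_my)
    return lam, (1.0 - kappa) * lam


# Hypothetical clade used only to show the trend across kappa.
N_SPECIES, AGE_MY = 1000, 40.0
KAPPAS = [0.0, 0.5, 0.9, 0.99, 0.995]

sweep = [(k,) + rates_for_kappa(N_SPECIES, AGE_MY, k) for k in KAPPAS]
for kappa, lam, delta in sweep:
    print(f"kappa={kappa:<5} lambda={lam:.3f} delta={delta:.4f}")
```

For a fixed clade size and age, the implied speciation rate rises steeply as κ approaches 1 while the net diversification rate falls, which is why the absolute values deserve more caution than the rank order of families.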

Fossil records are very important in dating divergence times and therefore in estimating molecular evolutionary rates of each family. However, limitations of the fossil record (especially with regard to birds) make it necessary to consider alternative plausible divergence times (Pereira & Baker 2006). Thus, we repeated our analyses on different timescales with molecular, fossil and their mixed data to reduce errors in divergence times associated with a single data source. Despite the various evolutionary ages used in our analyses, we found significant relationships among mtDNA evolutionary rates, diversification rates and species richness in more than 90 per cent of the tests. This suggests that the associations we report are not artefacts of various divergence time estimates.

Tests for relationships among molecular evolutionary rates, diversification rates and species richness should be much more powerful if based on multiple genes. The mitochondrial genome consists of 37 genes and their substitution rates (in particular Ka) vary as shown in electronic supplementary material, table S4. However, the mtDNA molecule is inherited as a single haplotype because of the lack of recombination. Thus, we also evaluated the relationships among substitution rates, diversification rates and contemporary species richness using nuclear genes (all from birds, as there is a paucity of reptile nuclear data). The avian nuclear data generally mirror the mtDNA in that 100 per cent of the 18 genes analysed revealed significant positive relationships between substitution rates (TN) and diversification rates. When we consider Ks, Ka and fourfold substitution rates for avian nuclear genes, 93 per cent of the statistical tests indicate significant positive associations with diversification rate (most with p < 0.001). Associations between substitution rates and contemporary species richness also mirrored the mtDNA data (electronic supplementary material, table S7). We maintain it is no accident that rates of molecular evolution correspond to rates of sauropsid diversification and to contemporary species richness.

Faster substitution rates may accelerate genetic divergence and thus may promote incipient speciation (Bromham 2003; Martin & McKay 2004; Eo et al. 2008). Speciation rates, the diversification process and species diversity are influenced by two important determinants of rate heterogeneity: natural selection and genetic drift (Coyne & Orr 2004; Xiang et al. 2008). Natural selection should more effectively influence the fate of non-synonymous substitutions, whereas genetic drift typically prevails with regard to synonymous substitutions (although linkage disequilibrium is a confounding factor). Our dataset indicates that both classes of substitutions are associated with rates of sauropsid lineage diversification and also with contemporary species richness. Thus, the relative influence of selection, drift and linkage disequilibrium on biological diversification remains to be determined (Coyne & Orr 2004; Xiang et al. 2008).

Acknowledgements

We thank the DeWoody laboratory group for their comments on the manuscript. We are also grateful to R. E. Ricklefs and R. G. Harrison for helpful comments, to the National Science Foundation and to the Office of the Provost at Purdue for funding through the University Faculty Scholar programme.

  • Received May 6, 2010.
  • Accepted June 14, 2010.

References
