The origin and timing of the diversification of modern birds remains controversial, primarily because phylogenetic relationships are incompletely resolved and uncertainty persists in molecular estimates of lineage ages. Here, we present a species tree for the major palaeognath lineages using 27 nuclear genes and 27 archaic retroposon insertions. We show that rheas are sister to the kiwis, emu and cassowaries, and confirm ratite paraphyly because tinamous are sister to moas. Divergence dating using 10 genes with broader taxon sampling, including emu, cassowary, ostrich, five kiwis, two rheas, three tinamous, three extinct moas and 15 neognath lineages, suggests that three vicariant events and possibly two dispersals are required to explain their historical biogeography. The age of crown group birds was estimated at 131 Ma (95% highest posterior density 122–138 Ma), similar to previous molecular estimates. Problems associated with gene tree discordance and incomplete lineage sorting in birds will require much larger gene sets to increase species tree accuracy and improve error in divergence times. The relatively rapid branching within neoaves pre-dates the extinction of dinosaurs, suggesting that the genesis of the radiation within this diverse clade of birds was not in response to the Cretaceous–Paleogene extinction event.
For over a century, the uncertainty surrounding the phylogenetic relationships among palaeognathous birds has made them one of the most controversial branches in the avian tree of life. Traditionally, this group has been subdivided into flighted tinamous of South and Central America and flightless ratites distributed across the southern continents. Ratites are a group of large flightless birds that lack a keel on their sternum, and include emu (Australia), cassowaries (Australia and New Guinea), rheas (South America), ostrich (Africa and formerly Asia), kiwis (New Zealand) and extinct moas (New Zealand) and elephant birds (Madagascar). Many studies using morphological and molecular data supported the monophyly of ratites and placed them sister to tinamous, although no consensus was achieved on relationships among ratites [1–5]. Their flightless condition and disjunct distribution on southern land masses led to a widely accepted biogeographic theory that their evolution and distribution was shaped by the break-up of Gondwana [1,3,5].
Recent analyses using multiple nuclear gene sequences of major clades of birds [6,7] have posited a seemingly radical phylogenetic hypothesis of ratite paraphyly because tinamous were found to be embedded within ratites. This view has also been suggested by a reanalysis of palaeognath mitochondrial genomes . Contrary to the widely accepted view that the common ancestor of ratites was flightless, this hypothesis implies that the common ancestor was probably a flying bird. Additionally, these studies concluded that the branching pattern of the palaeognath tree does not match the separation of the land masses as Gondwana fragmented . Instead, molecular dates estimated from mitochondrial genes suggest that many of the divergences post-date the break-up, leading to the prediction that the ancestors of different ratite lineages probably flew to different continents after Gondwana broke apart . Flightlessness may have evolved much later, thus challenging the conventional biogeographic theory that their disjunct distribution in the southern hemisphere was shaped by plate tectonics.
Tests of these hypotheses have broader reach beyond the avian tree of life because they have general relevance to the relative roles of vicariance and dispersal in the southern biota, and the timing and potential causes of the flowering of crown group lineages near the end of the Cretaceous. Such tests must also address a major scientific issue in the dating of crown group divergence times. Most molecular estimates of the ages of the various nodes in the avian tree of life have relied largely on a single locus, the mitochondrial genome [9–11] (but see [12,13]), which yields divergence times considerably older than a strict interpretation of the fossil record .
A fundamental prerequisite in testing these hypotheses is the construction of a well-resolved and strongly supported phylogeny of avian clades. The tree produced from a multigene nuclear dataset [6,7] has the ostrich sister to a clade composed of the remaining extant palaeognaths, but fails to resolve the branching pattern between the rheas, kiwis, Casuariidae and tinamous with confidence . A reanalysis of the mitochondrial genomes concluded that ‘beyond the sister-groupings of the tinamous and moa, and of emu and cassowaries, the relationships of the palaeognaths remain difficult to resolve’ (, p. 7). A striking feature of these tree topologies is the presence of successive short branches between major lineages deep within the tree, connected to long terminal branches. Rapid radiations causing this tree shape can pose problems in phylogenetic reconstruction owing to limits in the number of phylogenetically informative characters between closely spaced speciation events, which may be overwritten by subsequent mutations on long terminal branches. Rapid radiations can also lead to incomplete sorting of ancestral polymorphisms if population sizes were large [15,16]. In either case, this can produce gene tree discordance and failure to recover the underlying species tree with the common practice of concatenating gene sequences from multigene datasets [17,18]. Moreover, this will result in erroneous downstream conclusions about evolutionary processes when characters, life-history traits or geographical distributions are mapped onto incorrect tree topologies.
To further resolve palaeognath phylogenetic relationships and test different biogeographic hypotheses and the timing of key avian diversification events, we assembled two molecular datasets. The first consisted of 10 new nuclear gene sequences from a broad sampling of palaeognath species that were combined with 17 published nuclear genes to which we added new sequences from the extinct little bush moa (Anomalopteryx didiformis). To root the palaeognath tree and provide fossil calibration points for divergence dating of crown group birds, we also included 15 neognaths and two non-avian outgroup taxa. A second dataset of rare genomic changes was assembled to provide an independent test of palaeognath phylogenetic relationships. The CR1 non-LTR retrotransposon LINE family is ideally suited as a clade marker owing to its propagation properties and universality, as shown recently in several deeper-level avian phylogenetic studies [19–24]. Unlike DNA sequences, retroposon insertion analysis avoids problems of model misspecification, taxon sampling, long-branch attraction and base compositional biases. Both long-branch attraction and base compositional biases have been implicated in making phylogenetic analyses of the palaeognaths so problematic using mitochondrial genomes [3,8]. Although retroposon insertions avoid many of the pitfalls of DNA sequences, they are not exempt from incomplete lineage sorting. Conflict in rare genomic events can be useful as potential evidence of incomplete lineage sorting [22,23].
2. Material and methods
(a) DNA sequence datasets
(i) Ten nuclear genes with broad palaeognath taxonomic sampling
DNA sequences from 10 nuclear protein-coding genes were obtained from 16 species representing all major palaeognath lineages, including ostrich, five species of kiwi, emu, southern cassowary, both rhea species, three tinamou species and three species of extinct moa. Additionally, we sequenced these genes from 15 species of neognaths: blackish oystercatcher, painted snipe, wattled jacana, rifleman, zebra finch, common loon, Wilson's storm-petrel, saddle-billed stork, gentoo penguin, emperor penguin, chimney swift, ruby-throated hummingbird, chicken, greater white-fronted goose and magpie goose.
Sequences were also obtained for two non-avian outgroup taxa (alligator and caiman) to root the avian tree (further detail is available in the electronic supplementary material).
Total genomic DNA was isolated from all specimens and used to amplify and sequence BACH1, BDNF, BMP2, DNAH3, IRBP, NT3, PNN, PTPN12, RAG2 and TRAF6 using the polymerase chain reaction. Primers and amplification conditions are outlined in electronic supplementary material. All genes were translated and aligned by eye at the amino acid level, and nucleotides were made to conform to this alignment. The 10-gene dataset totalled 9909 base pairs (bp).
(ii) Twenty-seven nuclear genes with restricted palaeognath taxon sampling
To deal with possible gene tree heterogeneity we combined the 10 protein-coding genes with 17 of the 20 genes from Harshman et al.  and Hackett et al. . One gene was common to both datasets, and two of the 20 genes could not be obtained from the little bush moa, so they were excluded (see the electronic supplementary material for details). Taxon sampling in the 17-gene dataset differed at the species level in four genera, and at the genera level for three taxa, so we made composite sequences with closely related species (see the electronic supplementary material). The 27-gene dataset totalled 31 685 bp.
(b) Model selection and phylogenetic analysis
For the 10 protein-coding genes, the best-fitting models for four potential partitioning schemes (one partition, partitioning by gene or codon position and partitioning by gene and codon position) were calculated with the program MrModeltest v. 2.3  using the Akaike information criterion. Optimal partitioning was determined using Bayes factors with marginal likelihoods estimated by the stepping-stone (SS)  method implemented in PHYCAS v. 1.2 . Partitioning by codon position was favoured, and GTR + I + Γ was estimated to be the most appropriate model for first and second codon positions, and GTR + Γ for third codon positions. In the 27-gene dataset, the additional 17 genes were partitioned by gene because most were non-coding.
Phylogenetic analyses on concatenated sequences of the 10- and 27-gene datasets were conducted using Bayesian inference in MrBayes v. 3.1.2 . MrBayes was run on the CIPRES Science Gateway portal  multiple times with four chains set for 100 million generations and sampling every 2000 generations. All other run parameters were set as defaults, and convergence was examined using Tracer v. 1.4  with a 25 per cent burn-in. Individual gene trees were estimated using maximum likelihood using the program RAxML . Because gene tree discordance potentially can prevent recovery of the species tree with concatenated data , we used maximum pseudo-likelihood estimates of species trees (MP-EST)  to estimate the species tree from the set of gene trees. MP-EST has been shown to be statistically consistent in estimating the species tree even in the anomaly zone, and is computationally efficient for a moderate number of taxa, as in this study .
(c) Fossil calibrations and molecular dating
Because the 10-gene dataset had broader taxon sampling and very small amounts of missing sequence we used it to estimate divergence times among taxa. Node ages were estimated with the lognormal relaxed clock method in BEAST v. 1.6.2 , and were compared with relaxed clock estimates with correlated and uncorrelated rates in MCMCtree . To calibrate molecular clocks we used seven fossil anchor points: crocodile/bird split 243–251 Ma , alligator/caiman split 66–71 Ma , emu/cassowary split 25–35 Ma , magpie goose/greater white-fronted goose split 66–68 Ma , penguin/petrel split 60–62 Ma , hummingbird/swift split 47–49 Ma with an upper limit of 53 Ma  and jacana/painted snipe split 30–32 Ma . A uniform prior was placed on the origin of modern birds from 67 to 133 Ma for the relaxed clock methods. The lower limit was set by the oldest diagnostic neornithine fossil Vegavis . The upper limit was set by mitochondrial estimates of modern bird origins . Multiple independent runs of 100 million generations were performed in BEAST with sampling every 2000 generations. Dating in MCMCtree was performed with the available HKY model for 100 000 generations and sampling every 10 generations. Dates were averaged across the runs. Convergence of runs in both methods were checked using Tracer (ESS > 200).
(d) Retroposon identification from GenBank and library construction and screening
Genomic libraries were created for emu, ostrich, great spotted kiwi and lesser rhea, and were enriched for CR1 non-LTR retrotransposon insertions by adapting a primer walking protocol (www.biochem.uci.edu/steele/Splinkerette_page.html) and a CR1 consensus sequence (see the electronic supplementary material). Hundreds of CR1 insertion sites were recovered and identified using the RepeatMasker database . Primers were designed in flanking regions to screen for presence or absence of CR1 inserts in homologous loci of other palaeognaths. Emu sequences from 19 BAC clones downloaded from GenBank were also examined to identify additional CR1 inserts, which were screened against the other palaeognaths (see the electronic supplementary material). All CR1 inserts had flanking domains used in BLASTn searches (www.ensembl.org) against chicken and zebra finch genomes to determine whether inserts were also present in neognaths. CR1 insertions were also identified in the avian nuclear sequence dataset of Hackett et al. , kindly provided by Rebecca Kimball. The GenBank accession numbers for all the sequences collected here are JX532994–JX533444.
3. Results and discussion
(a) Gene sequence analysis
Bayesian analyses of the concatenated dataset of 10 nuclear protein-coding genes reaffirms the fundamental split in birds between the Palaeognathae and Neognathae, and the split between the Galloanserae and Neoaves within the Neognathae (figure 1). Within the Palaeognathae, there is strong support (posterior probability, PP = 1) for ostrich as sister to the remaining palaeognaths, in agreement with studies using multiple nuclear loci [6,7]. The tinamous are embedded within the ratites, rendering ratites paraphyletic, and are strongly supported as sister to the moas as in the reanalysis of mitochondrial genomes . The moa–tinamou clade is sister to rheas, kiwis, emu and cassowaries. In contrast, the nuclear gene trees of Harshman et al.  placed tinamous as either sister to emu, cassowaries and kiwis or as sister to the rheas, but both were poorly supported at these nodes. MtDNA genomes recovered the first of these topologies, but again with low nodal support .
In our tree, emu and cassowary (Casuariidae) are strongly supported as sister taxa (PP = 1), which in turn are sister to rheas (PP = 0.96). This topology differs from previous mitochondrial [3,5,8] and nuclear gene phylogenies [6,7], which strongly supported kiwis as sister to emu and cassowaries (figure 1). However, internal branches at this node are very short, increasing the potential for incomplete lineage sorting or polytomy in gene trees owing to lack of phylogenetic signal.
Individual consensus gene trees estimated in RAxML are discordant and have low bootstrap support at nodes with short internal branches (see electronic supplementary material, figure S1). However, all gene trees except BMP2 have strong bootstrap support (1.0) for the sister-group relationships of neognaths and palaeognaths and of Galloanserae and Neoaves (0.85), and place moa sister to tinamous (0.86). Five gene trees place ostrich sister to the other palaeognaths (0.93). There is only weak support from three genes for either rheas or kiwis as sister to emu and cassowary (0.53 and 0.47, respectively).
As statistical consistency in estimating the species tree from discordant gene trees is likely to increase with more genes , we analysed the 27-gene dataset with MrBayes. The tree is strongly supported (figure 2), and relationships among the palaeognaths are mostly congruent with the tree from the 10 genes. The exceptions in the 27-gene phylogeny are that kiwis are sister to Casuariidae instead of rheas (PP = 1.0), rheas are instead sister to this clade (PP = 1.0), and Chilean tinamou is sister to erect-crested tinamou rather than to great/white-throated tinamou. Among Neoaves, petrel is sister to penguins, rather than to stork, as recovered in the 10-gene dataset.
The bootstrapped gene trees were then used to estimate the species consensus tree using MP-EST. Because MP-EST needs rooted trees, we restricted the estimation of the species trees from the 10- and 27-gene datasets to palaeognaths rooted with chicken sequences, because alignments to crocodilian introns were problematic. The topology of the palaeognath species tree in MP-EST for the 10-gene dataset closely matched the tree from the concatenated dataset under MrBayes. The exception is the relationships among tinamous, which changed to match the topology of the 27-gene dataset concatenated tree. MP-EST for 27 genes is identical to that from the 27 genes under concatenation (figure 3).
However, bootstrap support for both MP-EST trees is much lower than the posterior probabilities found for their respective concatenated trees. These low bootstrap numbers probably reflect the level of uncertainty of clades within and among gene trees , and imply that the Bayesian concatenation method may have overestimated the posterior probabilities. Increasing the number of loci from 10 to 27 did increase the bootstrap support at the difficult nodes with very short internal branches, as expected from the results of simulations by Liu et al. . However, it is clear that many more genes will be required to resolve these nodes and estimate strongly supported species trees of the palaeognaths and of the Neoaves, where similarly short branches occur at internal nodes.
(b) Retroposon insertion analysis and tree topology
To independently test competing tree topologies inferred from sequences analysed here and the published nuclear [6,7] and mitochondrial gene sequences , we compiled a dataset of CR1 retroposon insertions from emu sequences available in GenBank and from four palaeognath genomic libraries. Among the hundreds of retroposon insertions screened, 22 were phylogenetically informative (figure 4; electronic supplementary material, table S2). Six are common to all the palaeognaths and not present in homologous loci in chicken or zebra finch genomes. By contrast, one insert was common to all neognaths but absent in the palaeognaths. These insertions support the fundamental split in the modern bird tree between the Palaeognathae and Neognathae.
Within the Palaeognathae, five insertion sites are common to tinamous and all extant ratites with the exception of ostrich. As retroposon insertions are self-polarizing, these five shared insertions identify all palaeognaths minus ostrich as a monophyletic group, reaffirming ratite paraphyly detected in sequence trees. Also in agreement with the 10- and 27-gene trees are two retroposon insertions exclusive to kiwis, rheas, emu and cassowary that support the monophyly of this clade. These two insertions cannot be mapped parsimoniously on the trees estimated from the published multigene nuclear datasets [6,7] and the reanalysis of the mitochondrial genomes . However, three insertions are estimated to be required at a node for statistical significance at p < 0.05 , but the two we found are consistent with the species tree estimated with MP-EST and with concatenation in MrBayes. The monophyly of the kiwis, emu and cassowaries is supported by two insertion sites they share, but which are absent in the rheas, ostrich, tinamous and moa (only one of the two loci amplified for the moa ancient DNA).
A single retroposon insertion unique to emu and cassowary supports the monophyly of this clade. Two insertions were unique to the kiwis (one common to all kiwis and the other only in the spotted kiwis). Three insertions were unique to rheas. All retroposon insertions were mapped on the 27 gene tree without incongruence. No conflicting insertions were found, and thus there is no evidence for incomplete lineage sorting in these retro-elements, but sample size for some nodes is small.
Within the Neognathae, five phylogenetically informative CR1 insertion sites identified from the nuclear sequence dataset of Hackett et al.  were mapped onto our sequence tree (figure 4). Two of these support jacana as sister to painted snipe (Scolopaci), two are common to all passerines and one was common to all penguins (Sphenisciformes).
(c) Molecular dating and palaeognath biogeography
Divergence times estimated across the avian tree are pivotal in understanding the roles of historical events such as plate tectonics and the large-scale mass extinction at the end of the Cretaceous in shaping the modern avian radiation. A likelihood ratio test performed in DAMBE  rejected a strict clock with p < 0.0001. Divergence times were therefore estimated with relaxed clock models in BEAST and MCMCtree, with the 10-gene dataset constrained to the 27-gene and retroposon-supported species tree topology. Estimates of node ages using uncorrelated and correlated rates among lineages in MCMCtree were very similar, so we only present the former to compare with those from BEAST (table 1). Mean node ages were similar using both methods, but among the paleognaths the 95 per cent highest posterior density (HPD) estimates were markedly broader in BEAST. For this reason, we focus on MCMCtree estimates, because the 95 per cent HPDs are conservative in comparing divergence times with the dates of fragmentation of Gondwana.
The age of crown group birds was estimated to be 131 Ma with 95 per cent HPD 122–138 Ma, consistent with earlier mitochondrial clock estimates [9–11]. This is almost twice the age of the oldest modern bird fossil Vegavis (66–68 Ma), but this fossil taxon is highly derived and placed in a modern order . Despite the use of widely accepted fossil calibrations, a gap persists between the age of the oldest crown group fossils and molecular estimates of the age of their most recent common ancestor (MRCA). However, Neornithes have been posited to have originated in the Southern Hemisphere , a region less sampled for fossils than the Northern Hemisphere, and perhaps where exposures with fossils of greater antiquity may yet be discovered to bridge the gap. Additionally, a recent assessment of the poor quality and patchy regional sampling of the fossil record in the Late Cretaceous suggested that the presence of neornithine lineages at this time cannot be refuted based on current evidence .
The MRCA of Palaeognathae was estimated at about 97 Ma (95% HPD 89–108 Ma) when the lineage leading to ostrich diverged. Earlier studies placed ostrich in a more derived position in the palaeognath tree, which translated to a molecular divergence time of 65 Ma. This was long after Africa separated from Gondwana around 100 Ma [47,48]. A mixture of dispersal and continental drift were invoked to explain the ostrich's presence in Africa and Asia [3,5]. By contrast, the much older ostrich divergence time we derived borders on the time of rifting of Africa from Gondwana, and thus dispersal need not be invoked.
The rheas were estimated to branch off from the lineage leading to kiwis, emu and cassowaries around 81 Ma (95% HPD 74–84 Ma). At this time, South America was connected to Australia via Antarctica, well before separation of Australia at approximately 55 Ma. The vicariance hypothesis predicts that the isolation of the rhea lineage in the Antarctic/South America land mass would therefore have occurred around the time Australia rifted away. Although this continental fragmentation time overlaps the lower 95 per cent HPD (52–101 Ma) for rhea divergence estimated in BEAST, a vicariant origin for this split seems unlikely given the much older mean divergence times estimated by both dating methods. Instead, a more ancient dispersal event is inferred in the Campanian.
The branching point between the kiwi lineage and Casuariidae was estimated at 78 Ma (95% HPD 71–83 Ma), close to the 80–82 Ma period when New Zealand began to separate from Gondwana , and thus vicariance cannot be rejected. The early divergence time of about 84 Ma (95% HPD 77–91 Ma) estimated for the lineage leading to moas and tinamous is intriguing because it too coincides with the rifting of New Zealand. If the MRCA of these lineages was on the New Zealand land mass at this time, only one dispersal to South America is required to explain the occurrence of tinamous in South America. Dispersal may have occurred when the two lineages diverged about 69 Ma (95% HPD 63–76 Ma). In this scenario, palaeognath biogeography requires possibly two dispersals, and three vicariance events.
Geological forces may have influenced diversifications of palaeognath lineages even after the fragmentation of Gondwana. The MRCA of the extant kiwis was estimated to occur around 5 Ma (95% HPD 3–8 Ma). The timing of the diversification of the kiwis into the five recognized species  coincides with an extensive mountain-building period in New Zealand and cycles of glaciation that probably facilitated allopatric speciation through habitat fragmentation. The three moa species sampled in this study included taxa from the two most divergent moa branches. This allowed estimation of the divergence time of their MRCA to about 19 Ma (95% HPD 12–27 Ma), consistent with previous dating by Baker et al. . This timing follows closely after the Oligocene ‘drowning’ of New Zealand, a period when most of New Zealand became inundated by the sea . The upper limit of our estimate coincides with the rebound and re-emergence of the New Zealand land mass around 23 Ma. This is much earlier than the 5.8 Ma proposed by Bunce et al.  based on mitochondrial sequences. The timing of the moa radiation is consistent with a terrestrial biota that has undergone a severe bottleneck owing to habitat loss followed by a subsequent diversification as the environment expanded.
The MRCA of Neognathae was estimated to have occurred around 107 Ma (95% HPD 97–118 Ma). Chicken and the two geese lineages were estimated to diverge around 83 Ma (95% HPD 74–94 Ma), similar to times obtained with mitochondrial sequences [9–11]. Within the Neoaves, the split of the clade of waterbirds (loon, stork, petrel, penguins) plus shorebirds (oystercatcher, jacana, snipe) from the Passeriformes (rifleman and zebra finch) plus Apodiformes (swift and hummingbird) was dated at around 81 Ma (95% HPD 75–89 Ma). These divergence times of major clades in the Neoaves predate the mass extinction at the end of the Cretaceous, and instead occur in the Campanian age (70–83 Ma), when the fossil record shows a coincident pulse of diversification within non-avian dinosaurs .
Conversely, in our tree, two nodes were significantly younger than had been estimated in earlier molecular studies. The MRCA of extant penguins was estimated at approximately 13 Ma (95% HPD 7–22 Ma), as opposed to the 40 Ma estimate derived from mostly mitochondrial genes and relaxed molecular clocks . The New Zealand endemic rifleman was calculated to have diverged from the zebra finch around 51 Ma (95% HPD 40–67 Ma), which makes the origin of the Passeriformes much younger than earlier estimates , and similar in age to the oldest passerine fossil found in Australia .
In summary, the use of nuclear multigene datasets to resolve the relationships among the palaeognaths [6,7] and improved methods of modelling base substitutions in mitochondrial genomes  have contributed to our understanding of the phylogenetic relationships among these birds. However, the persistence of poorly resolved nodes within these trees limits the conclusions that can be made about their evolution and biogeography. By improving upon taxon sampling, adding more gene sequences and introducing a large number of rare genomic changes, we have produced the most robust estimate to date of a multigene species tree of the palaeognaths that takes into account gene tree uncertainty. Nevertheless, support at some key branches in the species tree is still low, clearly indicating the need for a phylogenomics approach with a much larger gene set in future studies. The use of multiple fossil calibrations in molecular dating of the multilocus species tree produced estimates of divergence times that imply that both continental drift and dispersal are required to explain palaeognath biogeography. This is in contrast to dating of gene trees where divergence times can be overestimated, because there is no correction for genetic divergence prior to speciation . We conclude that the problems encountered in resolving phylogenetic relationships among the Palaeognathae and a limited sampling of the Neognathae are likely to be common across the sweep of modern bird lineages. Rapid radiations leading to incomplete lineage sorting and discordant or unresolved gene trees can be anticipated across short internal branches, which are a striking feature at the base of the bird tree. This will probably require coalescent methods combined with evidence from rare genomic events to help resolve the branching order of the major clades in the avian tree of life.
We thank the American Museum of Natural History, the Academy of Natural Sciences of Philadelphia, Louisiana State University and Carol Ritland for blood and tissue samples. This work was supported by grants to A.J.B. from the Natural Sciences and Engineering Research Council (Canada), the Royal Ontario Museum Foundation and Gloria Chen.
- Received July 14, 2012.
- Accepted August 22, 2012.
- This journal is © 2012 The Royal Society