Assessing the root of bilaterian animals with scalable phylogenomic methods

Andreas Hejnol, Matthias Obst, Alexandros Stamatakis, Michael Ott, Greg W. Rouse, Gregory D. Edgecombe, Pedro Martinez, Jaume Baguñà, Xavier Bailly, Ulf Jondelius, Matthias Wiens, Werner E. G. Müller, Elaine Seaver, Ward C. Wheeler, Mark Q. Martindale, Gonzalo Giribet, Casey W. Dunn

Abstract

A clear picture of animal relationships is a prerequisite for understanding how the morphological and ecological diversity of animals evolved over time. In particular, the placement of the acoelomorph flatworms, Acoela and Nemertodermatida, has fundamental implications for the origin and evolution of various animal organ systems. Their position, however, has been inconsistent in phylogenetic studies using one or several genes. Furthermore, Acoela has been among the least stable taxa in recent animal phylogenomic analyses, which simultaneously examine many genes from many species, while Nemertodermatida has not been sampled in any phylogenomic study. New sequence data are presented here from organisms targeted for their instability or lack of representation in prior analyses, and are analysed in combination with other publicly available data. We also designed new automated methods, based on explicit criteria, for identifying and selecting genes shared across different species, and developed highly optimized supercomputing tools to reconstruct relationships from gene sequences. The results corroborate several recently established findings about animal relationships and provide new support for the placement of other groups. These new data and methods strongly uphold previous suggestions that Acoelomorpha is the sister clade to all other bilaterian animals, find diminishing evidence for the placement of the enigmatic Xenoturbella within Deuterostomia, and place Cycliophora with Entoprocta and Ectoprocta. The work highlights the implications that these arrangements have for metazoan evolution and permits a clearer picture of ancestral morphologies and life histories in the deep past.

1. Introduction

(a) Scalability in phylogenomic analyses

As the cost of sequencing DNA has fallen, broad-scale phylogenetic studies have begun to shift away from pre-selected gene fragments isolated by directed PCR (the traditional target gene approach) to high-throughput sequencing strategies that collect data from many genes at random. These sequencing methods, which include expressed sequence tags (ESTs) and whole-genome shotgun sequencing, theoretically allow gene selection to be part of the data analysis rather than the project design, since data acquisition does not depend on gene selection and gene selection can in turn be informed by the data acquired. Existing phylogenetic studies already vary in size by at least four orders of magnitude and are anticipated to grow much larger, so scalable gene selection methods (i.e. tools that are able to accommodate datasets of very different sizes) based on explicit criteria will become increasingly important. In addition to facilitating larger analyses, such tools would make it possible to evaluate the specific effects of gene selection criteria on phylogenetic results. The development of methods and criteria for matrix assembly, rather than the manual curation of gene lists, would also facilitate the construction of matrices for a wide diversity of phylogenetic problems, including matrices optimized for subclades, superclades or entirely different groups of organisms.

The last several years have seen a proliferation of tools for identifying homologous sequences and evaluating orthology (Chen et al. 2007), but fully automated phylogenomic matrix construction based on explicit criteria is still in its infancy. A recent study that included new EST data for 29 broadly sampled animals applied a largely automated method for gene selection (Dunn et al. 2008) that relied on phenetic Markov clustering (MCL; van Dongen 2000) followed by phylogenetic evaluation of clusters. User intervention was required to evaluate some cases of paralogy. That study supported previous views that broad taxon sampling is critical for improving the phylogenetic resolution of the metazoan tree of life. Some important relationships still remained unresolved, however, and other critical taxa were unsampled.

(b) The base of Bilateria

The existence and placement of Acoelomorpha, a group hypothesized to consist of Acoela and Nemertodermatida, have been particularly problematic. Resolving the placement of acoelomorphs is essential for rooting the bilaterian tree and understanding the early phylogeny of bilaterian animals, particularly for the reconstruction of the evolution of animal organ systems (Baguñà & Riutort 2004; Hejnol & Martindale 2008b; Bourlat & Hejnol 2009). This is therefore one of the most important remaining problems in animal phylogenetics. Acoela has been recovered as the sister group to all other bilaterian animals in targeted-gene analyses, though its placement with respect to Nemertodermatida has been inconsistent (Ruiz-Trillo et al. 1999, 2002; Jondelius et al. 2002; Wallberg et al. 2007; Paps et al. 2009). The position of acoels has not been resolved satisfactorily in previous EST-based analyses (Philippe et al. 2007; Dunn et al. 2008; Egger et al. 2009). In fact, acoels were the most unstable taxon in the Dunn et al. (2008) study. In a more recent phylogenomic study, Egger et al. (2009) found an acoel to be the sister group to the rest of Bilateria, but questioned the result based on data on stem cell distribution and proliferation, as well as the mode of epidermal replacement, and suggested that acoels could alternatively be part of Platyhelminthes. Critically, the second major acoelomorph group, Nemertodermatida, has yet to be included in any phylogenomic analysis.

Here we simultaneously address new analytical challenges of building phylogenomic matrices using entirely explicit criteria, investigate central questions in animal phylogenetics regarding Acoelomorpha and several other important taxa, and explore the effects of subsampling the resulting matrix. We do this by collecting new data from relevant animals, developing new orthology evaluation methods that enable the construction of much larger data matrices and applying optimized tools for high-performance computing architectures. The new data we generated (see electronic supplementary material, table S1) focus on the putative group Acoelomorpha, including two species of the previously unsampled Nemertodermatida. We also added new EST or whole-genome data for additional taxa of special interest. Publicly available data were incorporated, largely derived from the same taxa considered in a previous analysis (Dunn et al. 2008), but also including additional key taxa such as the placozoan Trichoplax adhaerens and the gastropod mollusc Lottia gigantea. Our new gene selection strategy relies exclusively on explicit criteria, allowing it to be fully automated, and it is scalable across projects of very different sizes. This new method improves the ability to build large matrices, though at the cost of shallower gene extraction from poorly sampled EST libraries.

2. Material and methods

(a) Data acquisition and matrix assembly

New data were generated for seven taxa (electronic supplementary material, table S1) that were selected to address several key questions in animal phylogeny, and a total of 94 taxa were included in the present analyses (electronic supplementary material, table S2). Sequencing and assembly of ESTs were performed as previously described (Dunn et al. 2008). The new ESTs were strategically collected from species in groups that were unstable (according to leaf stability metrics; see below and Dunn et al. 2008), undersampled or unrepresented in previous studies. These include a sponge, two acoels, two nemertodermatids, an entoproct and a cycliophoran. All new data, not just the sequences used for phylogenetic inference, have been deposited in the National Center for Biotechnology Information (NCBI) Trace Archive. Publicly available data for a variety of other taxa were incorporated into the analysis (see electronic supplementary material, table S2).

(b) Homology assignment and paralogue pruning

Phenetic sequence clustering was similar to that of Dunn et al. (2008), though taxon sampling criteria were relaxed considerably as described below. Unless specified otherwise, all software versions and settings are the same as in that study. Amino acid sequences were used at all stages of analysis. Sequence similarity was assessed with the previously described BLAST strategy (Dunn et al. 2008), and sequences were then grouped with MCL (van Dongen 2000) using an inflation parameter of 2.2 (see electronic supplementary material). Clusters were required to (i) include at least four taxa, (ii) include at least one taxon from which data were collected in this or the previous study, (iii) include at least one of the taxa used as BLAST subjects, (iv) have a mean of less than five sequences per taxon, (v) have a median of less than two sequences per taxon and (vi) have no representatives of a HomoloGene group that had sequences in more than one MCL cluster. Clusters that failed any of these criteria were not considered further. Sequences for each cluster that passed these criteria were aligned with Muscle (Edgar 2004) and trimmed with GBlocks (Castresana 2000), and a maximum likelihood (ML) tree was inferred with RAxML.
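The six cluster criteria above can be read as a single filter applied to each MCL cluster. The following Python sketch is illustrative only and is not the pipeline used in this study. It assumes that each cluster is a list of (taxon, sequence ID) pairs, that the sets of newly sequenced taxa and BLAST-subject taxa are known, and that a mapping from sequence ID to HomoloGene group is available. The function names and toy data are hypothetical, and taking the mean and median over the taxa present in each cluster is an assumption.

```python
from collections import Counter, defaultdict
from statistics import mean, median


def split_homologene_groups(clusters, homologene_of):
    """HomoloGene groups whose sequences are spread over more than one MCL cluster.

    clusters: dict mapping cluster ID -> list of (taxon, sequence ID) pairs
    homologene_of: dict mapping sequence ID -> HomoloGene group (absent if none)
    """
    clusters_per_group = defaultdict(set)
    for cluster_id, members in clusters.items():
        for _, seq_id in members:
            group = homologene_of.get(seq_id)
            if group is not None:
                clusters_per_group[group].add(cluster_id)
    return {g for g, ids in clusters_per_group.items() if len(ids) > 1}


def passes_cluster_filters(members, new_taxa, blast_subject_taxa,
                           homologene_of, split_groups):
    """Apply criteria (i)-(vi) to one cluster of (taxon, sequence ID) pairs."""
    seqs_per_taxon = Counter(taxon for taxon, _ in members)
    counts = list(seqs_per_taxon.values())
    taxa = set(seqs_per_taxon)
    return (
        len(taxa) >= 4                        # (i) at least four taxa
        and bool(taxa & new_taxa)             # (ii) a taxon sequenced in this or the previous study
        and bool(taxa & blast_subject_taxa)   # (iii) a BLAST-subject taxon
        and mean(counts) < 5                  # (iv) mean < 5 sequences per taxon
        and median(counts) < 2                # (v) median < 2 sequences per taxon
        and not any(homologene_of.get(s) in split_groups   # (vi) no split HomoloGene group
                    for _, s in members)
    )


# Toy example with hypothetical taxa, cluster IDs and sequence IDs
clusters = {
    "k1": [("TaxonA", "a1"), ("TaxonB", "b1"), ("TaxonC", "c1"), ("TaxonD", "d1")],
    "k2": [("TaxonA", "a2"), ("TaxonA", "a3"), ("TaxonC", "c2")],
}
homologene_of = {"a1": 101, "a2": 102, "a3": 102}
split = split_homologene_groups(clusters, homologene_of)
kept = [cid for cid, members in clusters.items()
        if passes_cluster_filters(members, {"TaxonC"}, {"TaxonA"}, homologene_of, split)]
print(kept)  # ['k1']; k2 fails criterion (i) with only two taxa
```

Criterion (vi) needs a view of all clusters at once, which is why the split HomoloGene groups are computed in a separate pass before any single cluster is tested.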

The assessment of cluster phylogenies herein differs markedly from that of Dunn et al. (2008). In the first step, monophyly masking, all but one sequence were deleted in clades of sequences derived from the same taxon; the retained sequence was chosen at random. The next step, paralogue pruning, consisted of identifying the maximally inclusive subtree that had no more than one sequence per taxon. This subtree was then pruned away for further analysis, and the remaining tree was used as a substrate for another round of pruning. The process was repeated until the remaining tree had no more than one sequence per taxon. If there were multiple maximally inclusive subtrees of the same size in a given round, they were all pruned away at the same time.
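Monophyly masking and paralogue pruning can be sketched in Python as below. This is a hypothetical reimplementation for illustration, not the software used in the study, and it simplifies the problem by treating each gene tree as rooted, so that a "subtree" is the clade below some node (the actual ML gene trees are unrooted, and a faithful implementation would consider every bipartition). Trees are assumed to be binary; the `Node` class and all function names are illustrative.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """A gene-tree node; leaves carry a taxon and sequence ID, internal nodes carry children."""
    taxon: Optional[str] = None
    seq_id: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def taxa(node):
    return [leaf.taxon for leaf in node.leaves()]


def is_orthologous(node):
    """True if the subtree holds no more than one sequence per taxon."""
    t = taxa(node)
    return len(t) == len(set(t))


def monophyly_mask(node, rng):
    """Replace every clade whose leaves all come from one taxon by one randomly chosen leaf."""
    leaves = node.leaves()
    if len({leaf.taxon for leaf in leaves}) == 1:
        return rng.choice(leaves)
    node.children = [monophyly_mask(child, rng) for child in node.children]
    return node


def subtrees(node):
    yield node
    for child in node.children:
        yield from subtrees(child)


def prune(node, doomed):
    """Copy of the tree without the subtrees whose id() is in `doomed`; unary nodes collapse."""
    if id(node) in doomed:
        return None
    if not node.children:
        return node
    kept = [k for k in (prune(child, doomed) for child in node.children) if k is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else Node(children=kept)


def paralogue_prune(root):
    """Repeatedly excise the maximally inclusive subtrees with at most one sequence per taxon."""
    orthologue_sets = []
    while root is not None:
        if is_orthologous(root):
            orthologue_sets.append(root)
            break
        candidates = [s for s in subtrees(root) if is_orthologous(s)]
        largest = max(len(s.leaves()) for s in candidates)
        chosen = [s for s in candidates if len(s.leaves()) == largest]
        orthologue_sets.extend(chosen)        # equally large subtrees are pruned together
        root = prune(root, {id(s) for s in chosen})
    return orthologue_sets


def leaf(taxon, seq_id):
    return Node(taxon=taxon, seq_id=seq_id)


# Toy gene tree ((A, (B, B)), ((A, C), D)); the B-B clade is masked first
tree = Node(children=[
    Node(children=[leaf("A", "a1"), Node(children=[leaf("B", "b1"), leaf("B", "b2")])]),
    Node(children=[Node(children=[leaf("A", "a2"), leaf("C", "c1")]), leaf("D", "d1")]),
])
orthologue_sets = paralogue_prune(monophyly_mask(tree, random.Random(0)))
print([sorted(taxa(s)) for s in orthologue_sets])  # [['A', 'C', 'D'], ['A', 'B']]
```

Because unary nodes are collapsed after every round, two equally large orthologous subtrees can never be nested, so pruning all of them at once is well defined.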

Subtrees produced by paralogue pruning were then filtered to include only those with (i) four or more taxa and (ii) at least 80 per cent of the taxa present in the original cluster from which they were derived (see electronic supplementary material). FASTA-format files with sequences corresponding to each terminal in the final subtrees were then generated, aligned and concatenated into a supermatrix.
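The subtree filter and the concatenation step can be sketched as follows. The sketch assumes that criterion (ii) means "at least 80 per cent", that each gene has already been aligned (a dict from taxon to aligned sequence) and that absent taxa are padded with gap characters; these choices and the function names are illustrative assumptions, not the study's code. Occupancy is computed as the fraction of taxon-by-gene cells that contain data, matching the definition used in the results.

```python
def keep_subtree(n_taxa_subtree, n_taxa_cluster, min_taxa=4, min_fraction=0.8):
    """Post-pruning filter: enough taxa overall and relative to the source cluster."""
    return (n_taxa_subtree >= min_taxa
            and n_taxa_subtree >= min_fraction * n_taxa_cluster)


def concatenate(gene_alignments, all_taxa, missing="-"):
    """Concatenate per-gene alignments into a supermatrix.

    gene_alignments: list of dicts mapping taxon -> aligned sequence (equal lengths within a gene)
    Returns the supermatrix (taxon -> sequence) and the fraction of taxon-gene cells with data.
    """
    rows = {taxon: [] for taxon in all_taxa}
    sampled_cells = 0
    for alignment in gene_alignments:
        width = len(next(iter(alignment.values())))
        for taxon in all_taxa:
            if taxon in alignment:
                rows[taxon].append(alignment[taxon])
                sampled_cells += 1
            else:
                rows[taxon].append(missing * width)   # taxon not sampled for this gene
    occupancy = sampled_cells / (len(all_taxa) * len(gene_alignments))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}, occupancy


# Toy example: two aligned genes, one sampled for all four taxa and one for two
genes = [{"A": "MKV", "B": "MRV", "C": "MKI", "D": "LKV"},
         {"A": "GG", "C": "GA"}]
matrix, occ = concatenate(genes, ["A", "B", "C", "D"])
print(matrix["B"], occ)  # MRV-- 0.75
```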

(c) Phylogenetic inference

Phylogenetic analyses were conducted on an IBM BlueGene/L system at the San Diego Supercomputer Center, comprising three racks of 1024 nodes each, with two processors per node. The total analysis time was 2.25 million processor hours. The relatively small amount of RAM per node on the IBM BlueGene/L (BG/L) means that the likelihood computations for a single tree topology need to be conducted concurrently on several nodes, essentially by distributing the alignment columns across processors. The dedicated parallel version of RAxML for the current analysis is based on RAxML v. 7.0.4. A significant software engineering effort was undertaken to transform the initial proof-of-concept parallelization on an IBM BG/L into production-level code that covers the full functionality of RAxML. Among other things, the performance of the code was improved by 30 per cent (compared with the original BG/L version) via optimization of the compute-intensive loops in the phylogenetic likelihood kernel. In general, the fine-grained parallelization strategy deployed here at the level of the phylogenetic likelihood kernel will need to be applied on all state-of-the-art supercomputer architectures to accommodate the immense memory requirements of current phylogenomic studies (Stamatakis & Ott 2008). The ability to split the likelihood calculation for a single matrix across multiple nodes, rather than just dividing bootstrap replicates across nodes, overcomes the per-node memory limitations encountered with large alignments, allows for a short response time for a single tree search and enables the convenient exploitation of thousands of CPUs. The adaptation of RAxML to the IBM BG/L also required the development of solutions to avoid memory fragmentation.
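The column-wise decomposition works because, under the standard assumption that alignment columns evolve independently, the log likelihood of a tree is a sum of per-column terms, each weighted by how often that column pattern occurs. Each processor can therefore evaluate only its share of the distinct patterns, and the partial sums are combined by a single reduction. The Python sketch below simulates this sequentially with a deliberately simple stand-in model (two sequences under the equal-rates Poisson model for 20 amino acids) rather than the full tree likelihood under the RTREV and Gamma models used here; the cyclic assignment of patterns to workers and all names are assumptions made for illustration.

```python
import math
from collections import Counter

N_STATES = 20  # amino acids


def compress_patterns(alignment):
    """Count how often each distinct column occurs in an alignment of equal-length strings."""
    return Counter(zip(*alignment))


def site_log_likelihood(column, t):
    """Toy stand-in for the tree likelihood: two sequences under the equal-rates
    Poisson model, with t = expected substitutions per site."""
    decay = math.exp(-N_STATES * t / (N_STATES - 1))
    x, y = column
    if x == y:
        p_transition = 1 / N_STATES + (N_STATES - 1) / N_STATES * decay
    else:
        p_transition = 1 / N_STATES - decay / N_STATES
    return math.log(p_transition / N_STATES)   # equilibrium frequency 1/20 times P(x -> y)


def partial_log_likelihood(patterns, t):
    """Work done by one processor: weighted sum over its own column patterns."""
    return sum(weight * site_log_likelihood(column, t) for column, weight in patterns)


def distributed_log_likelihood(alignment, t, n_workers):
    patterns = list(compress_patterns(alignment).items())
    shares = [patterns[rank::n_workers] for rank in range(n_workers)]  # cyclic distribution
    partials = [partial_log_likelihood(share, t) for share in shares]   # parallel in practice
    return sum(partials)                                               # final sum-reduction


aln = ["MKVLAAGMKV", "MKILSAGMKV"]
serial = distributed_log_likelihood(aln, t=0.2, n_workers=1)
parallel = distributed_log_likelihood(aln, t=0.2, n_workers=4)
print(abs(serial - parallel) < 1e-9)  # True
```

Only the per-worker share of patterns (and the conditional likelihood vectors for those columns, in a real tree likelihood) must reside on each node, which is what relieves the per-node memory limit.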

Models of molecular evolution were evaluated using the Perl script available from the RAxML website. ML searches and bootstrap analyses were executed under the Gamma model of rate heterogeneity. Tree sets were summarized with Phyutility (Smith & Dunn 2008), which was used to map bootstrap support onto the most likely trees, calculate leaf stability and prune taxa.
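Mapping bootstrap support onto the most likely tree amounts to counting, for each bipartition of that tree, the proportion of bootstrap trees that contain the same bipartition. The Python sketch below illustrates that computation; it is not Phyutility's implementation. Trees are given as nested tuples of leaf names (an assumption that sidesteps Newick parsing), and each unrooted bipartition is stored canonically as the side that lacks a fixed reference leaf.

```python
def clades(tree, out):
    """Return the leaf set of `tree` (nested tuples of names), appending every internal clade to `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(clades(child, out) for child in tree))
    out.append(leaves)
    return leaves


def splits(tree):
    """Non-trivial bipartitions, each stored as the side that lacks a fixed reference leaf."""
    found = []
    all_leaves = clades(tree, found)
    reference = min(all_leaves)
    result = set()
    for side in found:
        if reference in side:
            side = all_leaves - side
        if 1 < len(side) < len(all_leaves) - 1:   # skip trivial and empty splits
            result.add(side)
    return result


def bootstrap_support(best_tree, replicate_trees):
    """Percentage of replicate trees containing each bipartition of the best tree."""
    replicate_splits = [splits(t) for t in replicate_trees]
    return {s: 100 * sum(s in r for r in replicate_splits) / len(replicate_trees)
            for s in splits(best_tree)}


best = ((("A", "B"), "C"), ("D", "E"))
replicates = [best, ((("A", "C"), "B"), ("D", "E"))]
support = bootstrap_support(best, replicates)
for split, value in support.items():
    print(sorted(split), value)   # ['C', 'D', 'E'] 50.0 and ['D', 'E'] 100.0 (order may vary)
```

Storing each bipartition from one fixed side makes the comparison independent of where the trees happen to be rooted.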

3. Results

(a) Data matrix assembly

MCL generated 7445 clusters, of which 2455 passed the taxon sampling and other phenetic criteria described above. Paralogue pruning (the phylogenetic evaluation and pruning of these clusters to generate sets of orthologues with no more than one sequence per taxon) resulted in 4732 subtrees with four or more taxa (the minimum size of a phylogenetically informative tree), of which 1487 passed the additional criteria described in the methods. This process is robust to noisy data, even when two haplotypes are included for nearly every gene in the Branchiostoma floridae genome (see electronic supplementary material on the robustness of matrix assembly). The final 1487-gene, 94-taxon matrix (figure 1) was 270 580 amino acids long, had 19 per cent occupancy (i.e. on average 19% of the genes were sampled for each taxon) and had 251 152 distinct column patterns. Of the 150 genes from the previous study (Dunn et al. 2008), 56 corresponded to genes in the new 1487-gene matrix. The omission here of genes considered in that previous analysis, or in other such studies, does not necessarily indicate that they were unfit for phylogenetic inference, only that they did not meet the different set of criteria used here, which are optimized for other purposes.

Figure 1.

Genes are ranked by decreasing taxon sampling. (a) The number of taxa sampled for each gene is shown along the left vertical axis and indicated by blue data points, while the cumulative matrix completeness is shown on the right vertical axis indicated by a green continuous line. Vertical lines indicate the gene cutoffs for the four matrices that were analysed. (b) ‘Bird's eye’ view of the matrix. A white cell indicates a sampled gene. Taxa are sorted from the best sampled at the top to the least sampled at bottom (gene ordering is the same as in (a)).

Relative to the previous study (Dunn et al. 2008), the number of gene sequences in the new matrix was greatly increased for taxa with many sequenced genes (i.e. the number of unique protein predictions following EST assembly and translation), but was reduced for taxa with the smallest numbers of sequenced genes (electronic supplementary material, table S2), despite there being nearly ten times as many genes in the total matrix (1487 versus 150). The reasons for this are explored in greater detail in the electronic supplementary material, along with comparisons to the 150-gene matrix supplemented to include all 94 taxa considered here (electronic supplementary material, fig. S1). The best-sampled taxon, Homo sapiens, had 1351 (90.9%) of the 1487 genes, whereas the most poorly sampled taxon, Phoronis vancouverensis, had only 2 (0.13%; yellow circles in figure 2). Positions of taxa with the least data were not well resolved. The new matrix construction strategy was therefore disproportionately beneficial for well-sampled taxa. Poorly sampled taxa such as P. vancouverensis were not excluded from analyses a priori because heterogeneous sampling success is common in EST datasets, and is therefore of analytical interest. Also, the later application of leaf stability indices allows for the evaluation of support between stable taxa, even when poorly sampled, unstable taxa are included in the analysis.

Our analyses address the potential impact of missing data in several ways (see electronic supplementary material; §4). We found no indication that missing data have resulted in systematic error, though the analyses we were able to conduct were necessarily constrained by the size of the large matrix and the subject in general still requires greater attention.

Figure 2.

Phylogram of the most likely tree found in ML searches of the 1487-gene matrix (37 searches, log likelihood = −6 124 157.6). The area of the yellow circle at each tip is proportional to the number of genes present in the 1487-gene matrix for the indicated species (see table S2 in the electronic supplementary material for values). Bootstrap support from analyses of the 844-gene (black values above nodes, 201 bootstrap replicates) and 330-gene (red values below nodes, 210 bootstrap replicates) subsamples of the 1487-gene matrix is also shown at each node. An asterisk indicates 100 per cent bootstrap support. Species for which new EST data were produced are highlighted with green species names.

(b) Gene subsampling comparisons: large, sparse matrices versus smaller, denser matrices

We analysed the complete 1487-gene matrix with 19 per cent gene occupancy, and three nested subsamples with 25, 33 and 50 per cent occupancy (figure 1). These subsets were constructed from the best-sampled genes and had 844, 330 and 53 genes, respectively. The RTREV model, with empirically estimated amino acid frequencies (F option; for details, see RAxML manual), was selected for all four matrices and used in all analyses. Partitioned analyses that apply a different model to each gene were not possible owing to load-balancing problems in the likelihood kernel that severely decreased computational efficiency. These problems arise from strong variation in the number of distinct column patterns per partition.
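The nested gene subsamples can be reproduced with a simple rule: rank genes by the number of taxa sampled, as in figure 1, and keep the largest set of top-ranked genes whose matrix occupancy meets a target. The exact cutoff rule is not spelled out above, so this Python sketch assumes "largest prefix with occupancy at or above the target"; the function names and toy gene counts are hypothetical. Because genes are sorted by decreasing sampling, the cumulative occupancy can only stay level or fall as genes are added, so subsamples for successively lower targets are automatically nested.

```python
def rank_genes(taxa_per_gene):
    """Genes ordered from best to worst sampled (ties broken by name for reproducibility)."""
    return sorted(taxa_per_gene, key=lambda gene: (-taxa_per_gene[gene], gene))


def cumulative_occupancy(taxa_per_gene, n_taxa):
    """Occupancy of the matrix built from the k best-sampled genes, for k = 1, 2, ..."""
    total, curve = 0, []
    for k, gene in enumerate(rank_genes(taxa_per_gene), start=1):
        total += taxa_per_gene[gene]
        curve.append(total / (k * n_taxa))
    return curve


def subsample(taxa_per_gene, n_taxa, target):
    """Largest set of best-sampled genes whose occupancy is at least `target`."""
    curve = cumulative_occupancy(taxa_per_gene, n_taxa)
    k = sum(1 for occupancy in curve if occupancy >= target)   # curve never increases
    return rank_genes(taxa_per_gene)[:k]


# Hypothetical counts of taxa sampled per gene in a 10-taxon matrix
counts = {"g1": 10, "g2": 8, "g3": 5, "g4": 2, "g5": 1}
for target in (0.95, 0.75, 0.5):
    print(target, subsample(counts, n_taxa=10, target=target))
# 0.95 ['g1']   0.75 ['g1', 'g2', 'g3']   0.5 ['g1', 'g2', 'g3', 'g4', 'g5']
```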

The optimal trees across analyses (figures 2–4) are in broad agreement with most recent phylogenomic and targeted-gene analyses in depicting, for example, monophyly of Protostomia and Deuterostomia as the fundamental bilaterian clades, and the division of protostomes into Ecdysozoa and Spiralia (the latter sometimes referred to as Lophotrochozoa; but see Giribet et al. 2009). The analyses consistently resolve Spiralia into two major clades: Trochozoa, which unites Mollusca and Annelida with a nemertean–brachiopod group recently named Kryptrochozoa (Giribet et al. 2009); and a grouping of Platyzoa together with an ectoproct–entoproct–cycliophoran clade that we discuss below under the name Polyzoa, introduced by Cavalier-Smith (1998). The base of the metazoan tree is more contentious. After the addition of new ctenophore and sponge ESTs (compared with Dunn et al. 2008) and the complete genome of T. adhaerens, our most inclusive datasets support ctenophores as sister to all other metazoans. The positions of sponges and T. adhaerens relative to each other varied across matrix subsamples as described in the electronic supplementary material.

Analyses of the 53-gene subset were largely unresolved, with little convergence even between ML replicates (not shown) and poor bootstrap support at almost all deep nodes (electronic supplementary material, fig. S2). Differences between ML analyses of the 1487-, 844- and 330-gene matrices were restricted to the placement of a small number of taxa (see electronic supplementary material for details). Analyses of the 330-gene matrix recovered most of the relationships found from the 844-gene and 1487-gene matrices, many of which were not recovered in the 150-gene matrix from a previous study (electronic supplementary material, fig. S1) or the 53-gene matrix (electronic supplementary material, fig. S2). Bootstrap support values for many relationships were similar in the 330-gene and 844-gene analyses (figures 2–4; electronic supplementary material, fig. S3). Bootstrap support for the 1487-gene matrix was not evaluated owing to computational limitations.

Figure 3.

Cladogram showing bootstrap support for relationships between taxa from figure 2 with a leaf stability of 87 per cent or higher. This criterion was met by 87 taxa, though only bilaterian taxa are shown (other relationships were not impacted by the removed taxa). Support values from the 844-gene (black values above nodes) and 330-gene (red values below nodes) subsamples are shown at each node. An asterisk indicates 100 per cent bootstrap support.

(c) Taxon subsampling: stability and the visualization of phylogenetic relationships

Different taxa within the same phylogenetic analysis can have widely disparate stability (Thorley & Wilkinson 1999). In the present analyses most taxa are quite stable (leaf stability; electronic supplementary material, table S2): their relationships with each other are consistent and well supported across bootstrap replicates. Other taxa, however, have inconsistent relationships across and within analyses. These unstable taxa tend to be poorly sampled in the matrix generated here, as is the case for Phoronis and some molluscs.

A small number of unstable taxa can obscure strongly supported relationships between stable taxa, even if they have no effect on those relationships. Unless visualization tools are used that can identify stable relationships that are not affected by unstable taxa and assess support for these relationships directly, strong signals present in the data may not be discernible. We have addressed this issue by looking at support for relationships between nested subsamples of the most stable taxa, as assessed by leaf stability indices (Thorley & Wilkinson 1999; Smith & Dunn 2008). Three different leaf stability cutoffs were used (see electronic supplementary material for details on cutoff selection): 0 per cent (figure 2, i.e. no threshold), 87 per cent (figure 3) and 90 per cent (figure 4).
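Leaf stability summarizes how consistently a taxon is placed across a set of trees such as bootstrap replicates. The Python sketch below implements one variant of the quartet-based index of Thorley & Wilkinson (1999): for every quartet of leaves it records the frequency of the most common resolution across trees, and a leaf's stability is the mean of these frequencies over all quartets that contain it. This is an illustrative assumption, not necessarily the exact formula implemented in Phyutility; trees are nested tuples of leaf names that all share the same leaf set, and the function names are hypothetical. Leaves scoring below a cutoff (87 or 90 per cent above) would then be pruned before support is summarized.

```python
from collections import Counter
from itertools import combinations


def clades(tree, out):
    """Return the leaf set of `tree` (nested tuples of names), appending every internal clade to `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(clades(child, out) for child in tree))
    out.append(leaves)
    return leaves


def quartet_resolution(clade_sets, quartet):
    """The pairing ab|cd that a tree induces on four leaves, or None if it is unresolved."""
    q = frozenset(quartet)
    for clade in clade_sets:
        inside = clade & q
        if len(inside) == 2:   # this clade separates two quartet leaves from the other two
            return frozenset([inside, q - inside])
    return None


def leaf_stability(trees):
    """Mean, over quartets containing each leaf, of the frequency of the quartet's commonest resolution."""
    parsed = []
    for tree in trees:
        found = []
        leaves = clades(tree, found)   # all trees are assumed to share this leaf set
        parsed.append(found)
    quartet_score = {}
    for quartet in combinations(sorted(leaves), 4):
        counts = Counter(quartet_resolution(found, quartet) for found in parsed)
        counts.pop(None, None)         # unresolved quartets support no resolution
        quartet_score[quartet] = max(counts.values(), default=0) / len(trees)
    return {leaf: sum(v for q, v in quartet_score.items() if leaf in q)
                  / sum(1 for q in quartet_score if leaf in q)
            for leaf in sorted(leaves)}


# Hypothetical bootstrap trees in which C and D swap places once
trees = [((("A", "B"), "C"), ("D", "E")),
         ((("A", "B"), "D"), ("C", "E")),
         ((("A", "B"), "C"), ("D", "E"))]
stability = leaf_stability(trees)
print({leaf: round(v, 3) for leaf, v in stability.items()})
# {'A': 0.917, 'B': 0.917, 'C': 0.833, 'D': 0.833, 'E': 0.833}
```

Enumerating every quartet is cheap for a handful of taxa but grows quickly (about three million quartets for 94 taxa), so a production implementation would likely need a more efficient strategy.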

Figure 4.

Cladogram showing bootstrap support for relationships between taxa from figure 2 with a leaf stability of 90 per cent or higher. This criterion was met by 84 taxa, though only bilaterian taxa are shown (other relationships were not impacted by the removed taxa). Support values from the 844-gene (black values above nodes) and 330-gene (red values below nodes) subsamples are shown at each node. An asterisk indicates 100 per cent bootstrap support. The taxa included in figure 3, but not here, are Xenoturbella bocki, Spinochordodes tellinii and Priapulus caudatus.

There were minimal differences in support values between analyses where taxa were removed prior to phylogenetic analysis and those where they were removed after it (electronic supplementary material, fig. S4), indicating that unstable taxa had very little impact on the inference of the relationships between stable taxa. In these analyses, then, unstable taxa did not impair the ability of a large-scale phylogenetic analysis to infer relationships between other taxa, though they did increase its computational burden.

4. Discussion

(a) Acoelomorpha as sister group to other Bilateria

The hypothesis that acoels (and subsequently nemertodermatids) fall outside Nephrozoa (all other bilaterian animals, i.e. protostomes and deuterostomes) has been one of the biggest challenges that molecular sequence data have posed to the traditional view of animal phylogeny (Ruiz-Trillo et al. 1999, 2002; Jondelius et al. 2002). Acoels have been difficult to place using molecular data, in part owing to rapid sequence evolution in the species examined, and two recent phylogenomic efforts failed to place them with confidence (Philippe et al. 2007; Dunn et al. 2008), though Egger et al. (2009) show a similar result to ours. Notably, no EST or genomic data have previously been available for Nemertodermatida, the other major group of acoelomorphs, leaving their position unresolved. Here we find up to 100 per cent bootstrap support for the sister-group relationship of Acoela and Nemertodermatida (figure 4), together forming Acoelomorpha, and our analyses place this group as sister to Nephrozoa. This provides strong evidence that the deepest split within Bilateria is between Acoelomorpha and Nephrozoa. This result is evident only in analyses of the new large matrices and is not recovered when taxon sampling alone is improved (electronic supplementary material, fig. S1). In our data, the signal for this placement therefore depends on broad gene sampling, although Egger et al. (2009) obtained a similar result using only 43 genes.

The morphological analysis by Ehlers (1985) listed several apomorphies for Acoelomorpha. The strongest morphological argument for this relationship is the complex epidermal ciliary root system with an intercalated network of one anterior and two lateral rootlets that is present in both acoels and nemertodermatids (Ehlers 1985; note that Ehlers regarded Acoelomorpha as a clade of Platyhelminthes). As in our analyses, Egger et al. (2009) found the acoel Isodiametra pulchra to be the sister to Nephrozoa. However, they questioned the result on morphological grounds, noting similarities between acoels and rhabditophoran platyhelminths in epidermal cell replacement via mesodermally placed stem cells and in expression of a piwi-like gene in somatic and gonadal stem cells, and concluded that the conflict between the phylogenomic and morphological data meant that the placement of acoels could not presently be resolved. This argument does not take into account other morphological data (e.g. sac-like body, non-ganglionated nervous system, absence of excretory organs, etc.), which have been used by Haszprunar (1996) to argue for a basal position of acoelomorphs in Bilateria. Furthermore, arguments regarding gene content (only three Hox genes, a limited number of bilaterian microRNAs, etc.) are consistent with the placement of Acoelomorpha as sister to the rest of Bilateria. The stem cell and expression data presented by Egger et al. (2009) can reasonably be interpreted as convergence or as symplesiomorphy across Bilateria.

Except for one study based on myosin heavy chain type II (Ruiz-Trillo et al. 2002), molecular analyses have consistently shown paraphyly of Acoelomorpha, with Nemertodermatida as sister to Nephrozoa and Acoela as sister to this assemblage (Jondelius et al. 2002; Ruiz-Trillo et al. 2002; Wallberg et al. 2007; Paps et al. 2009). This led to the previous dismissal of Acoelomorpha as a clade. In contrast, our results indicate that Acoelomorpha is a clade and forms the most relevant outgroup for comparisons between protostomes and deuterostomes, providing critical insight into the origin, evolution and development of metazoan organ systems (Hejnol & Martindale 2008b; Bourlat & Hejnol 2009). Acoelomorphs possess an orthogonal nervous system (consisting of multiple longitudinal dorsal and ventral cords) and an anterior ring-shaped centralization (absent in some species; Raikova et al. 2001). The placement of Acoelomorpha as sister to Nephrozoa is therefore consistent with older hypotheses that this orthogonal nerve organization is ancestral for Bilateria (Reisinger 1972).

In both nemertodermatids and acoels, there is a single opening to the digestive system, as in cnidarians and ctenophores. A recent study shows that this opening is homologous to the bilaterian mouth and suggests that the anus might have evolved multiple times independently in Bilateria by a connection between the gonoduct and the endoderm of the gut (Hejnol & Martindale 2008a). These data strongly reject old hypotheses about the transition of a cnidarian polyp-like ancestor to a coelomate ancestor of protostomes and deuterostomes (the ‘Enterocoely hypothesis’; Remane 1950).

(b) Diminishing support for the placement of Xenoturbella in Deuterostomia

After an odyssey through the animal tree of life, the enigmatic Xenoturbella bocki seemed to have settled down as part of Deuterostomia; either as a sister group to Ambulacraria (Echinodermata + Hemichordata; Bourlat et al. 2003, 2006; Dunn et al. 2008) or as a sister group to all deuterostomes (Perseke et al. 2007). None of the analyses presented here find strong support for the placement of Xenoturbella with Deuterostomia. Instead, analyses of the new gene matrix (figures 2 and 3) place Xenoturbella with Acoelomorpha (70–71% bootstrap support). This is consistent with falling support for the placement of Xenoturbella within Deuterostomia as data have been added in other studies (Philippe et al. 2007; Dunn et al. 2008), though these previous studies failed to place it with other specific taxa.

The placement of Xenoturbella with Acoelomorpha is not surprising from a morphological point of view, and morphological arguments were used by Haszprunar (1996) to include Xenoturbella in Acoelomorpha. In the original description of Xenoturbella (Westblad 1949), it was already regarded as a close relative of acoels. The gross anatomy of Xenoturbella (a completely ciliated worm with only a ventral mouth opening to its digestive system and a basiepidermal nervous system) is similar to that of acoelomorphs. Several ultrastructural features, such as the epidermal ciliary rootlets including the unique ciliary tips (Franzén & Afzelius 1987; Lundin 1998), and specific degenerating epidermal cells that get resorbed into the gastrodermal tissue (Lundin 2001), are also found in Acoelomorpha (Lundin & Hendelberg 1996). The simplicity of its nervous system, especially the lack of a stomatogastric system and its basiepidermal localization, is also consistent with a close relationship to Acoelomorpha (Raikova et al. 2000). In contrast, strong morphological support for the placement of Xenoturbella as a deuterostome has not been forthcoming. A detailed ultrastructural study of its epidermis describes the previously noted morphological similarities to the epidermis of hemichordates as superficial and points out the differences in the organization of the ciliary apparatus and the junctional structures (Pedersen & Pedersen 1988).

(c) Cycliophorans, ectoprocts, entoprocts and their relatives

This is the first inclusion of Cycliophora in a phylogenomic study. The new data and analyses place the cycliophoran Symbion pandora with strong support as sister to entoprocts, consistent with a series of anatomical similarities in ultrastructure and developmental features (Funch & Kristensen 1995). The cycliophoran/entoproct grouping is a result recently recovered with molecular sequence data (Passamaneck & Halanych 2006; Paps et al. 2009).

In most of our analyses, the clade composed of Entoprocta and Cycliophora is placed as sister to Ectoprocta (=Bryozoa to some authors), although with low bootstrap support (figures 2–4). This relationship was suggested by Funch & Kristensen (1995), and a recent phylogenomic analysis that did not sample cycliophorans found evidence for a clade of entoprocts and ectoprocts, which its authors referred to as Bryozoa (Hausdorf et al. 2007). For many years, Ectoprocta and Entoprocta were treated as not being closely related, though Nielsen (2001, and references therein) has long argued for uniting the two groups as Bryozoa. Cavalier-Smith (1998) resurrected the name Polyzoa (originally coined for what is now accepted as Bryozoa) as a taxon to include bryozoans, entoprocts and cycliophorans. Our molecular analyses find evidence for this group, to which we also apply the name Polyzoa. The 844-gene analysis provides more than 80 per cent bootstrap support (figures 3 and 4) for Polyzoa being sister to Platyzoa within Spiralia, and this topology is widely recovered across analyses, though with varying support. Certain features of one polyzoan group, Entoprocta, support the placement of the clade within Spiralia. Two entoprocts that have been studied show spiral cleavage (Marcus 1939; Malakhov 1990), though further detailed embryological analyses are needed.

(d) Ctenophores and the base of the animal tree

Dunn et al. (2008) found strong support for the placement of ctenophores, rather than sponges, as the sister group to all other animals, although it was cautioned that this result should be treated provisionally until taxon sampling was improved. The present paper considers further ctenophore and sponge EST data, as well as Trichoplax genome data (Srivastava et al. 2008), and recovers the same result in analyses of the 1487-, 844- and 330-gene matrices (figure 2). Since the completion of the analyses presented here, an EST study with sampling from all major groups of sponges has been published (Philippe et al. 2009). This study placed Porifera as sister to other metazoans, but bootstrap support was low (62% for other animals, Eumetazoa, to the exclusion of sponges). A recent analysis of a manually curated set of mitochondrial and nuclear genes, together with a small morphological matrix, concluded that ‘Diploblastica’ (including Porifera), not Ctenophora or Porifera, is sister to all other animals (Schierwater et al. 2009). This topology, however, was statistically indistinguishable from a tree that placed Ctenophora as sister to all other animals (see table 1 in Schierwater et al. 2009). Analyses of the deepest splits in the animal tree of life clearly require further taxon sampling, with both new EST and genome projects for Porifera and Ctenophora in particular, before they can be rigorously evaluated.

(e) Phylogenetic inference

This study demonstrates the feasibility of a scalable, fully automated phylogenomic matrix construction method that requires little a priori knowledge for gene selection and is therefore portable to any group of organisms and any scale of phylogenetic problem. Such tools are critical if phylogenomic analyses are to leverage the new high-throughput sequencing technologies. Priorities for future development include improving the representation, in the final analyses, of taxa with relatively few available sequences.

Acknowledgements

This work is supported by the National Science Foundation under the AToL programme to G.G. (EF05-31757), M.Q.M. (EF05-31558) and W.C.W. (EF05-31677), by NASA to M.Q.M. and through IBM Blue Gene/L time provided by the San Diego Supercomputer Center. Additional support to individual project members was also provided from multiple sources. S. roscoffensis ESTs have been sequenced by Genoscope, France. Thanks to S. Smith for implementing monophyly masking in a developmental version of Phyutility and John Bishop for contributing Pedicellina cultures.

Footnotes

    • Received May 26, 2009.
    • Accepted August 21, 2009.

References
