Myxozoans are enigmatic endoparasitic organisms sharing morphological features with bilaterians, protists and cnidarians. This, coupled with their highly divergent gene sequences, has greatly obscured their phylogenetic affinities. Here we report the sequencing and characterization of a minicollagen homologue (designated Tb-Ncol-1) in the myxozoan Tetracapsuloides bryosalmonae. Minicollagens are phylum-specific genes encoding cnidarian nematocyst proteins. Sequence analysis revealed a cysteine-rich domain (CRD) architecture and genomic organization similar to group 1 minicollagens. Homology modelling predicted similar three-dimensional structures to Hydra CRDs despite deviations from the canonical pattern of group 1 minicollagens. The discovery of this minicollagen gene strongly supports myxozoans as cnidarians that have radiated as endoparasites of freshwater, marine and terrestrial hosts. It also reveals novel protein sequence variation of relevance to understanding the evolution of nematocyst complexity, and indicates a molecular/morphological link between myxozoan polar capsules and cnidarian nematocysts. Our study is the first to illustrate the power of using genes related to a taxon-specific novelty for phylogenetic inference within the Metazoa, and it exemplifies how the evolutionary relationships of other metazoans characterized by extreme sequence divergence could be similarly resolved.
1. Introduction
The availability of DNA sequence information has revolutionized interpretations of the phylogenetic relationships among metazoan taxa. Initially achieved through molecular phylogenies based on 18S rDNA [1,2], subsequent phylogenomic approaches analysing multiple genes are generating further inferences about metazoan evolutionary relationships. It is widely accepted that broad taxon sampling and a multi-gene approach are critical for improving the phylogenetic resolution of the metazoan tree of life [4,5]. Nevertheless, when such sampling is adopted, certain taxonomic groups can remain inherently problematic owing to extreme divergence of their DNA sequences. Furthermore, phylogenomic studies can themselves result in conflicting interpretations (e.g. regarding the placement of ctenophores [4,5]). An alternative approach to resolving the position of at least some problematic taxa is to focus on taxonomically restricted genes that can act as phylogenetic markers. Analysis of genes linked to a complex character should provide a powerful approach to phylogenetic inference since these genes are unlikely to have evolved convergently. Here, we demonstrate the use of taxonomically restricted genes to inform on the much-debated phylogenetic affinities of the Myxozoa and exemplify how this approach can help to resolve impasses created by highly divergent sequence evolution among the Metazoa.
Myxozoans are endoparasites that use aquatic invertebrates (worms and freshwater bryozoans) as definitive hosts and vertebrates (typically fish) as secondary hosts, causing a number of economically important diseases in the latter [7,8]. Myxozoans were long classified as protists because of their extreme morphological degeneracy, but 18S rDNA sequence data and demonstration of multicellularity eventually confirmed a metazoan affinity [9,10]. Subsequent phylogenetic analyses based on 18S rDNA grouped myxozoans variously as a sister taxon to the Bilateria or within the Cnidaria, reflecting the extreme sequence divergence of the Myxozoa and whether the aberrant Polypodium hydriforme (a cnidarian intracellular parasite of oocytes of acipenseriform fish) was included in analyses. A recent phylogenomic study based on analyses of 50 protein-coding genes provided evidence that myxozoans group within the cnidarians and demonstrated the contaminant nature of Hox genes suggestive of a bilaterian affinity. However, scepticism regarding myxozoan affinities remains owing to limitations of the phylogenomic study, including problems of missing data, bootstrap support of only 70 per cent, an inability to reject alternative placements under certain models, inclusion of only a small number of cnidarians and the absence of P. hydriforme genes in analyses [11,14,15]. In addition, longitudinal muscles in the worm-like myxozoan Buddenbrockia plumatellae occur as independent blocks of muscles typical of mesoderm, a body layer putatively lacking in cnidarians. Finally, recent phylogenetic analyses combining 18S rDNA and 28S rDNA, and much broader taxon-sampling, have grouped Myxozoa with the Bilateria, while further studies have demonstrated relatively stable positions within both the Bilateria and the Cnidaria, depending on model selection, taxon and data-sampling. The definitive placement of the Myxozoa is therefore regarded as not yet resolved.
All myxozoans possess intracellular organelles known as polar capsules that contain an eversible tubule used to attach to hosts (figure 1a). Nematocysts are the main diagnostic feature of the phylum Cnidaria and also consist of an intracellular capsule with an inverted tubule. Nematocysts are used for prey capture and defence. These similarities in structure and function of polar capsules and nematocysts led to early suggestions that myxozoans are cnidarians [18,19]. Recent research has provided evidence that cnidarian-specific minicollagen and NOWA (nematocyst outer wall antigen) genes encode key functional constituents of nematocyst walls. Specifically, an interlinking minicollagen–NOWA scaffold enables nematocyst walls to withstand the extremely high osmotic pressure required for nematocysts to act as explosive organelles [20,21].
A set of distinct domains in minicollagens is key to nematocyst wall assembly and functionality (see electronic supplementary material, figure S1). A central collagen triple helix domain (characterized by 12–16 Gly-X-Y repeats) provides flexibility to walls. Adjacent are polyproline stretches of variable length followed by terminal conserved cysteine-rich domains (CRDs). The latter stabilize the capsule wall by covalently crosslinking with similar CRDs in the NOWA protein and other minicollagens, forming a highly resistant network via disulphide bonds. The unique polarity of minicollagen CRDs results in differing structures despite an identical cysteine pattern [23,24]. Such variation in the CRD domain is postulated to underlie the evolution of nematocyst complexity within the Cnidaria via the acquisition of minicollagens as novel modular units [25,26].
Here we report the sequencing and characterization of a group-1-like minicollagen gene (Tb-Ncol-1) from a cDNA library of the myxozoan Tetracapsuloides bryosalmonae, the causative agent of salmonid proliferative kidney disease. The discovery of Tb-Ncol-1 provides strong support for the hypothesis that myxozoans are indeed cnidarians, suggests a direct molecular/morphological link between polar capsules and nematocysts, and demonstrates novel CRD variation relevant to the molecular evolution of minicollagen protein domain structure and folds. In addition, the discovery of Tb-Ncol-1 illustrates the power of employing taxonomically restricted genes in phylogenetic placement and represents the first example of using a gene associated with a phylum-specific morphological novelty to infer placement within the Metazoa.
2. Material and methods
(a) Gene discovery and characterization
A normalized full-length cDNA library was constructed from total RNA purified from T. bryosalmonae spore sacs (see electronic supplementary material, text file). Two hundred and eighty-eight sequenced cDNA clones were assembled and analysed with the AlignIR program (LI-COR). Sequence similarity searches were performed by BLAST [27,28] and FASTA. Nucleotide and amino acid sequences representing known minicollagens were obtained from the NCBI protein and EST databases or from the TGI EST database (http://compbio.dfci.harvard.edu/tgi/). Sequences representing Nematostella vectensis minicollagen homologues were obtained from the Nematostella genome assembly (http://genome.jgi-psf.org/Nemve1/Nemve1.home.html). Multiple sequence alignments were initially generated using CLUSTALW v. 1.82. Protein size predictions and amino acid composition were determined using ProtParam (http://us.expasy.org/tools/protparam.html) and potential N-glycosylation sites identified using the NetNGlyc program (http://www.cbs.dtu.dk/services/NetNGlyc/). Signal peptide prediction and protein domain architecture were assessed using the SignalP v. 3.0 program and the SMART (Simple Modular Architecture Research Tool) program [33,34], respectively.
(b) Obtaining full-length cDNA and verification of gene origin
Primers to Tb-Ncol-1 were designed in the 5′ and 3′ untranslated regions (UTRs) to confirm the sequence corresponding to the full-length open reading frame (ORF) (F1 and R1; electronic supplementary material, figure S2), and to elucidate the genomic organization of Tb-Ncol-1 using T. bryosalmonae-infected bryozoan cDNA and genomic DNA as PCR templates. The cDNA sequence (expected product size of 564 bp) was aligned to the genomic sequence (600 bp) and the intron–exon boundaries identified using the SIM4 program. Hydra minicollagen genomic sequences were obtained from the NCBI genomes database (http://www.ncbi.nlm.nih.gov/projects/mapview/map_search.cgi?taxid=6085). Nematostella minicollagen sequences were obtained from the Nematostella genome assembly.
To verify that the newly discovered minicollagen homologue was T. bryosalmonae in origin, PCR analysis was conducted with primers F2 and R2, using cDNA and genomic DNA from uninfected and T. bryosalmonae-infected bryozoans and rainbow trout kidney tissue.
(c) Phylogenetic analysis
Based on the domain structure, multiple sequence alignments were generated using the MEGA v. 4.1 program with gaps introduced to increase identity. Phylogenetic analyses were performed using neighbour-joining (NJ) and maximum-likelihood (ML) under the JTT + I + Γ model. Bayesian inference (BI) was calculated using aamodelpr = mixed, allowing the selection of the best substitution model as a parameter of the analysis. Neighbour-joining analyses were conducted within the MEGA program and were bootstrapped 10 000 times. Maximum-likelihood analyses were performed in PHYML and bootstrapped 500 times. Bayesian inference was performed in MrBayes v. 3.0b4, with posterior probabilities based on 200 000 generations on two independent runs of four MCMC chains, with every 500th tree saved and the last 75 per cent of trees used to create the consensus tree. Amino acid sequence identities and similarities were calculated using the MatGat v. 2.02 program.
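The resampling and tree-sampling settings described above can be made concrete with a short Python sketch. It is illustrative only and is not the code used in the study: the function names and the toy alignment are hypothetical, and the tree count ignores any starting tree that a program may also write to file.

```python
import random


def bootstrap_columns(alignment, rng):
    """Return one non-parametric bootstrap replicate of an alignment.

    Columns are drawn with replacement, so the replicate has the same
    dimensions as the input. `alignment` is a list of equal-length strings.
    """
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


def trees_sampled(generations, sample_every):
    """Trees saved by one MCMC run (starting tree not counted)."""
    return generations // sample_every


def trees_kept(n_sampled, kept_fraction):
    """Trees retained for the consensus after discarding the burn-in."""
    return round(n_sampled * kept_fraction)


# Settings reported in the text: 200 000 generations, every 500th tree
# saved, last 75 per cent kept, two independent runs.
per_run = trees_sampled(200_000, 500)
kept_per_run = trees_kept(per_run, 0.75)
kept_total = 2 * kept_per_run  # assumes trees from both runs are pooled

# Hypothetical three-sequence toy alignment, for illustration only.
toy = ["GPAGEK", "GPSGEK", "GPAGDK"]
replicate = bootstrap_columns(toy, random.Random(0))
```

Under these assumptions each run contributes 400 sampled trees, of which 300 are retained, giving 600 trees if the two runs are pooled.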
(d) Homology modelling
Three-dimensional models of the N- and C-terminal CRD of Tb-Ncol-1 were constructed using the homology modelling web server, ESyPred3D v. 1.0, using the experimentally determined CRDs of the Hydra molecule Hm-Ncol-1 obtained from the NCBI Molecular Model Database (PDB accession nos 1SOP and 1ZPX) as templates. Output PDB files were uploaded using the NCBI protein structure similarity search service, VAST (http://www.ncbi.nlm.nih.gov/Structure/VAST/vastsearch), and viewed using Cn3D v. 4.1 (http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml).
3. Results and discussion
(a) Tb-Ncol-1 is homologous to cnidarian group 1 minicollagens
A BLAST search demonstrated that the sequence of a cDNA clone from our full-length T. bryosalmonae cDNA library (electronic supplementary material, figure S2) was homologous to cnidarian minicollagen genes. The full-length cDNA transcript consisted of 620 bp (accession no. FN662483) that translated into a 170 amino acid ORF. The genomic sequence, determined from T. bryosalmonae-infected host genomic DNA, was 656 bp in length (accession no. FN662484). PCR analysis detected Tb-Ncol-1 in infected but not in uninfected host tissues (figure 1b), and the product was confirmed to be Tb-Ncol-1 by sequence analysis. The Tb-Ncol-1 mature peptide (142 amino acids) presents key features of minicollagens (figure 2), including a central collagen-like domain of repeated Gly-X-Y units flanked by two polyproline sequences of six (N-terminal) and seven (C-terminal) residues in length. Flanking the polyproline sequences at the N- and C-terminal ends are CRDs, each containing six cysteine residues, arranged as CxxCxxxCxxxCxxxCC and CxxxCxxxxxCxxxCxxxCC, respectively.
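The spacing between cysteines in the two CRD patterns reported above can be read off programmatically. The Python sketch below is a minimal illustration: the function names are hypothetical, and the reference pattern CxxxCxxxCxxxCxxxCC is an assumption taken from the general minicollagen literature rather than stated in this section.

```python
def cysteine_spacings(pattern):
    """Count the residues between consecutive cysteines in a pattern.

    Upper-case 'C' marks a cysteine; any other character is one residue.
    """
    positions = [i for i, ch in enumerate(pattern) if ch == "C"]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


# Patterns as reported in the text for Tb-Ncol-1.
N_CRD = "CxxCxxxCxxxCxxxCC"
C_CRD = "CxxxCxxxxxCxxxCxxxCC"

# Assumed canonical pattern (not given explicitly in this section).
CANONICAL = "CxxxCxxxCxxxCxxxCC"


def deviations(pattern, reference=CANONICAL):
    """Map each cysteine interval to (observed, reference) where they differ."""
    observed = cysteine_spacings(pattern)
    expected = cysteine_spacings(reference)
    return {
        f"C{i + 1}-C{i + 2}": (obs, ref)
        for i, (obs, ref) in enumerate(zip(observed, expected))
        if obs != ref
    }


n_spacings = cysteine_spacings(N_CRD)
c_spacings = cysteine_spacings(C_CRD)
n_deviations = deviations(N_CRD)
c_deviations = deviations(C_CRD)
```

With the assumed reference, the N-terminal CRD differs only in the C1 to C2 interval (two residues rather than three) and the C-terminal CRD only in the C2 to C3 interval (five residues rather than three).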
Multiple amino acid alignment of Tb-Ncol-1 with medusozoan and anthozoan group 1 minicollagens (figure 2) reveals several features that specifically relate Tb-Ncol-1 to group 1 minicollagens, as opposed to group 2 or 3 minicollagens. These include the 14 + 13 arrangement of the repeating Gly-X-Y units in the central collagen-like domain and the identical cysteine sequence patterns in the two CRDs. However, there are two unique differences in the Tb-Ncol-1 CRDs. There are only two amino acids between the first and second cysteine (C1 and C2) in the N-terminal CRD, and there are five amino acids between C2 and C3 in the C-terminal CRD (figure 2). The T. bryosalmonae minicollagen sequence also shows the highest overall homology to group 1 minicollagens, with amino acid identities and similarities ranging from 33.9 to 55.6 per cent and 44.1 to 64.4 per cent, respectively (figure 1c). Homology between Tb-Ncol-1 and group 2 and 3 minicollagens was lower (29.2–41.6% identity and 34.3–55.3% similarity; electronic supplementary material, table S1). Minicollagens from groups 2 and 3 are more complex than group 1 minicollagens owing to duplication and further diversification of the two CRDs. Unlike all other minicollagens, Tb-Ncol-1 has a potential N-glycosylation site (NXS/T), detected in the second Gly-X-Y domain.
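Two of the sequence features discussed above, runs of Gly-X-Y repeats and the NXS/T sequon, are simple enough to scan for with regular expressions. The Python sketch below is a hedged illustration and not a substitute for NetNGlyc or SMART: the function names and the toy peptide are hypothetical, and excluding proline at the X position of the sequon is a common convention assumed here rather than taken from the text.

```python
import re


def gly_xy_runs(sequence):
    """Return the number of repeats in each uninterrupted Gly-X-Y run."""
    return [len(m.group()) // 3 for m in re.finditer(r"(?:G[A-Z]{2})+", sequence)]


def sequon_positions(sequence):
    """Return 1-based positions of Asn in N-X-S/T motifs (X assumed not Pro).

    A lookahead is used so that overlapping motifs are all reported.
    """
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", sequence)]


# Hypothetical toy peptide: polyproline, three Gly-X-Y repeats, polyproline.
# It is NOT the Tb-Ncol-1 sequence.
toy_peptide = "PPPGPNGSPGEKPPP"

runs = gly_xy_runs(toy_peptide)
sites = sequon_positions(toy_peptide)
```

For the toy peptide this reports a single run of three Gly-X-Y repeats and one candidate sequon whose asparagine lies at position 6. A regular expression only flags candidate motifs; whether a site is actually glycosylated cannot be inferred from the pattern alone.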
(b) Tb-Ncol-1 is sister to medusozoan group 1 minicollagens
Phylogenetic analyses identified three clades of group 1 minicollagens with high percentage bootstrap confidence and high posterior probabilities (figure 1c). All medusozoan group 1 minicollagens possessing a single Gly-X-Y domain formed clade A. Tb-Ncol-1 possesses a double Gly-X-Y domain and grouped with known double Gly-X-Y-containing group 1 hydrozoan minicollagens (clade B). Anthozoan group 1 minicollagens formed clade C regardless of whether they had a single or double Gly-X-Y domain. Our phylogenetic analysis is limited by the paucity of available minicollagen sequences and is currently heavily biased towards hydrozoan sequences. Greater resolution of the relationship of Tb-Ncol-1 to other group 1 minicollagens requires increased representation of these genes from across the Cnidaria. Also, increased sampling within myxozoans may uncover further minicollagen variants and representatives of other minicollagen groups.
(c) Genomic organization
Comparison of the T. bryosalmonae minicollagen genomic sequence with the others currently available (three Nematostella and five Hydra minicollagen group 1 genomic sequences) revealed a similar pattern of genomic organization (figure 1d). All nine sequences possess a single intron ranging in size from 36 bp in Tb-Ncol-1 to 2959 bp in Nv-Ncol-5, with the intron–exon boundaries in the Tb-Ncol-1 sequence conforming to the known GT/AG donor/acceptor site rule. Exon I carries the signal peptide and exon II the mature minicollagen peptide. The pattern with which the intron interrupts the propeptide in Tb-Ncol-1 is most similar to that in the Hydra sequences, Hm-Ncol-5/6. Although our study describes the presence of a single intron in the ORF of Tb-Ncol-1, we cannot dismiss the possibility that additional introns may be present in the 5′ UTR of Tb-Ncol-1 as seen in the Hydra minicollagen, Hm-Ncol-6.
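The 36 bp intron size is consistent with the sequence lengths reported elsewhere in the paper, as the following Python sketch shows. It assumes that each cDNA and genomic sequence pair spans the same endpoints, so that the length difference equals the intron length; the function names are hypothetical and the intron string is an invented placeholder, not the Tb-Ncol-1 intron.

```python
def intron_length(genomic_bp, cdna_bp):
    """Total intron length implied by a genomic/cDNA length difference."""
    return genomic_bp - cdna_bp


def follows_gt_ag(intron):
    """True if an intron starts with GT (donor) and ends with AG (acceptor)."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Lengths reported in the text.
from_accessions = intron_length(656, 620)    # full-length genomic vs cDNA
from_pcr_products = intron_length(600, 564)  # F1/R1 genomic vs cDNA products

# Invented 36 bp placeholder used only to exercise the boundary check.
placeholder_intron = "GT" + "A" * 32 + "AG"
conforms = follows_gt_ag(placeholder_intron)
```

Both pairs of lengths give a difference of 36 bp, matching the single intron described for Tb-Ncol-1.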
(d) Predicted CRD three-dimensional structure
Minicollagens are unique among CRD-containing proteins in that dramatically different three-dimensional structures are produced from an identical cysteine sequence pattern [23,24,42]. It has been demonstrated experimentally via recombinant protein expression that single amino acid changes in CRD domains with the canonical cysteine sequence pattern can produce two three-dimensional structures. These results suggest that protein evolution may proceed via intermediate ‘bridge’ states that contain novel and ancestral structures in dynamic equilibrium. Minicollagen CRDs thus provide evidence that minor sequence variation can lead to global structural switches in proteins that retain a conserved structural framework, allowing adaptive walks and structural innovation through functional intermediates. As such, minicollagens contribute to an emerging view of protein dynamism and evolvability. Minor variation in minicollagen CRD sequences may therefore contribute to different nematocyst morphologies achieved by distinct combinations of minicollagens and NOWA.
The CRDs characterized for Tb-Ncol-1 are so far unique among group 1 minicollagens, differing from both each other and the respective CRDs in other group 1 minicollagens in the number of amino acid residues in certain regions of the canonical cysteine sequence pattern. Nevertheless, homology modelling predicted only minor differences between the three-dimensional CRD structures of Tb-Ncol-1 and Hm-Ncol-1 (figure 3). The amino acid deletion between C1 and C2 in the N-terminal CRD of Tb-Ncol-1 is associated with a tighter loop compared with the Hm-Ncol-1 structure. Similarly, the extra two amino acids between C2 and C3 in the C-terminal CRD of Tb-Ncol-1 are associated with a tight hairpin loop, while an open loop occurs in Hm-Ncol-1. The positions of the cysteines and side chains in the Tb-Ncol-1 CRDs are very similar to those in Hm-Ncol-1, so the same disulphide arrangement is likely to exist in the mature Tb-Ncol-1 molecule. Thus, it would be expected that intramolecular disulphide bonds would occur between C1 and C4, C2 and C6, and C3 and C5 in the N-terminal CRD, and between C1 and C5, C2 and C4, and C3 and C6 in the C-terminal CRD, as demonstrated experimentally for the Hm-Ncol-1 molecule [24,42]. Notably, the central proline between C3 and C4 in both CRDs that in Hydra directs formation of the correct disulphide linkages is conserved. Overall, these insights suggest that although certain single amino acid changes can result in major variation in three-dimensional structure, others will not.
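The two predicted disulphide arrangements can be written down as data and checked for internal consistency. The Python sketch below is a minimal illustration with hypothetical names: it confirms that each arrangement uses all six cysteines exactly once and tests whether the two CRDs share any bond. It says nothing about three-dimensional structure.

```python
# Predicted pairings as stated in the text (cysteines numbered C1 to C6).
N_CRD_BONDS = [(1, 4), (2, 6), (3, 5)]
C_CRD_BONDS = [(1, 5), (2, 4), (3, 6)]


def is_complete_pairing(bonds, n_cysteines=6):
    """True if every cysteine 1..n takes part in exactly one bond."""
    used = sorted(c for bond in bonds for c in bond)
    return used == list(range(1, n_cysteines + 1))


def shared_bonds(bonds_a, bonds_b):
    """Bonds present in both arrangements, ignoring the order within a pair."""
    return {tuple(sorted(b)) for b in bonds_a} & {tuple(sorted(b)) for b in bonds_b}


n_complete = is_complete_pairing(N_CRD_BONDS)
c_complete = is_complete_pairing(C_CRD_BONDS)
common = shared_bonds(N_CRD_BONDS, C_CRD_BONDS)
```

Both arrangements are complete pairings of six cysteines, and they have no bond in common, which matches the point that the two CRDs differ in connectivity despite near-identical cysteine patterns.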
The lack of evidence for substantial variation in three-dimensional structure in Tb-Ncol-1 relative to the Hydra minicollagen despite sequence variation is in keeping with protein structures generally being far more conserved than their sequences. Nevertheless, it is of interest to speculate on the relative diversity in three-dimensional CRD structures that might be expected. Within the Myxozoa, three-dimensional CRD structural variation may be limited relative to that in the free-living cnidarians, reflecting differences in life history. Myxozoan polar capsules function solely to attach small infectious spores to hosts and their tubules are unadorned. On the other hand, nematocysts in free-living cnidarians have evolved numerous functions, including attachment to live and relatively large prey, the delivery of toxins and defence, and their tubules can possess barbs and spines that aid in these processes. At present, our insights are limited by paucity of information on minicollagen sequences, but future studies may illustrate the molecular basis for protein structure–function relationships by focusing strategically on minicollagens across the Cnidaria.
Sequencing and characterization of Tb-Ncol-1 addresses the much-debated phylogenetic position of the myxozoans by providing strong evidence for placement within the Cnidaria, and a sister-taxon relationship to the Medusozoa. Alternative hypotheses are that: (i) nematocysts did not originate in cnidarians but in protozoans [26,46]; (ii) nematocysts have been lost in bilaterians apart from unique retention in the Myxozoa. The former hypothesis could be explained by lateral organelle or gene transfer from protists that possess extrusible organelles similar to nematocysts [47–49] or the presence of nematocysts in the first metazoans. However, the absence of nematocysts in sponges, ctenophores, placozoans, choanoflagellates and in bilaterians, as well as the absence of minicollagens in the genomes of representatives of relevant taxa (e.g. the placozoan Trichoplax adhaerens, the sponge Amphimedon queenslandica, the choanoflagellate Monosiga brevicollis and all bilaterian genomes sequenced to date), would have to be explained by the multiple independent loss of these characters. This renders scenarios of nematocysts in lineages other than cnidarians unlikely (apart from a protistan lineage if nematocysts originated by lateral organelle or gene transfer). Furthermore, the selective pressure for possessing nematocysts is illustrated by their retention throughout the Cnidaria and by nudibranchs that sequester nematocysts from hydroid prey. The loss of such effective organelles would be surprising. We also note that phylogenomic analyses of 50 protein-coding genes provide independent evidence against the second hypothesis.
Several lines of evidence suggest that Tb-Ncol-1 is related to group 1 minicollagens, including conservation of the central Gly-X-Y domain, close similarity of the two single CRD domains, relatively high amino acid identity and similarity values, and a similar genomic organization. In addition, three-dimensional modelling of the CRD domain predicts a structure similar to that of a Hydra CRD, with minor shape variations reflecting deviations from sequence patterns characterizing all known minicollagen CRDs. More generally, our study provides strong evidence that the radiation of cnidarians as endoparasites of freshwater, marine and terrestrial hosts represents a major event in the evolution of basal metazoans. Our study also reveals novel protein sequence variation that is likely to be relevant to understanding the evolution of nematocyst complexity and reflective of a molecular/morphological link between myxozoan polar capsules and cnidarian nematocysts. Finally, our conclusions represent the first example of inferring phylogenetic placement on the basis of genes that contribute to a phylum-specific morphological novelty within the Metazoa.
This work was supported by BBSRC grant BB/F003242/1. We thank Dr Jun Zou (SFIRC) for assistance with the homology modelling and Alan Curry (Manchester Royal Infirmary) for the image of the polar capsule. The manuscript has benefited from comments by Drs Jun Zou, Stuart Piertney (Aberdeen University) and Alex Gruhl (Natural History Museum).
- Received June 17, 2010.
- Accepted August 13, 2010.
- © 2010 The Royal Society