Biosilicification is widespread across the eukaryotes and requires concentration of silicon in intracellular vesicles. Knowledge of the molecular mechanisms underlying this process remains limited, with unrelated silicon-transporting proteins found in the eukaryotic clades previously studied. Here, we report the identification of silicon transporter (SIT)-type genes from the siliceous loricate choanoflagellates Stephanoeca diplocostata and Diaphanoeca grandis. Until now, the SIT gene family has been identified only in diatoms and other siliceous stramenopiles, which are distantly related to choanoflagellates among the eukaryotes. This is the first evidence of similarity between SITs from different eukaryotic supergroups. Phylogenetic analysis indicates that choanoflagellate and stramenopile SITs form distinct monophyletic groups. The absence of putative SIT genes in any other eukaryotic groups, including non-siliceous choanoflagellates, leads us to propose that SIT genes underwent a lateral gene transfer event between stramenopiles and loricate choanoflagellates. We suggest that the incorporation of a foreign SIT gene into the stramenopile or choanoflagellate genome resulted in a major metabolic change: the acquisition of biomineralized silica structures. This hypothesis implies that biosilicification has evolved multiple times independently in the eukaryotes, and paves the way for a better understanding of the biochemical basis of silicon transport through identification of conserved sequence motifs.
1. Introduction
Biosilicification is the genetically controlled incorporation of amorphous silicon dioxide into the physical macrostructure of an organism. In eukaryotes, biosilica formation occurs within a silicon deposition vesicle (SDV) to isolate the polymerizing silica from the general cell metabolism .
Silicon is present in the environment primarily as silicic acid (Si(OH)4). Silicic acid autopolymerizes to silica at concentrations above 2 mM, so silica formation within the SDV is more efficient close to this concentration . However, the low concentrations of silicic acid in modern oceans and freshwaters (10–180 µM, ) are insufficient to support silicon uptake by passive transport. Silicifying organisms, therefore, require an active transport mechanism to take up silicon from the external environment and to concentrate silicon within the SDV. Because silicic acid is a relatively inert species, with no known biochemical role outside that of biomineralization [4,5], silicon transport requires the evolution of transmembrane proteins with highly specialized interactions with silicon.
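A short calculation shows the scale of this concentration step. The Python sketch below computes the fold-concentration needed to go from the 10–180 µM external range up to the ~2 mM autopolymerization threshold. It also computes the minimum free energy for that step, using the standard relation ΔG = RT ln(C_in/C_out). The sketch assumes ideal dilute solutions and 25 °C. It also treats silicic acid as an uncharged solute, so membrane potential is ignored. These are simplifying assumptions, not measured values.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1

def concentration_factor(c_in_mM: float, c_out_uM: float) -> float:
    """Fold-concentration needed to raise an external c_out (µM) to an internal c_in (mM)."""
    return (c_in_mM * 1000.0) / c_out_uM

def uptake_free_energy_kJ(factor: float, temp_k: float = 298.15) -> float:
    """Minimum free energy (kJ/mol) to concentrate an uncharged solute by `factor`.

    Uses dG = RT ln(C_in / C_out); assumes ideal solutions and no membrane-potential term.
    """
    return R * temp_k * math.log(factor) / 1000.0

if __name__ == "__main__":
    for c_out in (10.0, 180.0):  # external silicic acid range quoted in the text, µM
        f = concentration_factor(2.0, c_out)  # ~2 mM autopolymerization threshold
        print(f"{c_out:>5.0f} µM -> x{f:.1f}, dG >= {uptake_free_energy_kJ(f):.1f} kJ/mol")
```

Any factor above 1 gives a positive ΔG. Passive transport alone therefore cannot build up the internal concentration, whichever end of the external range applies.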
Silicon-transporting proteins are known from only a few eukaryotic groups. Diatoms possess the silicon transporter (SIT) family, whose activity and expression are tightly coupled to biosilicification [6,7]. SIT-like genes have been found in other siliceous stramenopiles (Synura and Ochromonas ), but in no other groups. Land plants do not possess SIT-like genes. Instead, they possess NOD26-like intrinsic protein (NIP)-type transporters for silicic acid (Lsi1, Lsi6) [9,10] and active silicon efflux pumps (Lsi2) . No SITs are known from siliceous sponges, but a co-transporter has been postulated to have a role in silicon transport .
The current literature reports no obvious similarity or homology between the silicon-transport genes of different eukaryotic supergroups. Within each transporter type, only a few silicon transport-related protein motifs have been identified: the GXQ amino acid motifs in SITs  and the arginine/aromatic selectivity filter of silicon-related NIPs . This lack of homology has hindered research into silicon biochemistry and prevented the identification of other silicon-interacting protein features.
Choanoflagellates are a group of heterotrophic aquatic protists that are the closest unicellular relatives of the animals [15,16]. Within choanoflagellates, the Acanthoecidae are a monophyletic group  characterized by an extracellular lorica constructed from a series of costal strips . Each costal strip is a hollow tube of silica, formed individually within an SDV before being exocytosed from the cell . Costal strip size and shape vary within individual loricae, and the overall lorica morphology varies between species .
Silicon-deprivation studies on loricate species revealed that costal strip formation and lorica assembly are affected incrementally in relation to silicon availability. Under silicon-depleted conditions, progressively thinner costal strips are produced, and under zero-silicon conditions no costal strips or SDVs are formed [21,22]. Upon silicon replenishment, costal strip production recommences, with silicon levels in the culture medium falling as new loricae are formed . Electron microscopy reveals no evidence for vesicle-based pinocytotic uptake of silicic acid during lorica formation . Collectively, this points towards choanoflagellates possessing a transporter-based system for the active uptake of silicon into the SDV for costal strip formation.
Here, we report the identification of genes from two loricate choanoflagellates, Stephanoeca diplocostata and Diaphanoeca grandis (figure 1) with significant sequence similarity to the SIT genes of diatoms. This is the first identification of homologous silicon-related genes between siliceous species from different eukaryotic supergroups. The limited taxonomic distribution of SIT homologues suggests that they originated by lateral gene transfer (LGT), transferring an important biochemical innovation between disparate eukaryotic groups.
2. Material and methods
(a) Identification and cloning of silicon transporter sequences
RNA extracted from cultures of S. diplocostata was used to produce a cDNA library, which was sequenced using 454 pyrosequencing. The resulting transcriptome reads were assembled into contigs for bioinformatic analysis (see the electronic supplementary material for detailed description). Those contigs containing SIT domains were selected for further tBLASTx analysis  against the EMBL/Genbank non-redundant nucleotide databases (see the electronic supplementary material). The relevant open reading frames were used as queries in PSI-BLAST  searches against the EMBL/Genbank non-redundant protein database. ClustalX was used to generate a protein alignment of all the translated contigs containing the SIT domain.
The longest S. diplocostata SIT-like contig was used as query sequence for a tBLASTx search (significance threshold value = 1×10−10) against a partial genome sequence dataset from D. grandis (see the electronic supplementary material). A protein alignment of those contigs providing a significant hit was generated using ClustalX, and the longest contig was analysed using InterProScan.
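Selecting hits at a fixed significance threshold, as in the tBLASTx search above, amounts to filtering tabular BLAST output on its e-value column. The Python sketch below assumes the default 12-column tabular format written by BLAST+ with `-outfmt 6`. The contig names and scores in the example are invented for illustration only.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def significant_hits(tabular_text: str, threshold: float = 1e-10) -> list:
    """Return hits with e-value <= threshold, best (lowest e-value) first."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        record = dict(zip(FIELDS, row))
        record["evalue"] = float(record["evalue"])
        if record["evalue"] <= threshold:
            hits.append(record)
    return sorted(hits, key=lambda r: r["evalue"])

# Hypothetical output of an SdSITa query against partial-genome contigs.
EXAMPLE = (
    "SdSITa\tcontig_1\t45.2\t193\t100\t3\t1\t579\t10\t588\t2e-40\t150\n"
    "SdSITa\tcontig_2\t38.0\t60\t35\t1\t100\t280\t5\t185\t3e-08\t50\n"
    "SdSITa\tcontig_3\t40.1\t80\t45\t2\t300\t540\t1\t240\t5e-12\t60\n"
)

print([h["sseqid"] for h in significant_hits(EXAMPLE)])
```

Changing `threshold` changes which fragments are retained. The value used should therefore be reported together with the number of hits.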
Oligo(dT)-primed cDNA was made from S. diplocostata RNA using Superscript III First-Strand Synthesis reverse transcriptase (Invitrogen), and 2 µl (approx. 1 µg) of cDNA was used as RT-PCR template. A total of 200 ng of genomic DNA, extracted from D. grandis cultures using a cetyltrimethylammonium bromide (CTAB) buffer-based method , was used as PCR template together with the DgSITa primers. The RT-PCR and PCR protocols were as follows: hot-start denaturing step of 94°C for 4 min; 35 cycles of 94°C (30 s), 50°C (30 s), 72°C (105 s); final elongation step of 72°C for 5 min (see the electronic supplementary material for primer sequences and combinations). Amplified sequences were cloned into the pGEM-T Easy Vector System (Promega) and transformed into Subcloning Efficiency DH5α Competent Cells (Invitrogen). Plasmids were extracted using a QIAprep Spin Miniprep Kit (Qiagen) and sequenced by SourceBioScience (Cambridge, UK).
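Writing the cycling programme as data makes it easy to check and reuse. The Python sketch below encodes the steps listed above and totals the programmed hold times. It ignores instrument ramp times and any final cooling hold, since neither is stated in the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int

# RT-PCR/PCR programme as described in the methods.
INITIAL = [Step("hot-start denaturation", 94, 4 * 60)]
CYCLE = [Step("denature", 94, 30), Step("anneal", 50, 30), Step("extend", 72, 105)]
N_CYCLES = 35
FINAL = [Step("final elongation", 72, 5 * 60)]

def total_hold_seconds() -> int:
    """Sum of programmed hold times; ramping and any final cooling hold are ignored."""
    per_cycle = sum(s.seconds for s in CYCLE)
    return (sum(s.seconds for s in INITIAL)
            + N_CYCLES * per_cycle
            + sum(s.seconds for s in FINAL))

print(total_hold_seconds(), "s =", total_hold_seconds() / 60, "min")
```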
(b) Bioinformatic analyses and alignments
The tBLASTx, PSI-BLAST and InterProScan analyses were repeated for the consensus sequences of all cloned choanoflagellate SITs. The DgSITa protein sequence was identified by tBLASTx comparison to the SdSIT genes. The 5′ portion of the amplified sequence contained stop codons and a putative splice acceptor site, suggesting that it represents intronic sequence. The DgSITa sequence was trimmed to remove this putative intron. The remaining 554 bp exon, encoding a 185 amino acid protein sequence, was used for all further analyses.
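The intron-trimming step can be sketched as two checks. First, look for stop codons in the reading frame of the conserved exon. Second, cut at a canonical AG acceptor. This Python sketch assumes the exon's frame is already known from the protein alignment and applies only the minimal GT–AG rule. It is not a splice-site predictor, and the toy sequence is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def in_frame_stops(seq: str, frame: int = 0) -> list:
    """0-based start positions of stop codons in the given reading frame."""
    seq = seq.upper()
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]

def trim_5prime_intron(seq: str, exon_start: int) -> str:
    """Drop sequence upstream of a putative splice acceptor.

    `exon_start` is the index of the first exon base; the two preceding bases
    must be the canonical intron-terminal 'AG'.
    """
    if exon_start < 2 or seq[exon_start - 2:exon_start].upper() != "AG":
        raise ValueError("no canonical AG acceptor immediately upstream of exon_start")
    return seq[exon_start:]

# Hypothetical toy contig: a partial 5' intron (ending in AG) followed by exon sequence.
contig = "TAAGTTGACCTTCAG" + "GCTGCTGGTCAA"
print(in_frame_stops(contig))          # stop codon(s) in the intronic region
exon = trim_5prime_intron(contig, 15)
print(exon, in_frame_stops(exon))      # trimmed exon has an open frame
```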
WoLF PSORT analysis  was used to predict the subcellular location of the SdSIT proteins. The COILS prediction server (http://www.ch.embnet.org/software/COILS_form.html) was used to search for putative coiled-coil motifs. Topology predictions were carried out for all SdSITs and DgSITa to identify transmembrane domains (TMDs) using TMPred , HMMTop  and TMHMM .
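The three topology programs named above each use their own models, and those models are not reproduced here. Instead, this is a much simpler, swapped-in illustration of how candidate TMDs can be located from sequence alone. The Python sketch below applies a Kyte–Doolittle hydropathy sliding window (19 residues, threshold 1.6, values commonly used with this scale) and reports merged regions that exceed the threshold. Its output will not match TMPred, HMMTop or TMHMM; it is shown only to make the idea concrete.

```python
# Kyte–Doolittle hydropathy values (non-standard residue letters are not handled).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq: str, window: int = 19) -> list:
    """Mean hydropathy of each full window; entry i covers residues i..i+window-1."""
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]

def candidate_tmds(seq: str, window: int = 19, threshold: float = 1.6) -> list:
    """Merge above-threshold windows into 0-based, half-open (start, end) segments."""
    profile = hydropathy_profile(seq, window)
    segments, start = [], None
    for i, h in enumerate(profile):
        if h >= threshold and start is None:
            start = i
        elif h < threshold and start is not None:
            segments.append((start, i - 1 + window))  # last passing window was i - 1
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1 + window))
    return segments

toy = "K" * 10 + "L" * 20 + "K" * 10  # hypothetical sequence with one hydrophobic stretch
print(candidate_tmds(toy))
```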
Pairwise identities between the choanoflagellate SITs were calculated with ClustalW2 (www.ebi.ac.uk/Tools/msa/clustalw2/). A full alignment of the four SIT sequences was generated with ClustalX v. 2.0.9, using the default settings (gap opening = 10, gap extension = 0.2, delay divergent sequences = 30%) and the Gonnet Series matrix.
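Under one common definition, pairwise identity is the percentage of identical residues among alignment columns where neither sequence has a gap. The Python sketch below implements that definition. ClustalW2's reported scores may treat gaps differently, so this illustrates the idea rather than reimplementing ClustalW2.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Identical residues as a percentage of columns where neither sequence is gapped."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        identical += x == y
    return 100.0 * identical / compared if compared else 0.0

print(percent_identity("MKT-LV", "MKSALV"))  # hypothetical aligned fragments
```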
A second protein alignment incorporating non-choanoflagellate SITs was generated with ClustalX under the same settings. The 156 significant PSI-BLAST hits to SdSITa were aligned against the choanoflagellate SIT sequences (see the electronic supplementary material). Choanoflagellate and stramenopile SITs were compared to identify conserved motifs, charged residues and hydroxylated residues (following ). Features were noted if they were conserved across all SITs. A subset of this alignment was generated in order to display the results of the larger comparison and to include the predicted TMDs of SdSITs and Phaeodactylum tricornutum SITs .
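The conservation screen described above can be sketched as a column-by-column pass over the alignment. It keeps gap-free columns in which every sequence shares a residue class and reports each hit in the numbering of a reference sequence, in the same way as the SdSITa positions quoted in the Results. The residue classes here are assumptions for illustration: hydroxylated (S, T, Y), negatively charged (D, E) and positively charged (K, R, with histidine excluded). The toy alignment is also hypothetical.

```python
from typing import List, Optional, Tuple

HYDROXYL, NEGATIVE, POSITIVE = set("STY"), set("DE"), set("KR")  # H deliberately excluded

def residue_class(aa: str) -> Optional[str]:
    if aa in HYDROXYL:
        return "hydroxyl"
    if aa in NEGATIVE:
        return "negative"
    if aa in POSITIVE:
        return "positive"
    return None

def conserved_features(alignment: List[str], ref_index: int = 0) -> List[Tuple[int, str, str]]:
    """(reference residue number, reference residue, class) for class-conserved, gap-free columns."""
    ref = alignment[ref_index]
    if any(len(s) != len(ref) for s in alignment):
        raise ValueError("aligned sequences must have equal length")
    features, res_num = [], 0
    for col, ref_aa in enumerate(ref):
        if ref_aa != "-":
            res_num += 1  # 1-based position in the ungapped reference sequence
        column = [s[col] for s in alignment]
        if "-" in column:
            continue
        classes = {residue_class(aa) for aa in column}
        if len(classes) == 1 and None not in classes:
            features.append((res_num, ref_aa, classes.pop()))
    return features

toy_alignment = ["MTEKS-G", "MSDRT-G", "MTERSAG"]  # hypothetical, first row is the reference
print(conserved_features(toy_alignment))
```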
(c) Phylogenetic analysis
Sequences were chosen from the EMBL/Genbank non-redundant protein database to cover a taxonomic range of SITs (see the electronic supplementary material) and aligned using ClustalX v. 2.0.9 (default settings and Gonnet Series matrix). The alignment was inspected and sequences manually trimmed to minimize missing data. The final alignment contained 20 sequences from 10 different species, a total of 769 positions.
ProtTest  found that the LG + Γ model provided the best fit to the data under the Akaike Information Criterion. Maximum-likelihood (ML) analysis was carried out using PhyML v. 3.0  with the LG + Γ model, with the proportion of invariant sites estimated from the data and the equilibrium amino acid frequencies fixed according to the LG model. Starting trees were generated by BioNJ, and tree searching used combined SPR + NNI heuristics. Topology and branch lengths were optimized in ML calculations. A second ML analysis was performed using RAxML v. 7.2.8  with parameters identical to the PhyML analysis, except that the WAG substitution matrix was used. For each of the PhyML and RAxML analyses, one hundred bootstrapped datasets were analysed using the same model and method, and bootstrap proportions were added to the nodes of the ML tree. Bayesian MCMC analysis was performed using MrBayes  for 2 million generations with four chains, sampling every 5000 generations and discarding a burn-in of 100 000 generations. This analysis used the WAG substitution model with four γ-distributed rate categories, and the α-value and proportion of invariant sites were estimated. Bayesian posterior probabilities were added to the highest-likelihood tree from the Bayesian analysis. Trees were viewed with FigTree v. 1.3.1 (A. Rambaut, Institute of Evolutionary Biology, University of Edinburgh 2006–2009).
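Two pieces of arithmetic underlie the settings above: model choice by the Akaike Information Criterion (AIC = 2k − 2 ln L, lower is better) and the number of MCMC samples left after burn-in. The Python sketch below shows both. The log-likelihoods and parameter counts are hypothetical, not ProtTest output. Every candidate is fitted on the same tree, so branch-length parameters add the same constant to each AIC and are omitted here. The retained-sample count is approximate, because MrBayes also writes a sample at generation 0.

```python
from typing import Dict, Tuple

def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def best_model(fits: Dict[str, Tuple[float, int]]) -> str:
    """Name of the model with the lowest AIC."""
    return min(fits, key=lambda m: aic(*fits[m]))

def retained_mcmc_samples(generations: int, sample_freq: int, burnin_generations: int) -> int:
    """Samples kept after burn-in, ignoring the extra sample written at generation 0."""
    return (generations - burnin_generations) // sample_freq

# Hypothetical (ln L, extra free parameters) pairs, for illustration only.
FITS = {"LG+G": (-10000.0, 1), "LG+I+G": (-9999.5, 2), "WAG+G": (-10050.0, 1)}
print(best_model(FITS))
print(retained_mcmc_samples(2_000_000, 5000, 100_000))
```

In this toy case, adding a parameter improves ln L by only 0.5 units. That is less than the 1-unit penalty per extra parameter, so the simpler model is preferred.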
3. Results
(a) Identification of silicon transporters in the Stephanoeca diplocostata transcriptome
Transcriptome sequencing of S. diplocostata produced 0.261 Gb of data assembled into 26 325 contigs. InterProScan analysis found six transcriptome contigs containing putative SIT protein domains. The most significant tBLASTx and PSI-BLAST search hits were to known diatom SIT sequences (see the electronic supplementary material). Searches excluding diatoms found significant similarity (e-value < 1 × 10−5) to SIT sequences from the siliceous stramenopiles Synura petersenii and Ochromonas ovalis. No significant similarity was found to any sequences from the non-loricate choanoflagellates Monosiga brevicollis, Salpingoeca rosetta or Monosiga ovata.
These six contigs appear to represent three full-length genes, with a shorter contig (08888) that may represent minor allelic variation, errors in sequencing that led to a distinct assembly or a partial or inactive gene. The three full-length genes were designated S. diplocostata silicon transporter (SdSIT) a, b and c.
The validity of the transcriptome assemblies for all three genes was confirmed by RT-PCR from S. diplocostata cDNA. SdSITa and SdSITb yielded products with 99 per cent nucleotide identity to the assembled contigs. For SdSITc, the gene-specific 5′ region was successfully amplified, and comparison of the contigs showed that its 3′ region is identical to that of SdSITb. Repeating the tBLASTx, PSI-BLAST and InterProScan analyses with the final sequence of each SdSIT gene gave the same results as for the original contigs (see the electronic supplementary material).
(b) Identification of an incomplete silicon transporter in the Diaphanoeca grandis genome dataset
A tBLASTx search querying SdSITa against the partial D. grandis genome dataset found 25 hits at a significance threshold of 1 × 10−5 (data not shown). The longest contig is 931 bp and has potential to encode a protein with similarity to the S. diplocostata SITs over 193 amino acids. The remaining contigs, mostly shorter than 250 bp, encode protein fragments similar to short stretches within this region of the protein, but the fragmentary genome data precludes determination of how many genes they represent. We conclude that D. grandis also has a family of SIT-related genes.
The 5′ end of the 931 bp DgSITa contig has low sequence similarity to other SIT sequences and contains in-frame stop codons. There is a putative splice acceptor site just upstream of the conserved protein-coding sequence, suggesting that this 5′ sequence represents an intron. Spliceosomal introns are a feature of eukaryotic rather than bacterial genes. This therefore provides strong evidence against a bacterial origin for the SIT-domain-containing sequences in the D. grandis partial genome data. The conserved protein sequence runs to the 3′ end of the available contig.
PCR amplification of DgSITa from genomic DNA produced a sequence identical to the original contig, except that a 71 bp long tandem repeat in the putative intron was absent in the PCR-derived sequence. Attempts to extend the available DgSITa sequence using RT-PCR or 5′-RACE were unsuccessful. The results of the BLAST searches and InterProScan analysis using DgSITa are given in the electronic supplementary material.
(c) Bioinformatic analysis of Stephanoeca diplocostata silicon transporter
The deduced SdSIT proteins are highly similar, and the three genes share 89–95% nucleotide identity. WoLF PSORT was used to predict subcellular localization: all three databases gave a majority prediction that each SdSIT protein is targeted to the plasma membrane. Topology prediction programmes identified nine, 10 or 11 TMDs (see the electronic supplementary material). Diatom SITs have consistently been predicted to have 10 TMDs (orange line, figure 2), so we used the 10-TMD prediction from TMPred for the SdSITs (black line, figure 2). The TMDs predicted for choanoflagellate and diatom SITs overlap to some degree; however, the location of TMD 4 differs considerably. The COILS software found no evidence for putative coiled-coil motifs in the SdSIT sequences, unlike pennate diatom SITs, which contain prominent C-terminal coiled-coil motifs .
(d) Identification of conserved motifs in silicon transporter proteins
Alignment of choanoflagellate, diatom and other stramenopile SITs revealed five motifs common to all SITs (figure 2). The two pairs of GXQ-containing motifs have been noted previously. The first pair lies in TMDs 2 and 3 and the second in TMDs 7 and 8, symmetrically orientated with respect to membrane polarity . In our alignment, the C-terminal motif of each pair contained GRQ, apart from one CRQ motif (Shionodiscus ritscheri). The N-terminal motif of each pair was resolved as EGXQ, with X being cysteine, glycine or glutamine in the first pair and methionine, leucine, isoleucine or threonine in the second. Variants of the EGXQ motif were again found only in the second pair: EAMQ (in Navicula pelliculosa SIT4) and KGMQ (in Stephanodiscus yellowstonensis). In choanoflagellate SITs, all motifs were EGCQ–GRQ (first motif pair) and EGLQ–GRQ (second motif pair).
We also identified an additional conserved motif, (G/S)QL, between the two EGXQ–GRQ motif pairs. GQL is by far the more common variant and is present in all choanoflagellate SIT sequences. The CMLD motif  was not found in any choanoflagellate SIT.
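The motif definitions above translate directly into regular expressions, with X standing for any residue. The Python sketch below reports 1-based start positions of EGXQ, GRQ and (G/S)QL in a single sequence. It uses a zero-width lookahead so that overlapping matches are not missed. The toy sequence is hypothetical, and a real screen would also need to check that each hit falls in the expected alignment region.

```python
import re
from typing import Dict, List

# Motifs from the text, written as regular expressions ("." stands for X).
MOTIFS = {"EGXQ": "EG.Q", "GRQ": "GRQ", "(G/S)QL": "[GS]QL"}

def find_motifs(seq: str) -> Dict[str, List[int]]:
    """1-based start positions of each motif; the lookahead allows overlapping hits."""
    seq = seq.upper()
    return {
        name: [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]
        for name, pattern in MOTIFS.items()
    }

toy = "MAEGCQLLLGRQAAGQLEGLQVVGRQ"  # hypothetical sequence, not a real SIT
print(find_motifs(toy))
```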
In addition to these motifs, three hydroxyl-containing residues (T115, T290 and S382 in SdSITa), three negatively charged residues (E133, E225 and D476 in SdSITa) and one positively charged residue (R134 in SdSITa) were identified as being conserved across all SITs. These residue classes have been proposed to function in silicic acid transport  (see §4).
The first GRQ motif (TMD 3), the second EGXQ motif (TMD 7) and the conserved residues are found in similar positions with respect to membrane orientation in both choanoflagellates and stramenopiles. The first EGXQ motif is intracellular in diatoms but lies within TMD 2 in choanoflagellates. Conversely, the second GRQ motif is extracellular in choanoflagellates but is predicted to lie within TMD 8 in diatoms. In choanoflagellates, the (G/S)QL motif lies within TMD 4, whereas it is intracellular in diatom SIT structures. This reflects the major difference in the predicted location of TMD 4 between diatoms and choanoflagellates (figure 2).
(e) Evolution of silicon transporter genes
An ML tree produced from analysis of choanoflagellate, diatom and non-diatom stramenopile SIT protein sequences is shown in figure 3. This tree was obtained using the LG + Γ model implemented in PhyML; RAxML and Bayesian analyses (see the electronic supplementary material) using the WAG substitution matrix produced identical topologies. The trees are unrooted because no SITs or SIT homologues are known from outside the choanoflagellates and stramenopiles that could serve as an outgroup.
Choanoflagellate SITs (in red) form a distinct monophyletic clade (100% bootstrap support, posterior probability = 1). Within this, the S. diplocostata paralogues resolve as a clade, though with low bootstrap support. An exceptionally long branch marks the division between the choanoflagellate and stramenopile sequences.
The SITs from non-diatom stramenopiles (shown in brown) branch from within the diatom SITs with significant (94–100%) bootstrap support and posterior probability (=1). These synurophyte and chrysophyte SITs are sister to a subset of the pennate diatom SITs (blue). Centric diatom SITs branch together as a monophyletic clade (100% bootstrap support, posterior probability = 1) with pennate diatom SITs resolving as paraphyletic.
4. Discussion
The loricate choanoflagellates S. diplocostata and D. grandis possess a family of genes encoding the SIT protein domain, a domain previously found only in diatoms, synurophytes and chrysophytes. This is the first report of homology between putative SITs in two different eukaryotic supergroups, opisthokonts and stramenopiles . These SIT domains are not present in any of the sequenced non-siliceous choanoflagellate species [15,17,38,39], implying a correlation between their presence and lorica biomineralization.
(a) Evidence against a contaminant origin
The presence of distinct but closely related SIT genes in two different loricate choanoflagellate species is strong evidence against an origin from stramenopile contamination. This is further supported by phylogenetic analysis, which resolved choanoflagellate SITs as a strongly supported clade separate from the stramenopile SITs (figure 3). Furthermore, no housekeeping genes (rRNA, tubulins, etc.) were found among the transcriptome contigs with the highest tBLASTx similarity to stramenopile sequences, and no non-choanoflagellate eukaryotes were observed in the cultures.
(b) Silicon transporters in choanoflagellates
The SIT gene family is best characterized from S. diplocostata, but the evidence indicates that an SIT gene family also exists in D. grandis. SdSITa, b and c show very high levels of sequence identity, up to 95 per cent (nucleotide) and 94 per cent (amino acid). This is similar to identity levels in diatom SITs [13,40]. Such a high degree of conservation implies either strong functional constraints  or that these genes are the product of a relatively recent gene duplication event.
Different SdSIT paralogues may be specialized to different subcellular locations (e.g. plasma membrane or SDV). This is supported by variation at the N-termini, where eukaryotic localization signal sequences often occur . Alternatively, each SdSIT may have different transport activity, as suggested for diatom SIT3 [7,40]. Unlike diatoms , S. diplocostata continually produces biosilica under normal conditions . This would require a steady, constant uptake of silicon, which may be reflected in the similar read numbers observed for SdSITa, b and c.
(c) Silicon transporter evolution
Phylogenetic reconstruction (figure 3) found that choanoflagellate and stramenopile SIT sequences are evolutionarily distinct, with a long and well-supported branch between the two groups. SdSITs appear to be monophyletic, indicating gene duplication within the loricate choanoflagellates.
The synurophyte and chrysophyte SIT sequences group together as a well-supported monophyletic clade, as do the centric diatom SITs. Both these clades branch from within the paraphyletic pennate diatom SITs. This branching order is incongruent with previous hypotheses of stramenopile evolution based on siliceous fossils , but this may be an artefact of taxon sampling that could be resolved by obtaining more SIT sequences from non-diatom stramenopiles. The phylogenetic analyses are in agreement with some aspects of previous analyses [13,31], such as the close relationship of C. fusiformis SIT genes and the highly divergent nature of TpSIT3 and PtSIT3.
The phylogenetic analysis clearly shows that the numerical classification system for diatom SITs is incongruent with patterns of orthology between species. Choanoflagellate SIT genes were therefore named alphabetically, to avoid implying homology with individual members of the diatom SIT gene families.
(d) Proposed structure and function of choanoflagellate silicon transporter proteins
The similar TMD topology predictions and conserved residues between choanoflagellate and stramenopile proteins (figure 4) can be used to support a common mechanism for SIT transmembrane silicon transport in both these groups, similar to that proposed by Thamatrakoln et al. .
SITs possess an inverted symmetrical structure of 5 + 5 TMDs (figure 4). This is characteristic of the LeuT-fold-type sodium symporters, which all share a common inverted 5 + 5 TMD structural pseudosymmetry [45,46]. The degree of symmetry visible in the amino acid sequence of SITs is notable. In other LeuT-fold transporters, the inverted pseudosymmetry is discernible only from the three-dimensional structure and is not visible in the primary protein sequence . This strongly supports the theory that SITs evolved by homodimerization of an ancestral transmembrane transporter through gene duplication followed by gene fusion [13,47]. The inverted symmetrical structure would allow bidirectional silicon transport, as predicted for both diatom and choanoflagellate SITs. Silicic acid efflux is thought to have played a major role in the high-silicon pre-Cenozoic oceans . There, it would have prevented over-accumulation of silicic acid and subsequent uncontrolled autopolymerization of silica in the cytoplasm [13,48].
Owing to the structural similarities, we applied the alternating access transporter mechanism developed for the LeuT-fold-type transporters (reviewed in [46,49]) to SITs. We suggest that the highly conserved TMDs 2, 3, 7 and 8 form the central helices (equivalent to TMDs 1, 2, 6 and 7 of LeuT), with the remaining TMDs forming the surrounding scaffold helices. The EGXQ and/or GRQ motifs are likely candidates for the silicic acid-binding sites, as the charged residues could provide a localized polar environment characteristic of the unwound helix region found at binding sites . The glutamine side-chains may interact with the partial negative charges of the hydroxyl groups of the silicic acid molecule . The remaining EGXQ and/or GRQ motifs would lie at the points of substrate entry and exit, where the charged residues could help to orientate substrates into and out of the aqueous vestibule.
Sodium and silicon are transported in a 1 : 1 ratio in diatoms . In addition, calculated Hill coefficients of SITs [48,52] indicate the presence of a second binding site. In LeuT-fold transporters, Na+ binding is required both for correct substrate binding and to drive the conformational change of the core helices during transport [46,49], and this is likely also to be the case for SITs. The conserved hydroxylated residue (S382 in SdSITa) in the proposed central bundle region (figure 4) may act as a binding site for Na+, similar to the Na2 sodium-binding site of the LeuT and vSGLT transporters .
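The inference from Hill coefficients can be illustrated numerically. A Hill coefficient n greater than 1 indicates cooperative binding and so implies at least two interacting binding sites. Note that n is a lower bound on the number of sites, not a count of them. The Python sketch below estimates n from uptake rates by linear regression on the standard Hill linearization, log(v/(Vmax − v)) = n log S − n log K, assuming Vmax is known. The data are synthetic and the parameter values hypothetical; they are not SIT measurements.

```python
import math
from typing import List

def hill_rate(s: float, vmax: float, k: float, n: float) -> float:
    """Hill equation: v = Vmax * S^n / (K^n + S^n)."""
    return vmax * s ** n / (k ** n + s ** n)

def hill_coefficient(concs: List[float], rates: List[float], vmax: float) -> float:
    """Estimate n as the least-squares slope of log(v / (Vmax - v)) against log(S)."""
    xs = [math.log(s) for s in concs]
    ys = [math.log(v / (vmax - v)) for v in rates]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Synthetic, noise-free data generated with n = 2 (hypothetical parameter values).
S = [2.0, 5.0, 10.0, 20.0, 50.0]  # µM
V = [hill_rate(s, vmax=1.0, k=10.0, n=2.0) for s in S]
print(round(hill_coefficient(S, V, vmax=1.0), 3))
```

With real, noisy uptake data the linearization weights points unevenly. A direct nonlinear fit of the Hill equation would usually be preferred there.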
Marine diatom SITs require Na+, but in freshwater diatoms both Na+ and K+ are involved in silicon uptake, a feature related to salinity differences between the two environments . Loricate choanoflagellates are primarily marine , with the important exception of Acanthocorbis mongolica . In comparison, non-loricate choanoflagellates are found in a range of salinities . It may be that most loricate choanoflagellate SITs cannot make use of K+ in the same way as freshwater diatom SITs, preventing biosilicification and limiting colonization of freshwaters. The SITs of A. mongolica will be of particular interest with regard to the role of Na+ in SIT activity.
At the cytoplasmic opening to the proposed binding vestibule (between TMD 2 and 3, figure 4), the intracellular loop contains conserved charged and hydroxylated amino acids. These residues may interact with the theorized cytoplasmic silicic acid-binding component that forms an organic complex for intracellular silicon transfer to the SDV . The (G/S)QL motif and other conserved features lie outside the proposed central bundle helices. This hints that they may play a structural role, rather than having direct involvement in substrate binding.
Comparison of diatom and choanoflagellate SITs provides evidence against other proposed models. Choanoflagellate SITs lack the conserved extracellular cysteines previously suggested to play a role in diatom SIT activity . The CMLD motif and the possible YXXL-binding site are also absent from all choanoflagellate SITs. Together, these absences strongly argue against a critical role for zinc binding in SIT transport .
(e) Lateral gene transfer and the origin of silicon transporters in loricate choanoflagellates
The genomes of loricate choanoflagellates and some stramenopiles contain SIT genes encoding highly similar proteins. No genes with recognizable homology to SITs occur in eukaryotes other than loricate choanoflagellates and siliceous stramenopiles. This makes it extremely unlikely that such SIT genes were present in the last common ancestor of choanoflagellates and stramenopiles. That organism would be equivalent to the last common ancestor of all eukaryotic supergroups , and this hypothesis would therefore require the loss of all SIT homologues from all other sequenced eukaryotes.
The convergent evolution of SITs in loricate choanoflagellates and siliceous stramenopiles would require not only parallel evolution of multiple SIT-specific features, but also the loss of all related or ancestral genes in all other opisthokonts and non-siliceous stramenopiles. The taxonomic coverage of available sequence data [55,56] makes such a scenario highly unlikely.
We therefore believe that on the basis of current evidence, the best model for the evolution of SITs is through LGT. The most parsimonious hypothesis involves a single LGT event between a choanoflagellate and a stramenopile, in one direction or the other.
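A toy count can make the parsimony argument of this section explicit. Suppose SITs were present in the common ancestor and could only be lost thereafter. Then each maximal clade lacking SITs needs its own loss, whereas the LGT scenario needs one origin plus one transfer. The Python sketch below counts the minimum losses on a deliberately simplified, hypothetical topology. It is not a real eukaryotic phylogeny, and given the many SIT-lacking lineages actually sequenced, the real loss count would be far larger.

```python
from typing import List, Set, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of subtrees

def leaves(tree: Tree) -> List[str]:
    """All leaf names at or below this node."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]

def min_losses(tree: Tree, present: Set[str]) -> int:
    """Fewest losses if the gene is present at the root and can only be lost, never regained."""
    if not any(leaf in present for leaf in leaves(tree)):
        return 1  # a single loss on the branch leading to this gene-free clade
    if isinstance(tree, str):
        return 0  # a leaf that still carries the gene
    return sum(min_losses(child, present) for child in tree)

# Deliberately simplified, hypothetical topology (NOT a real eukaryotic phylogeny).
TREE = (
    (("loricate_choanos", "nonloricate_choanos"), ("animals", "fungi")),
    (("siliceous_strams", "nonsiliceous_strams"), ("plants", "rhizaria")),
)
PRESENT = {"loricate_choanos", "siliceous_strams"}

vertical = 1 + min_losses(TREE, PRESENT)  # one ancestral origin plus independent losses
lgt = 1 + 1                               # one origin plus one lateral transfer
print(f"vertical inheritance: {vertical} events; LGT: {lgt} events")
```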
LGT is prevalent both in choanoflagellates  and diatoms . In choanoflagellates, this may be related to their phagotrophic lifestyle . In M. brevicollis, LGT from both eukaryotic and prokaryotic sources has been proposed to be responsible for the gain of stress-related genes . Stramenopile-derived genes are thought to be unusually frequent in M. brevicollis , including genes linked with the evolution of multicellularity . Diatoms are believed to have gained some of their biosilicification machinery (related to long-chain polyamine formation) from LGT  in addition to other diatom-specific metabolic pathways .
Though the genomic arrangement of choanoflagellate SIT genes is unknown, diatom biosilicification genes are often clustered . These clusters possess their own promoter sequences, with a possible silicon-specific promoter suggested from T. pseudonana . This spatial association of biosilicification genes allows coordinated regulation in diatoms . It may also make SITs more amenable to LGT, by forming a self-regulating genomic unit that can interact with existing metabolic pathways with minimal disruption .
Major questions concern the direction and timing of the SIT-related LGT event. The earliest confirmed diatom fossils are from the Mesozoic , with putative Precambrian chrysophyte fossils . Choanoflagellates lack a fossil record, though molecular clock estimates place the origin of the choanoflagellate branch at approximately 750 Ma . Choanoflagellate biosilicification is believed to have evolved only once, in the ancestor of loricates . Further evidence is required to confirm this, or to determine whether SIT-related silica biomineralization is the ancestral condition of the choanoflagellates and was lost in non-loricate groups . This ambiguity, together with the incomplete fossil record and the phylogenetic results (figure 3), leaves the direction of the LGT event unresolved. It may have been stramenopile-to-choanoflagellate, diatom-to-choanoflagellate or choanoflagellate-to-stramenopile.
(f) Evolutionary implications for biosilicification
In addition to SITs, there are several classes of genes involved in biosilicification in diatoms (e.g. silaffins  and silacidins ). None of these, other than SITs, were identified in our sequence data. Additionally, no evidence was found for choanoflagellate versions of the silica-related sponge enzyme silicatein . This suggests that other components of the choanoflagellate biosilica metabolic machinery may have a choanoflagellate-specific origin.
Bulk organic analysis of the costal strips from S. diplocostata yielded a glycoprotein that has been proposed to be involved in costal strip silicification . Multiple glycoprotein-encoding genes were identifiable in the S. diplocostata transcriptome dataset; however, no obvious candidates for a costal strip-specific glycoprotein could be identified (data not shown). These results support an origin of biosilicification in choanoflagellates that is independent both of sponges and diatoms, though the choanoflagellate and diatom cases may be related to each other by LGT.
In this scenario, biosilicification with a known mechanism arose at least twice in the opisthokonts: once in the metazoans (siliceous sponges) and once in the choanoflagellates. It also arose independently at least once each in the Stramenopiles + Alveolates + Rhizaria supergroup (in stramenopiles) and in the archaeplastids (in land plants). Biosilicification is found in isolated species or small groups in each of the eukaryotic supergroups . However, current molecular evidence of differing mechanisms points towards multiple independent origins of biosilicification, rather than its presence in the last common ancestor of all eukaryotes. For the choanoflagellate and stramenopile SITs, and for the land plant NIPs , the evolution of genes encoding silicon-transporting proteins appears to involve LGT events. Owing to the unusual biochemistry of silicon , acquisition of silicon transport by LGT may be more prevalent than its de novo evolution from existing transporter genes.
Biosilicification provides many selective benefits (e.g. protection, ), so once a foreign SIT gene was incorporated into the genome it would confer a strong selective advantage on that organism. Several common biomolecules, such as collagen , glycoproteins , polyamines  and proteases , are known to have the capacity to direct silica polymerization. It may be the case that many taxa possess the capacity for silicification but lack the means to concentrate sufficient silicon for polymerization to occur. Acquiring a laterally transferred gene for silicon transport would overcome this difficulty, with important implications for the ecology , evolutionary diversity  and biogeochemistry  of the newly biomineralizing lineage.
We thank Barry Leadbeater (University of Birmingham) for providing cultures of S. diplocostata and D. grandis, and Dan Richter and Nicole King (University of California, Berkeley) for helpful discussions. Ken Siggens gave valuable technical advice. Sequencing work was supported by BBSRC Capacity and Capability Challenge Programme project no. CCC-1–10. This work was supported by the BBSRC Comparative Genomics training grant BB/E527604/1 to A.O.M. and a Leathersellers' Company Scholarship to A.O.M., awarded by Fitzwilliam College, Cambridge, UK.
- Received October 30, 2012.
- Accepted January 17, 2013.
- © 2013 The Author(s) Published by the Royal Society. All rights reserved.