Aphids evolved novel cells, called bacteriocytes, that differentiate specifically to harbour the obligatory mutualistic endosymbiotic bacteria Buchnera aphidicola. The genome of the host aphid Acyrthosiphon pisum contains many orphan genes that display no similarity with genes found in other sequenced organisms, prompting us to hypothesize that some of these orphan genes are related to lineage-specific traits, such as symbiosis. We conducted deep sequencing of bacteriocytes mRNA followed by whole mount in situ hybridizations of over-represented transcripts encoding aphid-specific orphan proteins. We identified a novel class of genes that encode small proteins with signal peptides, which are often cysteine-rich, that are over-represented in bacteriocytes. These genes are first expressed at a developmental time point coincident with the incorporation of symbionts strictly in the cells that contribute to the bacteriocyte and this bacteriocyte-specific expression is maintained throughout the aphid's life. The expression pattern suggests that recently evolved secretion proteins act within bacteriocytes, perhaps to mediate the symbiosis with beneficial bacterial partners, which is reminiscent of the evolution of novel cysteine-rich secreted proteins of leguminous plants that regulate nitrogen-fixing endosymbionts.
Approximately 200 Ma, aphids evolved novel cells, called bacteriocytes, that house the gamma-proteobacterial endosymbiont, Buchnera aphidicola. Physiological studies and whole-genome sequence analysis of Acyrthosiphon pisum and Buchnera have revealed an intimate metabolic collaboration between partners. Buchnera provides the host with nutrition such as amino acids that aphids cannot synthesize and that are deficient in plant phloem sap, aphids' sole dietary component (reviewed in [1,2]). The aphid–Buchnera association is obligate and mutualistic with neither partner being able to reproduce in the absence of the other [1,2]. The symbionts are passed directly from mother to offspring by transovarial transfer [3,4].
Acyrthosiphon pisum, with 34 604 predicted genes in its 464-Mb draft genome assembly, has the highest gene count of any insect sequenced to date, although a precise gene count awaits better genome assembly and further analyses . This high gene count reflects both extensive gene duplications and the presence of aphid-specific orphan genes. Orphan genes, which display no significant similarity with genes identified in other sequenced organisms, comprise 20 per cent of the total number of genes in the pea aphid genome . The functions of these orphan genes are unknown, but their evolution may be related to the evolution of lineage-specific traits in aphids, such as symbiosis with Buchnera.
To understand the host's role in this symbiotic system, mRNA of the host bacteriocyte of the pea aphid Acyrthosiphon pisum has been assessed by expressed sequence tag (EST)-based transcriptome analyses [6,7] and, more recently, by deep-sequencing transcriptome analysis . Many of the over-represented transcripts are associated with the utilization of essential amino acids, biosynthesis of non-essential amino acids and transport, which support the metabolic collaboration between the host and symbiont. In addition, laterally acquired genes of alpha-proteobacterial origin are also preferentially transcribed in bacteriocytes [6,9,10]. Orphan genes have not been the focus of previous studies.
In this study, we conducted deep sequencing of mRNAs isolated from bacteriocytes followed by whole mount in situ hybridization of selected over-represented genes and focused on aphid-specific orphan genes. Here, we report the identification of a novel class of genes that encode small proteins with signal peptides, which are often cysteine-rich, that are expressed exclusively in bacteriocytes. These genes are expressed first at a developmental time point coincident with the incorporation of symbionts and their bacteriocyte-specific expression is maintained throughout the aphid's life.
2. Material and methods
(a) Aphids and RNA samples
Acyrthosiphon pisum strain LSR1 was used in this study. Parthenogenetic aphid clones were reared on broad bean, Vicia faba, in growth chambers with a long-day photoperiod (16 L : 8 D cycle) at 19°C. L3–L4 nymphs and young adults of parthenogenetic females were used to prepare bacteriocyte and whole body samples. More details on dissections and RNA extraction are provided in the electronic supplementary material, materials and methods.
(b) Deep sequencing
RNA-seq libraries were constructed using Illumina mRNA-seq Sample Prep Kit (Illumina #RS-100-0801). The libraries were run for 35 or 36 cycles by The Genomics Core Facility at Princeton University and the Genomics Resource Center at Rockefeller University. RNA-seq reads have been deposited at the Short Read Archive (NCBI) under accession no. SRA026589. Details are described in the electronic supplementary material, materials and methods.
(c) RNA-seq data analysis
Illumina reads from multiple runs were combined. We built a custom transcriptome reference set consisting of pea aphid official gene set v. 1.0 (OGS1) , which was downloaded from AphidBase (http://www.aphidbase.com/), and cDNA sequences overlooked from OGS1 , which amount to 34 639 gene models in total. The Illumina data were aligned to the custom transcriptome reference using Bowtie . The Bowtie outputs were processed to obtain tag counts and reads per kilobase per million (RPKM) values using a custom script. Differential expression between libraries was assessed by the likelihood ratio test as implemented in the program DEGseq . Secretion signal sequences were predicted by SignalP v. 3.0 . Details of database search and gene set enrichment analysis are described in the electronic supplementary material, materials and methods.
(d) Whole mount in situ hybridization
DIG-labelled RNA probes were generated from our A. pisum 50 k full-length cDNA collection . Detailed procedures are described in the electronic supplementary material, materials and methods. Clone names used and associated EST accessions are shown in electronic supplementary material, table S1. Bacteriocytes and viviparous ovaries were dissected from L3 to L4 nymphs or young aphids in PBS and then fixed in 4 per cent paraformaldehyde for 30 min. Hybridization, washing and colour development were performed as described previously .
(a) RNA-seq analysis of the bacteriocytes
We performed deep sequencing of mRNA transcripts isolated from bacteriocytes of the pea aphid, A. pisum, and, for comparison, from entire aphid bodies. In total, 22 125 005 and 13 224 022 high-quality reads were generated from bacteriocytes and whole bodies, respectively. The reads were mapped to a custom set of A. pisum transcripts that includes pea aphid official gene set v. 1.0 (OGS1)  and EST sequences that, we found, were overlooked in OGS1 . Out of the total number of reads for each library, 81 and 77 per cent were successfully mapped. In agreement with previous studies [5,6,8], we found that transcripts from genes involved in amino acid metabolism and those derived from lateral transfer from a non-Buchnera bacteria were over-represented in aphid bacteriocytes (see the electronic supplementary material, Dataset S1). We found over-expression of many orphan genes in bacteriocytes, although many orphan genes were expressed in the control sample too (see figure 1 and electronic supplementary material, figure S1A).
The N-terminus of each orphan gene product was analysed with the SignalP program to locate possible signal peptides for secretion. We found that the bacteriocyte sample contained significant enrichment for highly over-expressed transcripts that encode proteins with putative secretory signal peptides (false discovery rate < 0.01; figure 1; electronic supplementary material, figure S1C). For example, of the 30 most strongly expressed genes in bacteriocytes, 10 encode novel proteins with signal peptides.
(b) Aphid-specific putative secreted proteins
We performed a more detailed annotation of the 11 most highly expressed orphan genes with signal sequences (those within the 50 most highly over-represented genes in bacteriocytes assessed by z-score) (table 1). All 11 transcripts were represented in the sequenced cDNA in our A. pisum 50 k full-length cDNA collection . Two transcripts (BCR4 and BCR5) were missing from the official gene set of aphidbase, possibly because they encode relatively short open reading frames (207 and 231 bp respectively). While a BLASTP search of the deduced protein sequences against the arthropod protein database (see §2) and the NCBI's non-redundant protein database returned no significant hits, TBLASTN search against the EST datasets of other aphid species (Myzus persicae, Aphis gosipii and Toxoptera citricida) identified putative homologues for more than half of the genes examined (table 1). This result suggests that the putative secreted proteins are limited to the aphid lineage. Besides the secretion signals, no other Pfam domains were found in these proteins.
The novel putative secreted proteins can be divided into two classes (table 1). One class includes genes that encode small proteins (67–108 amino acids) with six or eight cysteines and we named these bacteriocyte-specific cysteine-rich proteins (BCR family; electronic supplementary material, figure S2A). Four BCR genes (BCR1, BCR2, BCR4 and BCR5) show some similarity with each other and sequence alignment revealed a well-conserved hydrophobic amino-terminal domain, which is predicted to be a signal peptide, followed by cysteine-rich polypeptides. The cysteine-rich region is highly diverged, but the six cysteines have almost identical spacing in the predicted proteins of the four BCR genes (figure 2a). Three of these genes are located in a cluster within 20 kb of each other, suggesting that they may have arisen by tandem gene duplication (figure 2b). No similarity could be found between any of the other BCR family genes (see the electronic supplementary material, figure S2A). Comparison of BCR3 of the pea aphid with the putative homologue from the green peach aphid, M. persicae, showed a perfect conservation of the six cysteine residues and of their spacing (see the electronic supplementary material, figure S2B). The second family of novel proteins, which we called secreted proteins (SP family), contain no cysteines and vary in length from 108 to 413 amino acids (table 1). Over-expression of SP1 (LOC100167607) and SP3 (LOC100164129) in bacteriocytes was reported in the RNA-seq data of Hansen et al. , while other SP and BCR family genes have not been investigated in the previous studies probably because they were not annotated as protein-coding genes in the public database.
(c) Spatio-temporal expression patterns of the novel secreted genes
We examined the spatio-temporal expression patterns for some of the BCR and SP genes. The seven most highly expressed BCR family genes were expressed in indistinguishable patterns (see figure 3 and electronic supplementary material, figure S3). BCR family transcripts were detected first in stage-7 embryos (figure 3a), coincident with the incorporation of Buchnera into blastoderm-stage embryos . At stage 7, strong expression was observed only in the syncytium that develops into bacteriocytes. Expression of all seven genes was maintained strictly in bacteriocyte cells in all later embryonic stages and in adults (figure 3b–d). Three of the six tested SP-family genes were expressed in patterns indistinguishable from the patterns observed for the BCR family genes (see figure 3e and electronic supplementary material, figure S3). These expression patterns suggest that a wide array of recently evolved secretion proteins act within bacteriocytes, perhaps to mediate the symbiosis with Buchnera.
Bacteriocytes are organized into organs called bacteriomes and surrounded by sheath cells . Two of the six SP family genes, SP6 and SP5a, were expressed strictly in sheath cells in adults (figure 3f). While SP5a was expressed in fat body as well as in embryonic stages, SP6 is expressed specifically in presumptive sheath cells in the embryonic bacteriome. These sheath cell expression patterns of SP family genes suggest that novel secretion proteins may also act to help organize the bacteriome (see the electronic supplementary material, figure S4).
In this study, we report the identification and characterization of novel families of putative secreted proteins restricted to aphid species. Lineage-specific orphan genes, which have no detectable homologues in other organisms, make up 10–20% of the genes in every animal genome sequenced to date. Although orphan genes have not been investigated intensively, they are drawing increasing attention because lineage-specific genes can account for evolution of lineage-specific traits [17,18]. This view is supported by our findings. BCR and SP genes, which show no significant sequence similarity to genes in species outside the aphid lineage, are expressed abundantly in the cells of the aphid bacteriome, where endosymbionts live. These expression patterns suggest that these orphan genes act within the bacteriome, perhaps to mediate the symbiosis with Buchnera.
Previous transcriptomic studies have already led to the identification of genes preferentially expressed in bacteriocytes. However, the tissue specificity and the time course of the expression of these genes have not been examined previously. In this study, whole mount in situ hybridization revealed that 7 BCRs and 4 SP transcripts are expressed in bacteriocytes throughout the life of the aphid. In addition, we found two transcripts encoding putative secreted protein products whose expression is restricted to sheath cells. Sheath cells form a thin casing around bacteriocytes [16,19] and, in aphids, secondary symbionts sometimes reside in sheath cells . Sheath cells are observed in aphids, psyllids and many other insects , but little is known about the function or evolutionary origins of sheath cells. In future studies of bacteriome development and evolution, the BCR and SP genes will likely serve as valuable specific molecular markers of bacteriocytes and sheath cells.
The presence of N-terminal signal sequences in BCRs and SPs indicates that these novel proteins enter the secretory pathway. The proteins synthesized in the endoplasmic reticulum may be transported, via Golgi, to the extracellular space. Another possibility is that these proteins are targeted to Buchnera within the bacteriocytes. This is a probable scenario given that Buchnera cells are housed within vesicles constructed from host-derived membranes called symbiosomal membranes, and electron microscopic observations have shown that symbiosomal membranes are sometimes closely associated with the bacteriocyte endoplasmic reticulum [16,21]. We hypothesize that these ‘secreted’ proteins may be released into these vesicles. Recent studies in weevils and plants provide an interesting comparison with our findings in aphids. A weevil coleoptericin-A (ColA) antimicrobial peptide (AMP), which has 23-amino acid signal peptides, selectively targets endosymbionts within bacteriocytes . In the Rhizobium-legume symbiosis, nodule-specific secreted peptides called NCRs are targeted to the bacteria . In both cases, the secreted peptides regulate the proliferation of the symbionts. By analogy with these symbiotic systems, it is possible that some of the BCRs and SPs of aphids may mediate the symbiosis by direct association with the symbiont to regulate cell division and metabolic activity. The most probable target of the novel secreted peptides is Buchnera, but we cannot exclude the possibility that they interact with secondary symbionts and microbial intruders.
The BCR family genes may have evolved from defensin-type AMPs, small proteins that contain six to eight cysteines, which are common in both animals and plants. AMP-like secretory cysteine-rich proteins have evolved in various taxa and are involved in diverse cell–cell communication functions, including symbiosis . Notably, the evolution of BCR genes in aphids mirrors the evolution of cysteine-rich secreted proteins in leguminous plants. The Medicago truncula genome contains a family of novel genes encoding AMP-like cysteine-rich proteins with secretion signals [25,26]. More than 300 genes of this family are expressed specifically in root nodules, the symbiotic plant organs that house nitrogen-fixing endosymbiotic bacteria, and at least some of these proteins target bacteria and induce differentiation . The remarkable convergence of protein sequence composition and expression pattern for these genes between leguminous plants and aphids suggests that there may be common principles underlying evolution of endosymbiosis in divergent taxa.
We thank K. Sato and T. Hashiyama for technical support, and The Genomics Core Facility at Princeton University and Genomics Resource Center at Rockefeller University for Illumina sequencing. This work was supported in part by KAKENHI Grant-in-Aid for Young Scientists A to S.S.
- Received August 21, 2012.
- Accepted October 18, 2012.
© 2012 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/3.0/, which permits unrestricted use, provided the original author and source are credited.