Conserved properties of Drosophila and human spermatozoal mRNA repertoires

Bettina E. Fischer, Elizabeth Wasbrough, Lisa A. Meadows, Owen Randlet, Steve Dorus, Timothy L. Karr, Steven Russell


It is now well established that mature mammalian spermatozoa carry a population of mRNA molecules, at least some of which are transferred to the oocyte at fertilization; however, their function remains largely unclear. To shed light on the evolutionary conservation of this feature of sperm biology, we analysed highly purified populations of mature sperm from the fruitfly, Drosophila melanogaster. As with mammalian sperm, we found a consistently enriched population of mRNA molecules that are unlikely to be derived from contaminating somatic cells or immature sperm. Using tagged transcripts for three of the spermatozoal mRNAs, we demonstrate that they are transferred to the oocyte at fertilization and can be detected before, and at least until, the onset of zygotic gene expression. We find a remarkable conservation in the functional annotations associated with fly and human spermatozoal mRNAs, in particular, a highly significant enrichment for transcripts encoding ribosomal proteins (RPs). The substantial functional coherence of spermatozoal transcripts in humans and the fly opens the possibility of using the power of Drosophila genetics to address the function of this enigmatic class of molecules in sperm and in the oocyte following fertilization.

1. Introduction

Spermiogenesis, the production of a differentiated male gamete, is a remarkable example of cellular differentiation, yet despite the critical role the male gamete plays in the life cycle, we still know relatively little about its full range of functions. In the past, it was widely believed that the only function of the spermatozoon was to deliver the male genome to the egg. However, it has been known for over 50 years that fertilization results in the complete fusion of the sperm and egg cells, and more recent genetic and functional studies have identified essential sperm cell components and molecules delivered to the zygote at fertilization [1–3]. These include a functional centrosome [4], an activation stimulus [5,6] and other paternal products [7,8]. In Drosophila, detailed molecular genetic studies have identified paternal gene products that are essential for successful fertilization [9–12], further supporting the idea that sperm-derived factors provide important functions during, and perhaps following, fertilization and zygote formation.

In addition, a number of studies suggested that mature mammalian spermatozoa contain a variety of RNA species [13–15]. Subsequent work strengthened these observations, and the presence of poly-adenylated messenger RNAs in spermatozoa [16,17] is now widely accepted. Work in both humans and mice has demonstrated the delivery of some of these sperm transcripts to the oocyte [18,19]. Furthermore, comparisons between sperm from fertile and infertile males indicate that sperm transcripts may have diagnostic value and suggest a relationship between sperm transcripts and proper sperm function [20–22].

During normal spermatogenesis a large number of transcripts are produced, encoding the myriad proteins, many of them sperm-specific, that are needed for spermiogenesis. Transcripts are often stored in the spermatocyte or spermatid cytoplasm for long periods before translation [23]. A critical aspect of spermiogenesis in mammals and Drosophila is the change in chromatin structure resulting from the replacement of somatic histones by sperm-specific protamines, which leads to a greater level of DNA compaction. Nuclear transcription shuts down during this process, and as the sperm mature further there is a major loss of cytoplasm [24–26]. One explanation for the mRNAs found in mature sperm is that they are remnants of the spermatogenesis programme left behind during sperm maturation. It was widely believed that sperm were translationally silent; however, it has recently been shown that labelled amino acids are incorporated into polypeptides during mammalian sperm capacitation, a process that occurs in the female reproductive tract [27]. Sperm translation is mediated by mitochondrial-type ribosomes, and it is thus possible that sperm transcripts are substrates for this sperm protein production. There is mounting evidence that spermatozoal RNA is delivered into the oocyte and remains intact after fertilization. At least five sperm-specific mRNAs that are not detected in unfertilized oocytes are found after fertilization [19]. In a recent study in which human sperm were introduced into hamster oocytes by intracytoplasmic sperm injection, two human-specific transcripts with known roles in implantation (PSG1 and HLA-E) that are not present in the hamster genome were detected 24 h after fertilization, suggesting that these paternal transcripts survive in the oocyte [28]. Together, these studies point to functional roles for spermatozoal mRNAs in the oocyte.

The composition and quantity of sperm RNA are now considered valuable diagnostic markers of male fertility. In individual human ejaculates, 3000–7000 different transcripts were detected in one microarray study [29], and 4000–5000 mRNA types were observed by serial analysis of gene expression in pooled sperm fractions from different ejaculates [15]. However, because the concentration of sperm RNA is much lower than that of the maternal mRNA in the oocyte, it has been assumed that sperm transcripts do not play a major role in fertilization and early embryogenesis. In support of a functional role for the transfer of sperm RNA in development, a heritable, paramutation-associated white tail phenotype was induced in mice by microinjecting total RNA from Kit^tm1Alf/+ heterozygotes into fertilized eggs [30]. Taken together, the evidence for functional poly-adenylated mRNAs in the mature sperm of some mammalian species is mounting, but it is unclear whether this is a more universal feature of animal sperm.

Less is known about sperm RNA profiles in non-mammalian species, but recently de novo RNA transcription in Drosophila has been demonstrated in post-meiotic phases of spermatogenesis [31,32] suggesting mature Drosophila sperm may also provide mRNA to the egg during fertilization. Drosophila has proved to be a valuable model system for exploring conserved aspects of animal biology and, in the case of sperm biology, it is becoming clear that there are molecular and cell biological features conserved between mammals and flies [33,34]. Here we demonstrate that, as in mammals, poly-adenylated mRNA transcripts are present in mature Drosophila sperm cells and that paternal sperm transcripts are detected in the fertilized egg. The conservation in the functional annotations associated with sperm mRNAs found in the two species suggests that Drosophila may be an attractive alternative model system to explore the function of human sperm transcripts during and following fertilization.

2. Results

(a) Spermatozoal mRNA is reproducibly detected in purified Drosophila melanogaster sperm

To explore the possibility that mature Drosophila sperm contain mRNA transcripts, we used DNA microarray analysis to assess the RNA content of isolated sperm samples. Highly purified sperm samples were obtained from dissected seminal vesicles, which are almost entirely composed of mature sperm, using methods described previously [35,36]. Protein quantitation and two-dimensional gel electrophoresis also showed that, once sperm are removed, seminal vesicles contain very low levels of detectable soluble protein (K. Chaney & T.L.K., 2010, unpublished data).

The main experiment used RNA extracted from three biological replicates of purified sperm as a template for oligo-dT-primed reverse transcription, amplification, labelling of dye-swapped technical replicates and hybridization to long oligonucleotide microarrays (see electronic supplementary material, supplementary methods). As a control, RNA from two biological replicates of dissected adult testis plus accessory glands was amplified, labelled (dye-swap technical replicates) and hybridized to similar arrays. After normalization and removal of low-intensity values, we identified 5579 transcripts present in all three sperm replicates, 5358 transcripts in the testis/accessory gland samples and 4295 transcripts common to both datasets. To assess the reliability and reproducibility of the data, we calculated Pearson correlations between sample pairs (electronic supplementary material, figure S1). For the purified sperm samples, technical replicates were highly reproducible (r = 0.97–0.99), as were biological replicates (r = 0.72–0.90). The testis/accessory gland samples were similarly highly correlated (r = 0.94–0.97 for technical replicates and r = 0.90–0.93 for biological replicates).
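The two reproducibility steps described above, pairwise Pearson correlation between samples and the set of transcripts detected in every replicate, can be illustrated with a minimal Python sketch. It assumes intensities that are already normalized and aligned to a common probe order, and it represents "present" calls as sets of transcript identifiers; the function names and toy values are hypothetical, and the actual intensity filter is described only in the supplementary methods.

```python
import itertools

import numpy as np


def pairwise_pearson(samples):
    """Pearson r for every pair of samples.

    `samples` maps a sample name to a 1-D array of intensities; all arrays
    are assumed to be aligned to the same probe order.
    """
    result = {}
    for a, b in itertools.combinations(sorted(samples), 2):
        # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r(a, b)
        result[(a, b)] = float(np.corrcoef(samples[a], samples[b])[0, 1])
    return result


def present_in_all(present_calls):
    """Transcripts that passed the intensity filter in every replicate."""
    sets = [set(calls) for calls in present_calls]
    return set.intersection(*sets) if sets else set()


# Hypothetical toy data: three replicates measured on four probes
replicates = {
    "sperm_1": np.array([8.1, 5.2, 11.0, 3.3]),
    "sperm_2": np.array([8.4, 5.0, 10.7, 3.9]),
    "sperm_3": np.array([7.9, 6.1, 10.2, 3.1]),
}
correlations = pairwise_pearson(replicates)
shared = present_in_all([{"RpS9", "RpL41", "CG9336"},
                         {"RpS9", "RpL41"},
                         {"RpS9", "RpL41", "CG10407"}])
```

Applied to real data, the same intersection logic would yield the "present in all three sperm replicates" list, and intersecting that list with the testis/accessory gland list gives the shared set.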

To determine whether spermatozoal transcripts result from incidental capture of abundant testis transcripts or from accessory gland contamination, we ranked transcripts by median normalized intensity, highest first, for both the purified sperm and the whole-tissue transcriptomes (table 1). If the spermatozoal transcripts resulted from contamination with seminal vesicle tissue, we would expect the most abundant testis/accessory gland transcripts to rank highly in the sperm list. Similarly, if packaging of spermatozoal transcripts were a purely passive process, we would expect transcript abundance in sperm to correlate with abundance in the testis/accessory gland. Contrary to these expectations, the two rankings are almost entirely distinct; the only exceptions are two transcripts (CG31226 and CG10407), both from genes of unknown function. As expected, the testis/accessory gland ranking shows that male-specific transcripts and accessory gland protein (Acp) transcripts are particularly abundant. Their absence from the sperm ranking indicates that the purified sperm sample was not contaminated with accessory gland material and that the sperm transcript pool is not the result of passive packaging.

Table 1.

Top ranked spermatozoal and testis/accessory gland mRNAs. On the left, the top 40 sperm transcripts based on normalized median intensity and their corresponding rank from the testis/accessory gland array (ribosomal proteins are highlighted in bold). On the right, the top ranked transcripts from the testis/accessory gland sample and corresponding rank in the sperm transcript list (Acps are highlighted in bold).

(b) Spermatozoal RNA and the sperm proteome

To further address whether residual transcripts encoding integral sperm components persist, we compared the list of abundant spermatozoal transcripts with the genes encoding the sperm proteome [35,37]. Among the 20 most abundant spermatozoal RNA transcripts, only one gene encodes an integral sperm component. A more comprehensive analysis of the 200 and 500 most abundant sperm transcripts revealed that only 21 per cent (42 out of 200) and 18.4 per cent (92 out of 500), respectively, encode components of the sperm proteome. These proportions are statistically indistinguishable from those obtained in an identical analysis of the most abundant testis transcripts, where 22.5 per cent (45 out of 200) and 19.6 per cent (98 out of 500) encode sperm proteins. While this does not rule out the persistence of sperm proteome transcripts in mature sperm, it shows that such transcripts do not contribute substantially to the abundant spermatozoal transcript pool and that their proportion does not exceed that of abundant testis transcripts encoding sperm proteins. It is also noteworthy that, among the 42 transcripts in the top 200 that also encode sperm proteins, a significantly enriched subset (17 of 42; p = 4.99E−11) is functionally involved in translation according to gene ontology (GO) annotation, including six components of the large ribosomal subunit, eight components of the small subunit and three elongation factors (see section (e)).
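The text states that the sperm and testis proportions are statistically indistinguishable but does not name the test used for this comparison. As an illustration only, the sketch below assumes a two-sided Fisher's exact test on a 2×2 table of counts (transcript set × encodes a sperm protein or not), using SciPy's `fisher_exact`; the function name `compare_proteome_overlap` is hypothetical.

```python
from scipy.stats import fisher_exact


def compare_proteome_overlap(hits_a, n_a, hits_b, n_b):
    """Two-sided Fisher's exact test on a 2x2 table:
    rows = transcript set, columns = (encodes sperm protein, does not)."""
    table = [[hits_a, n_a - hits_a],
             [hits_b, n_b - hits_b]]
    odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
    return float(odds_ratio), float(p_value)


# Counts from the text: sperm vs testis, top 200 and top 500 transcripts
top200 = compare_proteome_overlap(42, 200, 45, 200)
top500 = compare_proteome_overlap(92, 500, 98, 500)
```

Fisher's exact test is chosen here because it makes no large-sample approximation; with counts of this size a χ2 test would lead to the same conclusion of no detectable difference.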

(c) Expression of spermatozoal RNA

We next analysed the 200 most abundant transcripts from each of the sperm and testis/accessory gland sample sets using the FlyMine data warehouse [38]. Only 34 genes (17%) are shared between the sperm and testis top-200 lists, further emphasizing the difference between the two mRNA populations. The global properties of the lists also differ. Comparing the expression patterns of both datasets with the tissue-specific expression catalogue in FlyAtlas [39], we find that sperm transcript genes are expressed in most tissues and, strikingly, only 20 per cent of them have high levels of testis expression, whereas 75 per cent have very low levels of testis expression. In contrast, 70 per cent of the testis/accessory gland genes are highly expressed and 20 per cent have low expression levels in the FlyAtlas testis sample (electronic supplementary material, figure S2). We also noticed that 65 per cent of the genes in the sperm transcript list that have BDGP expression annotations (113 genes) are expressed in the early (stage 1–3) embryo, whereas only 25 per cent of the testis/accessory gland enriched genes are expressed this early in development [40,41] (electronic supplementary material, figure S2). We conclude that the spermatozoal transcriptome does not include a substantial proportion of testis-enriched spermatogenesis genes.
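The overlap statistic above (34 shared genes out of 200, i.e. 17%) is a simple set intersection of the two top-200 lists. A minimal sketch, assuming each list is a sequence of gene identifiers already sorted from most to least abundant; the function name is hypothetical.

```python
def top_n_overlap(ranked_a, ranked_b, n=200):
    """Genes shared between the top-n entries of two ranked gene lists,
    and the shared count expressed as a fraction of n."""
    shared = set(ranked_a[:n]) & set(ranked_b[:n])
    return shared, len(shared) / n


# Hypothetical ranked lists that share exactly 34 of their top 200 genes
sperm_ranked = [f"g{i}" for i in range(200)]
testis_ranked = [f"g{i}" for i in range(166, 366)]
shared_genes, fraction = top_n_overlap(sperm_ranked, testis_ranked)
```

Dividing by n rather than by the size of the union keeps the statistic directly comparable with the "17%" reported in the text.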

(d) Genomic distribution of spermatozoal RNA

As is well established for testis-specific genes, the 200 most abundant testis/accessory gland transcripts show a significant under-representation of X-linked genes (32% of the expected value; χ2 = 15.81, p < 0.0001). In contrast, the abundant spermatozoal RNA genes show a slight but non-significant under-representation of X-linked genes (74% of the expected value; χ2 = 2.35, p = 0.13). Additional analyses revealed significant co-localization of abundant testis transcripts within adjacent gene clusters (total clusters = 10; p < 0.01), with clustering restricted to the autosomes. This observation is consistent with previous studies of Drosophila sperm [35,37] and of testis-overexpressed genes in Drosophila and the mouse [42–44]. However, analysis of spermatozoal RNA revealed no significant clustering on either the autosomes or the X chromosome (p = 0.22), a finding that further reinforces the inherent differences between these two sets of transcripts.
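The X-chromosome comparison is a one-degree-of-freedom χ2 goodness-of-fit test (X-linked vs autosomal counts against the genomic expectation), which the methods state was run with Yates correction. SciPy's `chisquare` does not apply a continuity correction, so the sketch below computes the corrected statistic directly and takes the p-value from `scipy.stats.chi2`. The counts in the usage line are hypothetical, since the paper reports only percentages and test statistics, and the X-linked share of annotated genes must be supplied from the genome annotation.

```python
from scipy.stats import chi2


def x_linkage_test(n_x, n_total, genome_x_fraction):
    """Yates-corrected chi-squared goodness-of-fit (1 df) for the number
    of X-linked genes in a list versus the genomic expectation.

    Returns (observed/expected ratio for X, chi-squared statistic, p-value).
    """
    observed = [n_x, n_total - n_x]
    expected = [n_total * genome_x_fraction,
                n_total * (1.0 - genome_x_fraction)]
    # Yates: shrink each |O - E| by 0.5 (not below zero) before squaring
    stat = sum(max(abs(o - e) - 0.5, 0.0) ** 2 / e
               for o, e in zip(observed, expected))
    return n_x / expected[0], stat, float(chi2.sf(stat, df=1))


# Hypothetical example: 10 X-linked genes in a list of 200 when 10% are expected
ratio, stat, p_value = x_linkage_test(10, 200, 0.10)
```

The first returned value corresponds to the "per cent of the expected value" figures quoted in the text (e.g. 32% for the testis list).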

(e) Functional enrichment of spermatozoal RNA

To assess the possible functions of sperm transcripts and highlight fundamental distinctions between these and the testis/accessory gland transcripts, we analysed GO molecular function and biological process enrichment among the 200 most abundant transcripts in both ranked lists. Strikingly, 33 per cent of spermatozoal RNA transcripts with GO molecular function annotations (47/142) encode components of the ribosome (p = 5E−45). Additionally, 12 genes were identified with transmembrane transporter activity (p = 2E−8) and five with translation elongation factor activity (p = 0.019). In contrast, highly expressed genes in the testis/accessory gland transcriptome are enriched solely for hormone activity (n = 8; p = 1E−4). When the biological process ontology was analysed, the enriched categories were again distinct between the groups. The largest sets of genes within enriched categories for abundant sperm transcripts include mitotic spindle organization (n = 30; p = 3E−16) and translation (n = 54; p = 3E−16). In contrast, testis/accessory gland transcripts are found exclusively within the reproduction category and 14 related daughter categories (n = 29; p = 4E−5). Taken together, the global expression, genomic distribution and functional properties emphasize that the genes encoding sperm transcripts have characteristics distinct from those of the abundantly expressed genes in the testis and accessory gland.
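GO enrichment p-values of this kind are usually obtained from a one-sided hypergeometric test: given a list of annotated genes drawn from an annotated background, how likely is it to see at least k genes carrying a term by chance? The sketch below uses `scipy.stats.hypergeom`. The background size of 10 000 is a hypothetical placeholder, so the result will not reproduce the published p = 5E−45, which came from FlyMine with its own background; 183 is the genome-wide count of annotated ribosomal structural constituents reported in section (g).

```python
from scipy.stats import hypergeom


def go_term_enrichment(k, n_list, n_term, n_background):
    """One-sided hypergeometric test for over-representation of a GO term.

    k            -- genes in the list annotated with the term
    n_list       -- annotated genes in the list (the draws)
    n_term       -- background genes annotated with the term
    n_background -- all annotated background genes (the population)
    Returns (fold enrichment, P(X >= k)).
    """
    # scipy's parameterisation is hypergeom(M=population, n=successes, N=draws)
    p_value = hypergeom.sf(k - 1, n_background, n_term, n_list)
    fold = (k / n_list) / (n_term / n_background)
    return fold, float(p_value)


# 47 of 142 annotated sperm transcripts are ribosomal; the background size
# of 10 000 annotated genes is a HYPOTHETICAL placeholder.
fold, p_value = go_term_enrichment(47, 142, 183, 10_000)
```

`sf(k - 1)` is used because the survival function gives P(X > k − 1), which equals P(X ≥ k) for a discrete distribution.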

(f) Spermatozoal RNA is transferred to the oocyte during fertilization

To determine whether sperm transcripts are delivered into the oocyte, we used a collection of protein trap fly lines we recently generated. The FlyProt project has created a set of fly lines in which endogenous genes are tagged with an in-frame artificial reporter exon encoding a yellow fluorescent protein (YFP) [45,46]. We identified three lines in our collection that carry protein traps in sperm transcripts present in our top 40 list: RpS9 (CPTI-000493), RpL41 (CPTI-002881) and CG9336 (CPTI-001654). Males heterozygous for each of these YFP-tagged genes were mated with wild-type females, and embryos were collected at three time points after egg laying (T1: 0–15 min, T2: 60–90 min and T3: 180–210 min). We used reverse transcription PCR (RT-PCR) assays on RNA extracted from single embryos, with paternal-specific primers that amplify only the YFP-tagged copy of the gene. To control for detection of tagged transcripts resulting from zygotic expression of the tagged gene from the paternal genome, we determined the age of each embryo by RT-PCR on the same RNA samples for genes known to be transcribed at different stages of early embryo development. Sisterless A (sisA) and snail (sna) are the earliest known zygotically transcribed genes in D. melanogaster, with expression first detected at nuclear cycle 8. This is followed by even skipped (eve) at nuclear cycle 9 [47–49]. Fasciclin 3 (Fas3) expression begins much later, at stage 11–12 [50]. Bicoid (bcd) is a maternally deposited transcript that begins to degrade during the maternal-to-zygotic transition [51]. Rp49 was used as a ubiquitous control.

Detection of tagged sperm transcripts in an embryo together with the bcd and Rp49 controls, but in the absence of signals for the zygotic markers, indicates that the sperm transcripts are either delivered paternally or reflect zygotic expression considerably earlier than previous studies have found. Between 50 and 70% of T1 embryos were older than expected, as they gave positive PCR results for one or more of the zygotically expressed genes (figure 1; electronic supplementary material, table S1). As expected, levels of paternally contributed mRNA in single embryos are extremely low, but we detected the YFP-tagged mRNA for all three of the sperm transcripts assayed in approximately 30 per cent of the embryos confirmed not to have initiated zygotic expression. Detection of the YFP-tagged transcript increased with embryo age, owing to the onset of zygotic expression and transcription of the paternal allele. We can exclude genomic DNA contamination because the primers used to amplify the tagged paternal transcripts span an intron and the product we observe is of the size expected for a processed mRNA. We also consider it very unlikely that the transcripts detected in early embryos result from precocious activation of the zygotic genome. In particular, each of the three independent paternal protein trap transcripts assayed contains an intron of at least 10 kb resulting from insertion of the protein trap transposon, and such genes are unlikely to be transcribed and processed within the very short cell cycles of the early embryo. We therefore conclude that in Drosophila, as in mammals, transcripts are specifically packaged into mature sperm, delivered to the oocyte and detectable in the zygote.
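The interpretation rule above (count a tagged transcript as paternally delivered only when Rp49 and bcd are detected and none of the zygotic markers sisA, sna, eve or Fas3 is) can be written as a small classifier. This is a hypothetical sketch of that logic, not the authors' scoring procedure: each embryo is represented as a dictionary of band calls, and the key `"YFP"` is an assumed name for the paternal-specific amplicon.

```python
ZYGOTIC_MARKERS = ("sisA", "sna", "eve", "Fas3")


def classify_embryo(pcr):
    """Classify one embryo from RT-PCR band calls (gene name -> bool).

    'YFP' is a hypothetical key for the paternal-specific amplicon.
    """
    if not pcr.get("Rp49", False):
        return "failed"          # no cDNA detected: exclude the sample
    if any(pcr.get(gene, False) for gene in ZYGOTIC_MARKERS):
        return "zygotic"         # zygotic genome active: signal is ambiguous
    if not pcr.get("bcd", False):
        return "uninformative"   # maternal bcd control missing
    return "paternal" if pcr.get("YFP", False) else "not_detected"


def paternal_detection_rate(embryos):
    """Fraction of pre-zygotic embryos in which the tagged transcript is seen."""
    calls = [classify_embryo(e) for e in embryos]
    informative = [c for c in calls if c in ("paternal", "not_detected")]
    if not informative:
        return float("nan")
    return calls.count("paternal") / len(informative)
```

Only embryos classified as "paternal" or "not_detected" enter the denominator, which mirrors the text's "approximately 30 per cent of the embryos confirmed not to have initiated zygotic expression".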

Figure 1.

RT-PCR using staged embryos. Embryos from crosses of YFP-tagged males to Oregon-R females were collected after a 15 min laying period (T1), after a 30 min laying period plus 1 h ageing (T2), and after a 30 min laying period plus 3 h ageing (T3). Embryo age was determined using bcd, sisA, sna, eve and Fas3 to ensure that T1 embryos were collected before the onset of zygotic transcription. Rp49 was used to confirm the presence of cDNA after RNA extraction and reverse transcription. No-template controls (NTC) and positive controls for each gene product are shown. (a) CPTI001654 (CG9336-YFP), (b) CPTI002881 (RpL41-YFP) and (c) CPTI000493 (RpS9-YFP).

(g) Functional conservation of spermatozoal RNA between insects and humans

To assess parallels between insect and human spermatozoal RNA, we first characterized the relationship between abundant human spermatozoal RNA and both human testis mRNA expression and the human sperm proteome. As in Drosophila, human spermatozoal transcripts are, for the most part, not highly expressed in the testis: only 36 of the 500 most abundant spermatozoal transcripts were identified within the 10 per cent of probes with the highest average testis expression [52] (electronic supplementary material, table S2). We next assessed the presence of proteins encoded by the same set of abundant spermatozoal transcripts and found that only 19.2 per cent (96 out of 500) have been identified in proteomic analyses of whole human sperm [53] or the human sperm nucleus [54]. This proportion is statistically indistinguishable from our observations in Drosophila (χ2 = 0.059, p = 0.81), although we note that a direct comparison does not account for potential differences in the extent of sperm proteome characterization between the taxa. We therefore conclude that spermatozoal transcripts in both humans and the fly are largely distinct from genes expressed at high levels during spermatogenesis and from genes encoding the sperm proteome.
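The human-versus-fly comparison contrasts 96 of 500 human transcripts with 92 of 500 Drosophila transcripts (section (b)) that encode sperm-proteome components. The test is not named in the text; assuming a Pearson χ2 test on the 2×2 table with Yates continuity correction (the default that SciPy's `chi2_contingency` applies to 2×2 tables), the statistic works out to about 0.059 with p ≈ 0.81, consistent with the reported values. The helper name below is hypothetical.

```python
from scipy.stats import chi2_contingency


def compare_fractions(hits_a, n_a, hits_b, n_b):
    """Pearson chi-squared test (1 df) on a 2x2 hits/non-hits table.

    For 2x2 tables scipy applies Yates' continuity correction by default;
    it is passed explicitly here for clarity.
    """
    table = [[hits_a, n_a - hits_a],
             [hits_b, n_b - hits_b]]
    stat, p_value, dof, _expected = chi2_contingency(table, correction=True)
    return float(stat), float(p_value)


# Human (96/500) vs Drosophila (92/500) abundant transcripts encoding sperm proteins
stat, p_value = compare_fractions(96, 500, 92, 500)
```

Without the continuity correction the same table gives a statistic of roughly 0.10, so the agreement with the reported 0.059 suggests, but does not confirm, that a corrected test was used.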

If spermatozoal RNAs function in the oocyte following fertilization, conservation of RNA composition across taxa might be expected. Consistent with this prediction, a comparison of the 500 most abundant transcripts in Drosophila with those previously identified in human spermatozoa [55] revealed that their functional composition is generally shared across 10 broadly defined GO functional categories (electronic supplementary material, figure S3). A more detailed analysis of molecular function GO enrichment in each species (relative to the composition of its genome) reveals a highly significant, shared enrichment for structural constituents of the ribosome, together with translation elongation and termination factors (table 2). Remarkably, these sets of spermatozoal RNAs included 32.7 per cent (60 of 183) and 31 per cent (50 of 161) of the annotated structural constituents of the ribosome in the Drosophila and human genomes, respectively. It is noteworthy that, apart from the molecular function categories associated with translation (including structural molecule activity, which largely comprises ribosomal components), there are no shared functional categories between the two datasets.

Table 2.

Molecular function enrichment for abundant Drosophila and human spermatozoal RNAs. Includes the 30 most significant molecular function categories in Drosophila and all significant categories in humans. Molecular functions specifically associated with translation are shown in bold, and additionally in italics if they are identified in both Drosophila and humans.

3. Discussion

In this report, we describe the characterization of an mRNA population carried by mature Drosophila melanogaster sperm cells, the first description of an invertebrate spermatozoal transcriptome. We further demonstrate that at least some of the spermatozoal transcripts are successfully delivered to the fertilized egg and can be detected prior to the onset of zygotic gene expression. Thus, in flies and vertebrates the mature sperm carries a defined set of mRNA transcripts into the egg at fertilization. A parallel analysis of testis and accessory gland mRNA indicates that sperm mRNAs are unlikely to result from residual waste remaining after sperm individualization, since the relative abundances of spermatozoal mRNAs do not reflect transcript abundance in the reproductive tract. An analysis of published data from human sperm and testis indicates that this property of spermatozoal RNA is conserved. We therefore suggest that particular mRNAs are either differentially located in developing sperm cells, such that they are retained during individualization, or selectively sequestered to prevent loss. Our current study does not differentiate between these possibilities. High-resolution mRNA in situ hybridization with probes against the sperm transcripts identified in this study should allow this issue to be addressed.

We found that the sperm mRNA population was particularly enriched for transcripts encoding ribosomal components and other proteins related to translation, and that a similar enrichment is found for mRNAs carried by mature human sperm. This remarkable functional conservation hints at an underlying biological function for the spermatozoal transcriptome. Although sperm enter an environment rich in maternal mRNAs encoding the components needed to construct ribosomes, our observations raise the possibility of specific role(s) for ‘paternal’ ribosomes during and following fertilization. Alternatively, RPs or their transcripts may have other roles in early development, as has previously been shown for some RPs [56–58]. Notably, RPS3, which is among the top 300 spermatozoal transcripts, has been shown to play a role in DNA repair, raising the intriguing possibility of such a function during pronuclear fusion.

It has been suggested that spermatozoal transcripts may play a role in nuclear compaction, in some way marking regions of the paternal genome that do not undergo the histone-to-protamine replacement [20]. It is estimated that approximately 15 per cent of the DNA in human spermatozoa retains histones [59], and it has been suggested that these regions of the genome may be regulatory or important for epigenetic marking [60]. A histone-to-protamine-like protein transition has recently been described in Drosophila [26]; however, the extent of the transition, and whether some regions of the genome remain associated with histones, is not currently known in flies. Similarly, although epigenetic modification of parental chromosomes is not normally considered a feature of early Drosophila development, recent work indicates that the protein encoded by the paternal effect locus ms(3)k81 is required to protect paternal telomeres at fertilization [61]. Thus, spermatozoal mRNA may play a role in marking the paternal genome, a role that could hitherto have been masked because the genes encoding the spermatozoal mRNAs are essential. Whatever their role, the discovery of Drosophila spermatozoal mRNAs that encode a set of molecular functions conserved with those found in mammalian sperm opens the prospect of using the tools of fly genetics and developmental biology to explore the contribution these transcripts make to reproductive biology.

4. Methods

The experimental methods are summarized here with full details provided in the electronic supplementary material, supplementary methods.

(a) Tissue collection and RNA extraction for microarrays

Sperm were purified from adult D. melanogaster males essentially as described [35,36], and RNA was extracted using the TRIzol method. For the purified sperm samples, RNA from approximately 200 dissected males was pooled for each of three independent biological replicates. Testis/accessory gland samples were processed similarly.

(b) Microarray analysis

RNA samples were amplified using the SMART method [62], and the resulting DNA was labelled as technical dye-swap replicates for hybridization to long oligonucleotide microarrays printed in house (GEO platform accession GPL8244). Sperm and testis samples were cohybridized with labelled genomic DNA to aid spot finding, and the genomic DNA channel was discarded from further analysis. After hybridization, data were normalized independently using a quantile method [63]. The median normalized intensity for each sample type was ranked using the minimum ("min") ties method of the rank function in R, with the highest intensity assigned rank 1 (table 1).
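The normalization and ranking steps can be sketched in Python with NumPy and SciPy. This is a textbook quantile normalization (each array's values are replaced by the mean of the sorted arrays at the same rank, with within-array ties broken by position), which may differ in detail from the implementation cited as [63]. Ranking uses `scipy.stats.rankdata` with `method="min"` on negated medians, mirroring R's `rank(-x, ties.method = "min")` so that the highest intensity receives rank 1. Function names are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def quantile_normalize(matrix):
    """Quantile-normalise a probes x arrays matrix so every column shares
    the same distribution: the mean of the sorted columns.

    Ties within a column are broken by position (a simplification).
    """
    matrix = np.asarray(matrix, dtype=float)
    order = np.argsort(matrix, axis=0, kind="stable")
    reference = np.sort(matrix, axis=0).mean(axis=1)
    normalized = np.empty_like(matrix)
    for j in range(matrix.shape[1]):
        # the probe holding the i-th smallest value gets reference[i]
        normalized[order[:, j], j] = reference
    return normalized


def rank_by_median(normalized):
    """Rank probes by median intensity across arrays: highest = 1, tied
    probes share the lowest rank, as with R's rank(-x, ties.method = "min")."""
    medians = np.median(normalized, axis=1)
    return rankdata(-medians, method="min").astype(int)


# Hypothetical toy matrix: 3 probes measured on 2 arrays
raw = np.array([[5.0, 4.0],
                [2.0, 1.0],
                [3.0, 6.0]])
ranks = rank_by_median(quantile_normalize(raw))
```

With the "min" ties method, two probes with identical medians both receive the better (smaller) rank and the next rank is skipped, which is how ties would appear in a ranked list such as table 1.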

(c) Data analysis

The chromosomal distribution of spermatozoal RNA genes was compared statistically with the distribution of annotated genes using a χ2-test with Yates correction. Analysis of gene clustering was conducted using a modified adjacent gene model [42,43]. Statistical analysis of GO molecular function enrichment of the 500 most abundant Drosophila and human sperm transcripts was conducted using a hypergeometric test and the Yekutieli (false discovery rate under dependency) multiple-test correction as implemented by the GOEAST toolkit [64].
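The Yekutieli (Benjamini–Yekutieli) correction controls the false discovery rate under arbitrary dependence between tests, which suits GO categories that overlap heavily. A minimal NumPy sketch of the standard adjustment is shown below; it follows the same definition as R's `p.adjust(method = "BY")`, but GOEAST's own implementation may differ in detail, and the function name is hypothetical.

```python
import numpy as np


def benjamini_yekutieli(pvalues):
    """Benjamini-Yekutieli adjusted p-values (FDR under arbitrary dependency)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    if m == 0:
        return p
    c_m = np.sum(1.0 / np.arange(1, m + 1))      # harmonic correction factor
    order = np.argsort(p)
    ranks = np.arange(1, m + 1)
    scaled = p[order] * m * c_m / ranks
    # enforce monotonicity from the largest p-value downwards, then cap at 1
    adjusted_sorted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted            # restore the input order
    return adjusted
```

Compared with the Benjamini–Hochberg procedure, the extra factor c(m) = 1 + 1/2 + … + 1/m makes the correction more conservative, which is the price of dropping the independence assumption.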

(d) Drosophila stocks and embryo collections for RT-PCR confirmation

Drosophila stocks were Oregon-R (OrR), CPTI000493 (RpS9-YFP), CPTI002881 (RpL41-YFP) and CPTI001654 (CG9336-YFP); the CPTI lines carry an insertion of a Venus (YFP variant) exon within an intron [45,46].

(e) Reverse transcription and PCR reactions for RT-PCR confirmation

RNA was extracted from single embryos using the TRIzol method (Invitrogen), and each sample was analysed with the set of primer pairs described in the electronic supplementary material, supplementary methods.


This work was supported by BBSRC and Isaac Newton Trust Grants to S.R., a Ruth L. Kirschstein National Research Service Award (National Institutes of Health) and an Academic Research Fellowship from the Research Council of the United Kingdom to S.D., a Royal Society Wolfson Merit Award, the BBSRC and the Biodesign Institute at Arizona State University to T.L.K. We would also like to thank Elaine Wilkin for her assistance analysing gene clusters.

  • Received January 22, 2012.
  • Accepted February 10, 2012.

