Immunity is mostly studied in a few model organisms, leaving the majority of immune systems on the planet unexplored. To characterize the immune systems of non-model organisms alternative approaches are required. Viruses manipulate host cell biology through the expression of proteins that modulate the immune response. We hypothesized that metagenomic sequencing of viral communities would be useful to identify both known and unknown host immune proteins. To test this hypothesis, a mock human virome was generated and compared to the human proteome using tBLASTn, resulting in 36 proteins known to be involved in immunity. This same pipeline was then applied to reef-building coral, a non-model organism that currently lacks traditional molecular tools like transgenic animals, gene-editing capabilities, and in vitro cell cultures. Viromes isolated from corals and compared with the predicted coral proteome resulted in 2503 coral proteins, including many proteins involved with pathogen sensing and apoptosis. There were also 159 coral proteins predicted to be involved with coral immunity but currently lacking any functional annotation. The pipeline described here provides a novel method to rapidly predict host immune components that can be applied to virtually any system with the potential to discover novel immune proteins.
Comparative immunology uses a variety of invertebrate and vertebrate species to better understand the evolution and origin of immunity . These model organisms have provided fundamental insights into evolutionarily conserved immune mechanisms. For example, the initial concept of self versus non-self recognition was formed by experimentation with starfish, and studies in chickens led to the identification of separate B and T cell lineages [2,3]. While these discoveries have been invaluable in understanding immune mechanisms, the majority of comparative immunology is based on data from only 3 of the 30 extant animal phyla . A broader representation of phyla will help to fully appreciate the complexity of immunity.
Currently, the establishment of molecular tools is infeasible in terms of time and money for most of the metazoan tree of life. For example, continuous cell lines have been a long-term goal of coral reef biologists for decades. While there have been reports of primary cell cultures, standardized cell lines have yet to be established and genetic manipulation of corals remains elusive [4–7]. To investigate the immune systems of corals, as well as the remaining twenty-seven other animal phyla, a broad and rapid approach is required . Here, we propose a comparative genomic approach in which host viral communities are used to predict host immune proteins.
Viruses are dependent on a host cell to complete their life cycle. Therefore, they must replicate and package their genomes intracellularly while avoiding the host immune response. To establish and maintain control of host cells, viruses mimic host proteins [8–11]. For example, herpes viruses encode proteins that possess sequence similarity to human cytokines and can modulate cytokine signalling in host cells during infection . If a herpesvirus-encoded cytokine was compared with the human proteome using BLAST, then the human-encoded cytokine would be identified without any a priori knowledge of the human immune system. Expanding this observation, we hypothesized that an in silico comparison of all viral gene products to the predicted host proteome would identify both known and unknown immune-associated proteins.
As a proof of principle, in silico constructed viromes were used to successfully identify known human immune proteins. This method was then applied to reef-building corals to identify proteins potentially involved in coral immunity. Most of these proteins were homologues to human proteins, but a subset of 159 coral proteins were predicted to be involved in novel coral immune functions. This method generates putative host immune proteins to be further tested with the goal of discovering new types of immunology.
(a) Known viruses versus the well-characterized human immune system
As a test, the proposed method was applied to viruses known to manipulate the human immune system (figure 1). A mock virome was created by fragmenting 16 fully sequenced viral genomes into 200 base pair segments. The resulting in silico virome consisted of 8542 DNA segments (figure 1; electronic supplementary material, table S1a). This mock human virome was then compared with the human proteome using tBLASTn, with an e-value of 1 × 10−4 as a cutoff for significance yielding 36 human proteins matching viral DNA segments (electronic supplementary material, file S1). The function of these proteins includes cell cycle control, cytokines, cytokine receptors, chemokines, chemokine receptors, apoptosis and complement activation (figure 2; electronic supplementary material, table S1b).
(b) Coral derived viromes versus predicted coral proteomes
Fourteen viromes were generated from four species of coral; Porites rus (four colonies), Acropora sp. (three colonies), A. yongeii (three colonies) and Pocillopora verrucosa (four colonies) . Sequence data quality control was performed using PRINSEQ and DeconSeq, resulting in 1 048 627 good-quality sequences with an average size of 315 base pairs (electronic supplementary material, table S2) [14,15]. For this analysis, the viromes were combined in silico into one coral virome and compared with the predicted proteome of A. digitifera using tBLASTn. An e-value of less than 1 × 10−4 was used as the cutoff for significance  (see electronic supplementary material, file S2, for full tBLASTn results). This resulted in a total of 60 191 virome-derived DNA sequences that significantly matching 5 863 predicted coral protein sequences. To reduce the complexity of this dataset, only coral proteins that matched to at least five of the virome fragments were considered for further analyses (2 503 proteins total; figure 3; electronic supplementary material, table S3) . The potential functions of this group of proteins was predicted by comparing to the human proteome using BLASTp (electronic supplementary material, file S3), as well as a functional domain analysis using the conserved domain database (CDD) (electronic supplementary material, file S4) . This showed that 83 proteins related to PAMP sensing and apoptosis (figure 4; electronic supplementary material, figure S1 and table S4a) . Electronic supplementary material, table S5, provides descriptions of protein domains classified as PAMP sensing [19–23] and apoptosis [24–30]. Compared with the remainder of the coral proteome, this group of host proteins with homology to viral proteins was significantly enriched for PAMP sensing and apoptosis-associated domains (figure 4b; electronic supplementary material, table S4b; paired two-tailed t-test p-values ≤ 0.0001 and 0.001, respectively).
(c) Bioinformatic prediction of novel immune proteins
The pipeline described in figure 3 resulted in 159 coral proteins that could not be annotated using either the human proteome or the CDD. To determine the predicted cellular localization (i.e. membrane bound or cytoplasmic) transmembrane regions were predicted using the TMHMM server resulting in 25 proteins possessing 1–14 transmembrane regions (electronic supplementary material, figure S2 and file S5) . Based on the pipeline used in this study a general experimental approach is proposed to identify novel immune components of non-model organisms (figure 5). Viral nucleic acid is isolated from host tissue and compared with the host transcriptome to identify protein candidates that are predicted targets of viral proteins and therefore predicted to be involved with host immunity. This candidate group of immune-associated proteins is then compared with a protein domain database and the proteome of an established model organism to remove proteins whose function can already be predicted. Proteins that lack similarity to either database represent the final group of candidate immune genes with completely unknown function that can be further explored using in vitro and in vivo experimentation (figure 5).
The pipeline described here combines domain-based protein annotations with novel annotations generated by viral communities to predict the host immune repertoire. In animals domain-based methods are biased towards the three major phyla investigated thus far (i.e. Chordata, Nematoda and Arthropoda); therefore, we cannot exclusively rely on those strategies to elucidate the immune systems of uncharacterized phyla . Viral communities provide a domain-independent approach to predict host immune proteins that supplements existing domain-based protein annotation methods.
(b) Viral manipulation of apoptosis and intracellular pathogen sensing
The domain-based annotation of the coral proteome indicates the coral immune system is highly complex and in some instances more complex than humans [16,32,33]. As expected, many of the components predicted to be involved with coral immunity based on the presence of conserved domains are also predicted targets of the viral community. With the rise of metagenomics, it has become clear that many viruses are persistent and do not cause any known pathologies. For example, the virome of apparently healthy humans includes members of Herpesviridae, Polymovaviridae, Papillomavirade, Adenoviridae, Anelloviridae and Parvoviridae [34–37]. To maintain viral infection over time, persistent viruses will often express multiple proteins that possess sequence similarity to host proteins involved with programmed cell death (apoptosis) and PAMP sensing [9,38]. Figure 4 and electronic supplementary material, figure S1 suggest that, similar to humans, the viral communities associated with apparently healthy coral interact with apoptosis and PAMP sensing. Specifically, the tumour necrosis factor (TNF) signalling pathway and nod-like receptors (NLRs) are predicted targets of coral-associated viruses.
The TNF signalling pathway acts as a central mediator of apoptosis and appears to be functionally conserved from corals to humans [33,39]. TNF-receptor-associated factors (TRAFs) are critical adaptor proteins that bind to the intracellular portion of TNF receptors regulating cell survival and cytokine production . Based on the total number of viral sequences matching to coral proteins homologues of TRAF6, TRAF4 and multiple TNFs were in the top 10% of all host proteins, suggesting that the TNF-signalling pathway is a major target of the coral virome (figure 4; electronic supplementary material, figure S1). In humans, viruses belonging to Herpesviridae, Adenoviridae, Reoviridae and Retroviridae have been shown to interact with TNF-mediated apoptosis . Metagenomic evidence in corals indicates that similar to humans, the abundance of herpes viruses changes in response to physiological stress and is associated with bleaching events [41–45]. Future work should focus on which families of viruses interact with the coral TNF-signalling pathway and whether related viral manipulation strategies exist in corals and humans.
NLRs are intracellular signalling molecules that play a central role in the detection of PAMPs . The genome of the reef-building coral A. digitifera encodes 500 predicted NLRs compared with only 22 found within the human repertoire . In total, 398 viral hits were distributed across 25 NLRs including many containing a glycosyl transferase domains (electronic supplementary material, figure S1). Corals are the first metazoans described thus far to contain the NLR-glycosyl transferase domain combination; however, the function any coral NLR has yet to be determined. The pipeline described here provides a starting point in the selection of specific NLRs that warrant additional investigation. Taken together, domain-based protein annotation provides a predicted framework of the coral immune system and, based on those annotations, coral-associated viruses are modulating pathogen sensing and the host apoptotic response.
(c) Beyond conserved domains-novel predictions from viral communities
Domain-based annotation allows for the rapid prediction of host immune proteins. However, the databases used to generate those annotations are largely based on experimental evidence from only three animal phyla (i.e. Chordata, Nematoda and Arthropoda) . While many protein domains are expected to be conserved across phyla, relying exclusively on domain-based annotations will fail to identify immune proteins that lack previously characterized domains. The pipeline described here identified 159 predicted coral immune proteins that could not be annotated using the CDD and lack homology to any human protein. In addition to missing completely novel immune domains, comparing the host proteome to the CDD will also fail to identify proteins involved with immunity that lack canonical immune domains (electronic supplementary material, table S5). In total, 275 coral proteins could be annotated with the CDD but did not contain any PAMP sensing or apoptotic domains or possess any similarity to human proteins (electronic supplementary material, file S5). This group of proteins, as well as the 159 proteins that do not match any domain in the CDD, may be involved in coral-specific immunological processes. The pipeline described here provides a more comprehensive prediction of the host immune repertoire by combining the power of existing databases with new predictions generated from resident viral communities.
(d) Caveats and future work
One limitation of the proposed bioinformatic pipeline is its reliance upon amino acid alignments long enough to produce a significant e-value using tBLASTn. For example, some viruses hijack cell regulation using short elements termed eukaryotic linear motifs (ELMs) that are only two to eight residues in length [47,48]. Therefore, the short alignments between viral ELMs and their target host proteins would probably not produce significant e-values and fail to be identified. It is also important to remember that this method generates hypotheses that require direct biological and biochemical validation. This validation would ideally be performed within the organism of interest; however, if molecular techniques remain unavailable then a hybrid approach may be taken to provide a more rapid turnaround of hypothesis testing. For example, the established model organism Nematostella vectensis (sea anemone), a fellow anthozoan related to corals, has well-developed molecular methods that could be used to test the function of predicted coral immune proteins [16,49]. While this study has focused on characterizing metazoan immunity, the pipeline proposed here could also elucidate interactions between bacteria and their resident bacteriophage communities to identify novel bacterial immune proteins. Combining these data with metazoan-virus predictions would provide a broad understanding of immune processes occurring within the entire holobiont. The pipeline presented here predicts immune system structure by combining model organism-based databases with a domain-independent approach based on resident viral communities. This method can be applied to any holobiont with the potential to discover novel immunological processes.
4. Material and methods
(a) Isolation of virus-like particles from coral tissue and sequencing of viral DNA
Virus-like particles (VLPs) were isolated from individual colonies of Po. rus, Acropora sp. and P. verrucosa taken from two locations on the island of Mo'orea, French Polynesia while aquarium samples of A. yongeii were generously donated by the Birch Aquarium, San Diego, CA. Sampling locations included the back reef of the Richard Gump Research Station (LT) and Tema'e beach (TA). Briefly, coral tissue was homogenized in the field with a mortar and pestle and 12 ml of 0.02 µm filtered seawater was added until all coral tissue was removed from the skeleton. Next, the coral-tissue slurry was placed in a 15 ml falcon tube followed by the addition of 1 ml of chloroform and stored at 4°C for three weeks. To isolate VLPs from coral tissue, established ultracentrifugation protocols were used [13,50]. Briefly, 600 µl of 50 nmol l−1 dithiothreitol was mixed with 8 ml of coral slurry by vortexing, incubated at 37°C for 1 h and centrifuged for 15 min at 3000 r.p.m. In total, 7 ml of the supernatant was then transferred to the top of a CsCl step gradient containing 1.0 ml of each CsCl solution at densities of 1.7 g ml−1, 1.5 g ml−1 and 1.35 g ml−1. The step gradient containing the coral slurry was then spun for 2 h at 22 000 r.p.m. using an SW-41 Ti rotor (Beckman Coulter) and 1.5 ml was extracted from the 1.35–1.5 interface using a 24-gauge needle with an upward-facing tip. Next, the sample was treated with DNase at a final concentration of 100 units ml−1 and incubated at 37°C for 1 h. Finally, DNA was extracted from the VLPs using standard CTAB/formamide methods . Two quality control approaches were taken to ensure viral purity of the final DNA product. First, VLPs were visualized before and after ultracentrifugation using standard SYBR-gold staining techniques, followed by a 16S and 18S PCR to ensure the isolated DNA was not contaminated with cellular genetic material .
To obtain sufficient nucleic acid for library preparation viral DNA was amplified using a modified linker amplified shotgun library (LASL) method . Briefly, initial DNA concentrations were determined using the Qubit Fluorometric Assay (Life Technologies) and 5 ng of total DNA was combined with PCR-grade water to a final volume of 50 µl. Diluted DNA samples were briefly vortexed and transferred into 50μl Covaris tubes for shearing using the Covaris M220 Focused Ultrasonicator (40 s at 9 W). Sheared DNA was then end-repaired and ligated to LASL Linker A (Linker A FWD: 5′-P-GTATGCTTCGTGATCTGTGTGGGTGT-3′, Linker A REV: 5′-CCACACAGATCACGAAGCATAC-3′) followed by size selection with a target size of 500 base pairs using Pippin Prep (Sage Science) . For each biological sample, four PCR reactions were prepared using a barcoded primer and 17 cycles of PCR were performed (PfuTurbo Cx Hotstart DNA Polymerase, Agilent). Replicates of the same samples were combined, purified and separated into four aliquots for further amplification with three cycles of reconditioning PCR. Finally, all barcoded samples were quantified and pooled for library preparation and sequencing on MisSeq platform using the MiSeq Reagent Kit v3 600 cycles chemistry (Illumina).
(b) Quality control of sequence data
Preceding downstream analysis quality control of sequence data was performed using a variety of bioinformatic tools. Paired end reads were first joined using COPE , low-quality sequences (less than 100 base pairs in length, mean quality score less than 25, or containing more than 10% N's) were removed using PRINSEQ and any remaining bacterial or human sequences were removed with DeconSeq [14,15] (see electronic supplementary material, table S1 for the number of sequences removed at each pre-processing step).
(c) Bioinformatic pipeline used to analyse viromes
Viromes were compared with the predicted coral proteome  using standalone tBLASTn analysis with computational-based statistics turned on (D) and low complexity regions of proteins masked (seq) . Predicted coral proteins were classified as ‘hits’ if five or more viral sequence fragments matched a coral protein with an e-value less than 1 × 10−4. Predicted coral protein hits were then compared with the human proteome using BLASTp analysis with an e-value cutoff of less than 1 × 10−5 and the CDD with an e-value cutoff of 0.01 .
Permission to conduct this research was granted by the Haut-commissariat de la République en Polynésie Française (DRRT) (Protocole d'Accueil 2011–2012, Moorea).
Processed viromes and all supplementary material have been deposited in Dryad: http://dx.doi.org/10.5061/dryad.tb2p4 . All bioinformatics analyses performed are available in electronic supplementary material, file S6 and are available on GitHub under the article title.
Sample collection was done by S.D.Q., C.E.N. and A.F.H.; data generation/analysis was performed by S.D.Q., Y.W.L., G.G.Z.V., R.A.E. and F.L.R.; manuscript preparation was done by all authors.
The authors declare no competing interests.
This work was supported by an NSF Graduate Research Fellowship awarded to S.D.Q, an NIH grant no. (1 R21 AI094534-01) to F.L.R, a CIFAR Integrated Microbial Diversity Fellowship (IMB-ROHW-141679) to F.L.R., and National Science Foundation grants OCE–1538567 (to L.W.K.), OCE–1538393 (to C.E.N.) and CNS-1305112 and MCB-1330800 (to R.A.E). This paper is funded in part by a grant/cooperative agreement from the National Oceanic and Atmospheric Administration, Project A/AS-1, which is sponsored by the University of Hawaii Sea Grant College Program, SOEST, under Institutional Grant No. NA14OAR4170071 from NOAA Office of Sea Grant, Department of Commerce. The views expressed herein are those of the author(s) and do not necessarily reflect the views of NOAA or any of its sub-agencies. This manuscript is UH SOEST publication no. 9695 and UH Sea Grant publication number UNIHI-SEAGRANT-JC-15-19. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
We gratefully acknowledge the support of the National Science Foundation (OCE 12-36905 and earlier awards) to the Moorea Coral Reef LTER site and Fernando Nosratpour and Vince Levesque at the Birch Aquarium for their generous donation of A. yongeii.
- Received May 31, 2016.
- Accepted August 4, 2016.
- © 2016 The Authors.
Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.