Rates of genomic divergence in humans, chimpanzees and their lice

Kevin P. Johnson, Julie M. Allen, Brett P. Olds, Lawrence Mugisha, David L. Reed, Ken N. Paige, Barry R. Pittendrigh


The rate of DNA mutation and divergence is highly variable across the tree of life. However, the reasons underlying this variation are not well understood. Comparing the rates of genetic changes between hosts and parasite lineages that diverged at the same time is one way to begin to understand differences in genetic mutation and substitution rates. Such studies have indicated that the rate of genetic divergence in parasites is often faster than that of their hosts when comparing single genes. However, the variation in this relative rate of molecular evolution across different genes in the genome is unknown. We compared the rate of DNA sequence divergence between humans, chimpanzees and their ectoparasitic lice for 1534 protein-coding genes across their genomes. The rate of DNA substitution in these orthologous genes was on average 14 times faster for lice than for humans and chimpanzees. In addition, these rates were positively correlated across genes. Because this correlation only occurred for substitutions that changed the amino acid, this pattern is probably produced by similar functional constraints across the same genes in humans, chimpanzees and their ectoparasites.

1. Introduction

Understanding differences between species in the rate of molecular evolution is of considerable interest in the fields of evolution, molecular biology, population genetics and systematics. However, estimating this rate variation is often difficult because of uncertainties regarding the timing of diversification. Hosts and parasites provide a system in which the relative rates of molecular evolution can be directly estimated, provided some degree of congruence (codivergence) between host and parasite evolutionary trees exists [1,2]. Several such relative rate estimates have been made for a variety of host and parasite systems, and parasites are usually observed to evolve more rapidly than their hosts at the molecular level [2–9]. Previously proposed explanations for this phenomenon include shorter generation times, population bottlenecks or relaxed selection for mutation repair in parasite lineages when compared with their hosts.

However, most of these studies compare relative rates of genetic substitution for only a single gene, typically a mitochondrial gene because of the ease of sequencing homologous loci in host and parasite taxa. There has been no study of the variation in the relative rates of DNA substitution between hosts and parasites across the genome. The codivergence event between humans, chimpanzees and their lice (genus Pediculus) is well documented and provides an excellent calibration point for comparing rates of genetic divergence [10–12]. The genomes of humans (Homo sapiens), common chimpanzees (Pan troglodytes) and the human body louse (Pediculus humanus) have already been sequenced [13–15]. We sequenced across the genome of the chimpanzee louse (Pediculus schaeffi) using next-generation sequencing of a 500 bp paired-end library on a single lane of the Illumina HiSeq2000 platform and achieved approximately 120× coverage. Because these data were not collected with complete genome assembly as a goal, we used reference-based assembly, for the chimpanzee louse, of individual nuclear-encoded genes that are orthologous among humans, chimpanzees and the human body louse. These individual gene assemblies were then used to compare the relative rate of divergence across these genes between these ectoparasites and their hosts.

2. Material and methods

Lice were collected from chimpanzees (Pan troglodytes schweinfurthii) from Ngamba Island Chimpanzee Sanctuary during annual health checks. Total genomic DNA was extracted by grinding five adult female specimens of the chimpanzee louse (P. schaeffi) in 300 µl of saline EDTA with 5 µl of lysozyme and incubating at 37°C for 1 h. Then 5 µl of proteinase K and 10 µl of 25% SDS solution were added and the mixture was incubated at 55°C for 1 h. After incubation, 1 : 1 volumes of phenol : chloroform were added to the solution. The mix was centrifuged (10 min at 13 000g) and 200 µl of chloroform was added to the aqueous layer and centrifuged (5 min at 13 000g). The supernatant was combined with 1/10 volume of 3 M sodium acetate and 0.7 volumes of isopropanol and placed at −20°C for 2 h. DNA was pelleted by centrifuging for 15 min at 13 000g at 4°C. The supernatant was removed and the DNA was washed in cold 80% ethanol and resuspended in nuclease-free water. Additional voucher specimens are stored at the University of Florida.

A 500 bp shotgun library was constructed using this extract and sequenced using paired-end reads on a single lane of an Illumina HiSeq2000 Analyzer with 100 bp reads. Raw sequences from the chimpanzee louse (P. schaeffi) are deposited in GenBank Short Read Archive (accession SRX390495). We searched OrthoDB (http://cegg.unige.ch/orthodb6, [16]) for genes that are 1 : 1 : 1 single copy orthologues between humans (Homo sapiens), common chimpanzee (Pan troglodytes) and human body lice (P. humanus). This initial search recovered 3026 potential orthologues. We used CLC Genome Workbench (CLCbio) to assemble the Illumina reads for the chimpanzee louse against coding DNA sequences of the 3026 orthologues from the human body louse (P. humanus) genome [15] in VectorBase (https://www.vectorbase.org). In addition to protein sequences from OrthoDB, we also retrieved coding DNA sequences for the 3026 putative orthologues for humans and chimpanzees from the Ensembl database [17].

As a further check on orthology, we used a BLAST (NCBI Blast 2.2.27) search of the chimpanzee genes against the human protein sequence database (from Ensembl). We also conducted a BLAST search of the chimpanzee louse assemblies against the protein database for human body louse (from VectorBase). Only those genes for which the top BLAST hit corresponded to the putative orthologue from OrthoDB were included in subsequent analyses (1724 genes). Gene sequences from humans and chimpanzees and from human body lice and chimpanzee lice were pairwise aligned in Muscle [18]. In 164 genes, stop codons were detected in the Pan troglodytes sequences and in 11 cases stop codons were detected in the P. schaeffi sequences. These genes were removed from further analyses. A statistical test for outliers using Z-value scores was performed on the per cent divergence values (see below), and after Bonferroni correction 11 genes were determined to be outliers for humans/chimpanzees and three genes for the lice. These outliers were removed leaving a final dataset of 1534 orthologous genes (alignment deposited in Dryad doi:10.5061/dryad.9fk1s).
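The outlier screen described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name, the two-sided normal test and the family-wise significance level of 0.05 are assumptions, because the text states only that Z-value scores and a Bonferroni correction were used.

```python
import math
from statistics import mean, stdev


def bonferroni_z_outliers(values, alpha=0.05):
    """Return indices of values flagged as outliers by a Z-score test.

    alpha is an assumed family-wise significance level; the per-gene
    threshold is alpha divided by the number of genes (Bonferroni).
    """
    n = len(values)
    mu = mean(values)
    sd = stdev(values)      # sample standard deviation
    threshold = alpha / n   # Bonferroni-corrected per-gene level
    outliers = []
    for i, v in enumerate(values):
        z = (v - mu) / sd
        # Two-sided tail probability of a standard normal deviate.
        p = math.erfc(abs(z) / math.sqrt(2))
        if p < threshold:
            outliers.append(i)
    return outliers
```

Applied separately to the per cent divergence values of the primate comparison and of the louse comparison, a screen of this kind would return the indices of genes to exclude from each.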

For each of the 1534 orthologous loci, we used custom Perl scripts (deposited in GitHub, www.github.com/juliema/publications) to calculate uncorrected per cent sequence divergence between humans and chimpanzees and between the human body louse and chimpanzee louse. These DNA sequences were also translated to protein sequences and the same comparisons were made. These comparisons did not include any sites in which gaps were introduced because of the alignment. In addition, Dn and Ds values were calculated for all these comparisons using the codeml program in the PAML package (v. 4.4b, [19]) and Nei & Gojobori's calculations of Dn and Ds [20]. To estimate the genome-wide relative rate, all sites were pooled for each species across genes and the average genome-wide sequence divergence was calculated by dividing the total number of substitutions by the total number of sites. All statistical analyses of these values were performed in the R statistics package [21].
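The divergence calculations in this paragraph can be illustrated with a short sketch. It is written in Python rather than the Perl of the deposited scripts, and the function names and the use of "-" as the gap character are assumptions made for illustration only.

```python
def count_differences(seq_a, seq_b, gap="-"):
    """Return (substitutions, compared_sites) for two aligned sequences.

    Columns in which either sequence has a gap are skipped entirely.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    subs = 0
    sites = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        sites += 1
        if a != b:
            subs += 1
    return subs, sites


def percent_divergence(seq_a, seq_b):
    """Uncorrected per cent divergence for one aligned gene pair."""
    subs, sites = count_differences(seq_a, seq_b)
    return 100.0 * subs / sites if sites else float("nan")


def pooled_divergence(aligned_pairs):
    """Total substitutions divided by total sites, pooled across genes."""
    total_subs = 0
    total_sites = 0
    for seq_a, seq_b in aligned_pairs:
        subs, sites = count_differences(seq_a, seq_b)
        total_subs += subs
        total_sites += sites
    return total_subs / total_sites
```

Under these assumptions, the genome-wide relative rate would be the pooled divergence of the louse alignments divided by the pooled divergence of the human and chimpanzee alignments. The sketch treats every non-gap mismatch as a substitution and does not handle ambiguity codes such as N.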

To estimate the relative rates of sequence divergence for mitochondrial genes, the mitochondrial genomes for humans, chimpanzees and P. humanus were downloaded from GenBank. Because of the extremely high divergences between P. humanus and P. schaeffi, reference-based assemblies could not be used. Rather, we used a combination of Target Restricted Assembly [22] and BLAST searches of a de novo partial genome assembly of P. schaeffi constructed using SOAPdenovo [23] to obtain sequences of mitochondrial protein-coding genes of P. schaeffi. The sequences for two short genes (ATPase8 and ND4L) could not be recovered using these methods. The sequences for the 11 genes that were recovered were aligned against sequences for P. humanus and genetic distances and relative rates calculated in the same way as for nuclear genes.

3. Results and discussion

For any relative rate comparison, it is important that orthologous genes are being compared, in this case between humans, chimpanzees, human lice and chimpanzee lice. First, we used the Ortholog Database (OrthoDB [16]) to determine which protein-coding genes were strict 1 : 1 : 1 orthologues between humans, chimpanzees and the human body louse. Using reference-based assembly against the human body louse and reciprocal best BLAST analyses, we compiled sequences for these same 1534 orthologous genes in the chimpanzee louse genome (Material and methods). Comparisons of the per cent sequence divergence across these 1534 orthologues between human lice (P. humanus) and chimpanzee lice (P. schaeffi) to those for humans and chimpanzees revealed that lice are evolving 14.8 times faster at the DNA sequence level than their hosts (Wilcoxon signed-rank test, p < 0.0001). However, there was considerable variation in DNA sequence divergence across genes for both humans and chimpanzees (s.d. = 0.003) and for lice (s.d. = 0.02) (figure 1).

Figure 1.

Plot of pairwise uncorrected sequence divergence across 1534 nuclear protein-coding genes between human and chimpanzee lice against that for the orthologous gene in humans and chimpanzees. The solid line indicates least-squares regression line (slope = 1.08, Student's t = 7.38, p < 0.0001), whereas the dashed line indicates expectation if genes evolve at the same rate in both groups. Points above the dashed line are genes that evolved faster in lice and points below the line are genes that evolved faster in humans and chimpanzees (none in this case).

Some of this variation could be explained by a correlation between sequence divergences for a gene between lice and between primates (figure 1). Regression of pairwise divergences across the 1534 orthologous genes revealed a significantly positive correlation (slope = 1.08, Student's t = 7.38, p < 0.0001), though this correlation accounted for only a small fraction of the variation in divergence values across genes (r² = 0.034).
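The regression statistics reported here (slope, Student's t and r²) are related in a simple way that a short sketch makes explicit. The analyses in the paper were run in R; the Python function below is only an illustrative ordinary least-squares implementation with hypothetical names.

```python
import math


def linear_regression(x, y):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, r_squared, t_statistic), where the
    t statistic tests the slope against zero with n - 2 degrees of freedom.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    # Residual sum of squares and the standard error of the slope.
    # A perfect fit (zero residuals) would make the t statistic undefined.
    sse = syy - slope * sxy
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t_statistic = slope / se_slope
    return slope, intercept, r_squared, t_statistic
```

For a simple regression these quantities satisfy t² = r²(n − 2)/(1 − r²). With r² = 0.034 and n = 1534 this gives t of roughly 7.3, which agrees with the reported value of 7.38 to within the rounding of r², and it shows how a highly significant slope can coexist with a small fraction of variance explained when n is large.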

This correlation in rates of divergence across the genome could potentially be explained by similar selective constraints on the same gene in different taxa. Regression of protein divergences between lice against those between humans and chimpanzees was also significantly positive (slope = 1.98, Student's t = 12.56, p < 0.0001) and this correlation explained a larger fraction of the variation (r² = 0.093) than did comparisons of DNA substitutions. As with DNA divergence, lice were evolving faster than primates at the protein level, in this case 8.6 times faster. The average protein divergence between the lice was much higher (4.8%) than between humans and chimpanzees (0.6%).

To evaluate whether functional constraints on amino acids could fully explain the correlation across the genome in rates of DNA substitution, we used estimates of Dn (non-synonymous divergence) and Ds (synonymous divergence) calculated for pairwise comparisons for all 1534 genes between the two louse species as well as between humans and chimpanzees. Regression of Dn for lice against Dn for humans and chimpanzees (figure 2a) revealed a significantly positive correlation (slope = 1.86, Student's t = 11.20, p < 0.0001). By contrast, regression of Ds for lice against Ds for humans and chimpanzees (figure 2b) showed no correlation (Student's t = 0.82, p = 0.41), even though synonymous substitutions accumulate 25.13 times faster in lice than in their hosts.

Figure 2.

Plots of (a) non-synonymous divergence (Dn) and (b) synonymous divergence (Ds) across 1534 nuclear protein-coding genes between human and chimpanzee lice against that for the orthologous gene in humans and chimpanzees. The solid line indicates the least-squares regression line (a: slope = 1.86, Student's t = 11.20, p < 0.0001; b: Student's t = 0.82, p = 0.41), while the dashed line indicates expectation if genes evolve at the same rate in both groups. Points above the dashed line are genes that evolved faster in lice and points below the line are genes that evolved faster in humans and chimpanzees.

The Dn/Ds ratio is often used as an estimate of the relative level of functional constraint on protein evolution and has also been used to detect positive selection. Dn/Ds ratios much less than 1 usually indicate purifying selection against amino acid changes, whereas Dn/Ds ratios greater than 1 are often taken as evidence for adaptive or positive selection [24]. Interestingly, none of the Dn/Ds ratios for these genes in lice were greater than 1, while Dn/Ds ratios for 24 genes were greater than 1 for humans and chimpanzees. In fact, the Dn/Ds ratios for primates (mean = 0.11) were significantly higher than those for lice (mean = 0.08) for the same genes (Wilcoxon signed-rank test, p < 0.00001). The correlation between Dn values across genes between lice and primates is also reflected in the positive correlation between the Dn/Ds ratios (slope = 0.058, t-value = 8.65, p < 1 × 10⁻¹⁵).
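As a small illustration of how per-gene Dn/Ds ratios of this kind might be summarized, the sketch below computes the mean ratio and counts the genes with a ratio greater than 1. The function name and the decision to skip genes with Ds = 0 (for which the ratio is undefined) are assumptions; the text does not say how such genes were treated.

```python
def summarize_dn_ds(dn_ds_pairs):
    """Summarize per-gene (Dn, Ds) estimates.

    Returns (mean_ratio, n_above_one, n_used). Genes with Ds == 0 are
    skipped because their ratio is undefined (an assumed convention).
    """
    ratios = []
    for dn, ds in dn_ds_pairs:
        if ds == 0:
            continue
        ratios.append(dn / ds)
    n_used = len(ratios)
    mean_ratio = sum(ratios) / n_used if n_used else float("nan")
    n_above_one = sum(1 for r in ratios if r > 1.0)
    return mean_ratio, n_above_one, n_used
```

Running this once on the louse estimates and once on the human and chimpanzee estimates would give the two means and the two counts of genes above 1 that are compared in the paragraph above.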

The 1534 genes that could be confidently assigned as orthologues between humans, chimpanzees and their parasitic lice evolve just over 14 times faster in lice than in their hosts. Interestingly, the genetic divergence between lice was correlated with the genetic divergence in these same genes between humans and chimpanzees. That is, genes that evolve more rapidly in humans and chimpanzees also evolve more rapidly in their parasitic lice, even though lice and primates are separated by more than 600 million years of evolution [25]. This is, to our knowledge, the first evidence of correlated rates of evolution across the genome between hosts and their parasites. The explanation for this correlation appears to be the relative level of functional constraint on different genes. Comparisons of the rate of silent substitutions did not reveal any correlation, while comparisons of the rate of substitutions that changed the amino acid (replacement substitutions) were correlated.

This study also provides an estimate of the relative rate of substitution between hosts and parasites across their genomes. Previous comparisons have generally involved single genes, typically from the mitochondrion. These estimates of mitochondrial relative rates of substitution between humans, chimpanzees and their lice indicated that lice evolve 2.3 times faster based on only the cytochrome oxidase I and cytochrome b gene regions [10]. However, this prior comparison used slightly different methods and did not estimate rates of substitution across the entire mitochondrial genome. Applying the same methods used in this study to mitochondrial protein-coding genes (see Material and methods), we estimated that mitochondrial genes in lice evolve 2.9 times faster than in humans and chimpanzees, generally in line with these prior estimates. However, there was no correlation across these mitochondrial genes in the degree of genetic divergence (Student's t = 0.52, p = 0.62) as there was for nuclear genes.

The fact that both nuclear and mitochondrial genes in parasitic lice evolve more rapidly than those of their hosts suggests that a more universal explanation for the rate increase is needed, rather than explanations specific only to the mitochondrion. Mitochondrial genomes of lice are highly rearranged compared with other insects [25–29], and these rearrangements were hypothesized to be correlated with increased substitution rates. In fact, in the human body louse (P. humanus), the mitochondrion is divided into a number of minicircular chromosomes [15,30]. The lack of mitochondrial single-stranded binding protein (mtSSB) in lice was postulated to account for this correlation [29]. However, the lack of mtSSB cannot explain the rate increase for nuclear protein-coding genes. In fact, the relative rate of substitution for nuclear genes is much higher (14.8×) than it is for mitochondrial genes (2.9×), further suggesting that factors not specific to the mitochondrion are involved.

Another hypothesis that has been used to explain the higher substitution rate in lice, compared with their vertebrate hosts, is an elevated rate of slightly deleterious substitutions because of repeated population bottlenecks in lice upon transmission between hosts and inbreeding on individual hosts [4]. However, the fact that the Dn/Ds ratio is significantly higher for primates when compared with lice is evidence against this hypothesis, which predicts elevated non-synonymous substitutions (assumed to be slightly deleterious) in lineages with stronger bottlenecks.

In addition, the elevated substitution rate in lice occurs for both synonymous and non-synonymous substitutions (figure 2), which suggests that underlying differences in mutation rates must be accounted for in the explanation. Some possible explanations include genome-wide relaxed selection for mutation repair [29], shorter generation times [3] or overall elevated mutation rates [4]. In fact, an increased mutation rate combined with purifying selection could potentially explain the reduced Dn/Ds ratio in lice when compared with primates. Under this hypothesis, synonymous mutations would accumulate in proportion to the mutation rate [31], whereas non-synonymous mutations would be eliminated by purifying selection, which would lower the Dn/Ds ratio in the case of elevated mutation rates. Functional constraints on protein structure appear to play some role in determining the substitution rates for different genes; however, they explain very little (less than 3%) of the overall variation in substitution rates across genes for lice compared with humans and chimpanzees.

Research permit approvals were from Uganda Wildlife Authority (permit NS71) and Uganda Council of Science and Technology (UNCST).

Data accessibility

Sequences of raw Illumina reads are deposited in the GenBank Short Read Archive (Accession number SRX390495). Gene alignments are deposited in Dryad (doi:10.5061/dryad.9fk1s). Custom Perl scripts are deposited on GitHub (www.github.com/juliema/publications).

Funding statement

This work was supported by NSF grants DEB-1239788, DEB-1050706 and DEB-0612938 to K.P.J., DEB-0717165 and DEB-0845392 to D.L.R., and DEB-1010868 to K.N.P.


We thank H. M. Robertson and K. K. O. Walden for assistance with Ensembl. We thank the Ngamba Island Chimpanzee Sanctuary for the opportunity to collect lice samples during their annual health checks.

  • Received August 20, 2013.
  • Accepted December 4, 2013.

