Royal Society Publishing

Population genomics of parallel phenotypic evolution in stickleback across stream–lake ecological transitions

Bruce E. Deagle , Felicity C. Jones , Yingguang F. Chan , Devin M. Absher , David M. Kingsley , Thomas E. Reimchen


Understanding the genetics of adaptation is a central focus in evolutionary biology. Here, we use a population genomics approach to examine striking parallel morphological divergences of parapatric stream–lake ecotypes of threespine stickleback fish in three watersheds on the Haida Gwaii archipelago, western Canada. Genome-wide variation at greater than 1000 single nucleotide polymorphism loci indicate separate origin of giant lake and small-bodied stream fish within each watershed (mean FST between watersheds = 0.244 and within = 0.114). Genome scans within watersheds identified a total of 21 genomic regions that are highly differentiated between ecotypes and are probably subject to directional selection. Most outliers were watershed-specific, but genomic regions undergoing parallel genetic changes in multiple watersheds were also identified. Interestingly, several of the stream–lake outlier regions match those previously identified in marine–freshwater and benthic–limnetic genome scans, indicating reuse of the same genetic loci in different adaptive scenarios. We also identified multiple new outlier loci, which may contribute to unique aspects of differentiation in stream–lake environments. Overall, our data emphasize the important role of ecological boundaries in driving both local and broadly occurring parallel genetic changes during adaptation.

1. Introduction

Uncovering the genetic basis of local adaptation in natural populations will refine our understanding of natural selection [1], allow insight into how species respond to environmental change [2] and shed light on the process of speciation [3]. Many studies investigating the genetics of adaptation have taken a population genomics approach in which genome-wide patterns of genetic variation are documented in many individuals within a species [46]. Using this approach, regions of the genome that are under divergent selection between local populations (outlier loci) can be identified by their high level of differentiation compared with the background levels [79]. Putatively, neutral non-outlier loci also provide insight by providing a clearer window into population history [10]. In cases where subpopulations have diverged enough to become reproductively isolated, the genetics of speciation can be examined [3,11]. To date, the majority of population genomics studies on wild species have used anonymous genetic markers [46]. However, with advances in high-throughput genetics and mounting numbers of completed genome projects, markers with known locations within the genome are increasingly being examined, narrowing the search for underlying genes [12].

The threespine stickleback (Gasterosteus aculeatus) has become a powerful ecological model species with both well-studied natural history and extensive genetic resources. This small fish inhabits marine environments throughout the temperate Northern Hemisphere and has colonized countless freshwater rivers, streams, ponds and lakes [13,14]. Several studies have uncovered the genetic basis of phenotypic traits that have evolved during repeated colonizations of the freshwater environment by marine ancestors [15,16]. These studies have highlighted cases of parallel phenotypic evolution occurring via selection on the same genomic loci. More broadly, parallel evolution has produced genome-wide patterns, with many of the same genomic regions being under selection in independently derived freshwater populations [12].

Within freshwater habitats formed since the end of the last ice age (approx. 12 000 years ago), there has been a considerable radiation in stickleback morphology, with numerous adaptations documented among populations [17,18]. This recent radiation presents further superb opportunities to examine the genetics of adaptation. These freshwater populations include divergent parapatric and sympatric populations. Parapatric populations of stickleback inhabiting adjacent streams and lakes are particularly well studied and show varying levels of habitat-specific morphological adaptations, with some distinct pairs providing convincing examples of ecological speciation [1924]. Morphological differences between parapatric stream–lake sticklebacks were originally described in Mayer Lake [19] and Drizzle Lake [20] on the Haida Gwaii Archipelago. In these two systems, the divergence between lake and stream fish is remarkably parallel; lake fish have higher gill raker counts, a larger more streamlined body, longer spines and an unusual melanistic coloration [19,20,25]. There is evidence that the differences are adaptations to both divergent predation and trophic selective landscapes [18,20,26]. The most exceptional feature of stickleback in these lakes is their large body size. Among more than 100 Haida Gwaii lakes and streams surveyed, only eight include ‘giant’ stickleback (length greater than 75 mm) [25,27]. These giants include stickleback from Drizzle, Mayer and nearby Spence lakes—populations located on a post-glacial outwash plain dotted with dozens of lakes, ponds and streams with stickleback of typical body size [25].

Here, we evaluate genome-wide patterns of single nucleotide polymorphism (SNP) variation at greater than 1000 loci in stickleback from stream and lake habitats in Drizzle, Mayer and Spence drainage systems. While independent colonization and evolution seem likely [20], the flat topology, close proximity of populations and striking similarity of the fish make it difficult to rule out historical connections. SNP data were used to address two sets of questions: (i) are these stream–lake sticklebacks (and giant stickleback) independently derived? and (ii) what are the adaptive genetic differences between the stream–lake fish, and have parallel genomic changes occurred between systems?

2. Material and methods

(a) Study area and stickleback collection

The three study systems are located on northeast Haida Gwaii, off the Pacific coast of Canada (figure 1). In the Drizzle system, sticklebacks were collected from the lake, and the only significant inlet and outlet streams [20,28]. Mayer system sticklebacks were collected from the lake and three inlet streams [19,25], and sticklebacks in the Spence system were collected from the lake and outlet. Collections were made in May–June of 2009 and 2010 using minnow traps and fish preserved in 95 per cent ethanol. When numbers permitted, 20 fish were arbitrarily selected (avoiding juveniles less than 40 mm) for genotyping and morphological analysis (table 1). In streams flowing into Mayer Lake, sticklebacks morphologically similar to those from Mayer Lake were captured alongside typical stream-form fish (differentiated based on colour, shape and size; photo in the electronic supplementary material, figure S1). In these inlet streams, 20 of the stream-form fish were studied along with additional lake-form fish (table 1).

View this table:
Table 1.

List of study populations, sample sizes and pairwise FST between populations within watersheds.

Figure 1.

Map showing location of study populations. Insets show location of lake and stream sampling sites within watersheds. White dotted lines trace path of outlet streams.

(b) Morphological analysis

We measured six metric and three meristic traits: standard length, body depth, first dorsal spine length, left pelvic spine length, gape length, eye diameter, gill raker number on first left branchial arch (upper and lower arms) and number of lateral plates. All metric traits were size standardized to allow size-independent comparisons. This was accomplished by fitting a general linear model (GLM) for each trait with standard length as a covariate and population as a factor. Standard length by population interaction was non-significant for all traits (p > 0.12), we therefore used population coefficients from GLMs fit without interaction to calculate population-specific expected values for each trait corresponding to a length of 60 mm [29]. Standardized trait values for each fish were calculated by adding a size-scaled residual (residual × (60/length)) to expected values.

Multi-variate morphological differentiation of the study populations was assessed using principal components analysis (PCA). Data from all variables (size standardized if appropriate) were scaled to have unit variance before calculation by a singular value decomposition of the matrix in R (v. 2.9.0) statistical software [30].

(c) DNA extraction and SNP genotyping

Stickleback genomic DNA was genotyped at 1536 biallelic SNP loci using Illumina's BeadArray Technology and GoldenGate assay (Illumina, San Diego, CA, USA) following the methodology described by Jones et al. [31]. SNPs were ascertained from two marine and three freshwater stickleback populations, geographically distant (greater than 800 km) from those in the current study [31]. GenomeStudio software (v. 2010.2; Illumina) was used to visualize and manually adjust all intensity clusters. SNPs with poorly separated clusters or low signals (n = 342) were excluded. In the exported data, SNPs missing greater than 10 per cent of genotypes calls were removed (n = 24) as were any stickleback with greater than 5 per cent missing data. Repeatability of genotype calls was greater than 99 per cent in two samples run in triplicate. The final dataset comprised 1170 SNPs from 167 sticklebacks (table 1).

The SNPs cover all 21 stickleback linkage groups, mtDNA and unassembled scaffolds (electronic supplementary material, table S1). They fall into three groups [31]: (i) SNPs chosen to be evenly distributed across the genome based on local recombination rate (n = 773); (ii) assembly SNPs, chosen to tag unoriented or unassembled parts of the genome (n = 117); and (iii) candidate SNPs chosen to target specific genomic regions of interest (primarily regions differentiated between marine and freshwater populations or potentially linked to traits of interest; n = 280).

(d) Population differentiation based on SNP data

We used PCA and tree-based clustering methods to evaluate structure in the genetic data, using all evenly distributed SNPs except sex-linked loci (760 loci). We also re-ran analyses with outlier loci removed (n = 27, defined with a low stringency Bayesian prior of 1, see below). PCA has been used extensively in analysis of SNP data as an unsupervised clustering method to identify population structure [32]. Since PCA requires a dataset without missing values, we filled in the less than 1 per cent missing entries in our final SNP dataset by randomly sampling genotype data for the particular locus (across all localities). This conservative approach homogenizes genotype frequencies across populations and re-sampling had little effect on the PCAs. For tree-based analysis, we calculated FST (with sample size correction) and used the program PopTree2 [33] to produce neighbour-joining (NJ) trees based on population allele frequencies. Alternate genetic distance measures (Nei's DA and Nei's standard genetic distance DST [33]) produced congruent results (data not shown). Finally, we constructed individual-based distance trees in MEGA [34]. In this case, we created an artificial nucleotide sequence by concatenating all diploid SNP data (missing data coded as N) for each individual and calculated a pairwise uncorrected P distance matrix (equivalent to allele sharing distance), then produced a NJ tree.

(e) Outlier detection

We performed a genome scan for FST outliers using the Bayesian approach implemented in BayeScan v. 2.01 [6,8,9]. BayeScan estimates the probability that a given SNP is under selection by calculating the posterior odds (POdds), which is the ratio of the posterior probabilities of two models (selection/neutral) for each locus, given the allele frequency data [8,9]. Analyses were carried out separately for each of the physically isolated watersheds, rather than a global analysis [4,9]. Initially, the prior probability of the model with selection was set at 1/10 (assumes a priori that the neutral model is 10 times more likely than the model including selection). We also ran analyses with a prior probability of 1 allowing identification of less-stringently defined outliers. Outliers identified with a prior of 1 were excluded in some analyses (see §2d) and only reported when detected in multiple independent genome scans (see §3). Default parameters were used in all BayeScan runs. To define outliers, the expected false discovery rate was kept constant (less than 0.05) and the POdds threshold defining outliers varied correspondingly [6].

3. Results

(a) Morphology

In all three watersheds, lake fish have greater standard length, more gill rakers and longer size-adjusted pelvic spines compared with adjoining stream fish. Lake fish also have shallower body depth and longer dorsal spines in the Mayer and Drizzle systems, but not in the Spence system. The lateral plate number varies among watersheds, and was consistently lower in streams (not significantly in Drizzle). These results are consistent with larger morphological datasets collected from some sites previously [19,20,25,28] and confirm strong parallels in phenotypic divergence between stream and lake fish, especially in Mayer and Drizzle systems [20]. The morphology of lake-form stickleback found in Mayer watershed streams matched Mayer Lake stickleback except they had shallower body depth. Morphology data are summarized in electronic supplementary material, figure S2.

PCA of morphological variables differentiated fish from stream and lake habitats on the first axis (figure 2a). This axis accounted for 42 per cent of the variance and had large positive loadings for body depth combined with large negative loadings for standard length, number of lower gill rakers and length of spines. On PC1, there is no overlap between stream and lake fish collected from the same watershed. There is considerable overlap between the stream fish from separate watersheds, reflecting their similar overall morphology. PC2 (19%) has large loadings for gape length and eye diameter; these traits did not consistently differ between habitats or watersheds.

Figure 2.

Differentiation of stream–lake stickleback based on morphological versus genetic data. Colour distinguishes stream (orange) and lake (grey) fish, symbols identify watershed. (a) First two principal components (PCs) from nine morphological variables (typical Mayer Lake and Gold Creek fish are shown) (b) first two PCs from 760 SNPs (evenly distributed, non-sex-linked loci). (c) Population-level neighbour-joining tree based on FST across the 760 loci. Per cent bootstrap support (1000 replicates) shown at nodes. Removal of stream–lake outlier loci has little effect on PCA or tree (electronic supplementary material, figures S3 and S4).

(b) Population differentiation based on SNP data

In these analyses, we used all evenly distributed, non-sex-linked SNPs (760 loci) and also with the sub-set of these identified as outlier loci excluded (n = 27; defined with a low stringency prior of 1). By definition, removal of divergent outliers generates lower genetic distances within watersheds, but this did not substantially change clustering results. We present data with outliers removed in electronic supplementary material, figures S3 and S4.

(i) PC analysis

PCA of the genetic data clearly separates stickleback into collection locations in a hierarchical fashion. The first two PCs separated stickleback into three clusters corresponding to watershed of origin (figure 2b). PC1 (12.7%) separates Mayer from Drizzle and Spence watersheds, which are in turn separated on PC2 (9.7%). In subsequent PCs, separation is seen between stream and lake stickleback in Drizzle (PC3), Mayer (PC4) and Spence (PC6). PC5 separates Spam Creek from other Mayer creeks (see the electronic supplementary material, figure S3).

(ii) Tree-based analysis

Our population-level trees separated the three watersheds at the basal node with 100 per cent bootstrap support (figure 2c), reflecting the higher between watersheds FST (mean = 0.244) compared with within watersheds (mean = 0.098; table 1). The next nodes separate lake populations from stream populations within each of the watersheds (figure 2c; mean FST = 0.114). For pairwise FST with and without outliers, see the electronic supplementary material, table S2. Individual-based trees produce congruent results (electronic supplementary material, figure S4).

(c) Stream–lake outlier loci

All variable SNPs (including assembly and candidate SNPs) were used in outlier loci analyses. Genome scans were performed separately for each watershed with differing number of variable loci (Drizzle: n = 864, Mayer: n = 917 and Spence: n = 966). Initially, we used a prior probability of 10, and in the three comparisons identified 34 SNPs showing a pattern of differentiation indicative of divergent selection (‘Prior10’ outliers; table 2 and figure 3). The Prior10 outlier SNPs include 21 genomic regions (when SNPs less than 20 kb apart are grouped), three of the outlier regions were identified in multiple watersheds (two between Drizzle and Mayer, one between Drizzle and Spence; table 2). One of the regions (Chr4–19.8 Mb) contained a high density of candidate SNP markers with 14 individual outlier SNPs identified over a 90 kb block (table 2 and figure 3). The remainder of the Prior10 outliers were defined by a single SNP.

View this table:
Table 2.

Outlier loci identified in each watershed and corresponding posterior odds (POdds) from BayeScan [9]. Shared outliers were identified in at least two watersheds. Loci shown in bold were outliers in all three watersheds.

Figure 3.

Genome scans in stream–lake sticklebacks. (a) Plots show posterior odds of each SNP marker for each of the three watersheds (BayeScan [9]; prior of 10). Vertical lines separate 21 chromosomes and unassembled scaffolds. Horizontal dotted lines show outlier thresholds corresponding to a false discovery rate of less than 0.05. SNPs with a red dot are outliers in multiple watersheds (including reduced stringency outliers) and green dots indicate watershed-specific outliers. (b) Image plot showing genotypes of SNPs located along part of chromosome 4 (15.7–32.6 Mb) for Mayer and Drizzle stickleback (red, AA; yellow, Aa; blue, aa). Genotypes were colour-coded, so markers homozygous for the most common allele in Drizzle Lake are red. Three shared outlier regions are labelled, including Chr4–19.8 Mb (a candidate region with a high density of SNPs).

To identify additional loci under divergent selection in multiple watersheds, we set a less-stringent outlier threshold by lowering the BayeScan prior to 1. Under this criterion, the SNPs identified within each watershed may include some false positives; however, since each watershed is independent, the chance of an SNP being incorrectly picked as an outlier multiple times is minimal (no data were used in multiple genome scans, unlike many previous studies [5,6]). With the lower prior, the number of genomic regions identified as outliers in at least one watershed increased from 21 to 73. Of the additional 52 genomic regions, six were identified in multiple watersheds (table 2). All of the new shared outliers have Bayes factors of greater than 7 (corresponding to substantial evidence), most are greater than 10 (strong). One new outlier region (Chr19–14.8 Mb) contains two SNPs identified as outliers in all three watersheds. Several additional SNP loci were identified within the Chr4–19.8 Mb Prior10 outlier region, one of these is an outlier in all three watersheds (table 2).

Despite clear morphological separation of the stream–lake stickleback in some traits, fixed differences in allele frequencies between habitats were not observed (figure 3; and for all outlier loci allele frequencies see the electronic supplementary material, figure S5). For shared outliers, allele frequencies mostly (but not universally) diverged in the same direction between habitats. Within watersheds where multiple streams were sampled, divergence was between all streams versus lake fish for 12 of 15 outliers (i.e. in three cases one stream had the lake allele at a high frequency).

4. Discussion

(a) Origin of the stream–lake stickleback

Our genome-wide data indicate that the stream–lake stickleback we examined originated in separate divergences within each of the three watersheds. This implies three origins of the ‘giant’ stickleback in this small area on Haida Gwaii. These events presumably represent independent selection on standing genetic variation present in marine ancestors since few new genetic mutations would have been expected since post-glacial colonization. It is possible that similar watershed-dominated genome-wide genetic structure could be produced by secondary gene flow within watersheds after a single origin and dispersal by a giant-like ancestor. However, this would require maintenance of habitat-specific morphological distinctiveness in the face of extensive and long-term homogenizing gene flow within each watershed. In either case, it is clear that habitat, rather than history, has played a deterministic role in shaping the current morphological diversity. The separate evolution of these stream–lake stickleback contrasts with those in Germany, where primary genetic divergence was among habitat type [35]; our results are consistent with genetic data from stream–lake stickleback on Vancouver Island (400 km to the south of Haida Gwaii) [24].

Given that the typical freshwater form of stickleback has evolved from the marine ancestor countless times throughout the stickleback distribution [13], the limited geographical distribution of giant lake stickleback that are highly distinctive from adjoining streams remains a conundrum. It may be that differences in predation regimes [18,26], or other biological and physical factors beyond the benthic–limnetic ecological contrast typically invoked [22,23], are required to drive divergence to the level observed in these Haida Gwaii populations. Or perhaps, the particular haplotypes under selection are restricted to this geographical area. While SNPs on the array are present elsewhere (since they were ascertained from other populations), they may be tagging haplotype variants that are restricted to Haida Gwaii. Gene flow via marine stickleback [11,15] could have facilitated movement of some genomic components between these closely located watersheds. In this scenario, while the forms are assembled independently, some of the same allelic variants may be used in each process (see §4b below).

There are no current physical barriers to dispersal between sampling sites within each system; this is highlighted by the presence of lake-form fish in the streams entering Mayer Lake. These ‘lake’ fish are genetically indistinguishable from lake-collected fish, suggesting they are not permanent stream residents. Lake-form fish were not identified in these creeks during previous sampling [19]. Despite this, they made up a substantial proportion (approx. 25%) of fish trapped in Mayer streams during the current study and presumably can have extensive ecological interactions with the stream-form. Sympatric coexistence of these two ecotypes is reminiscent of benthic–limnetic species pairs that coexist in some lakes [36]. Stream-form fish were not detected in any lakes during the current study, and despite extensive sampling in Mayer and Drizzle lakes, they have only been reported near stream mouths in Mayer Lake [26]. In the Drizzle and Mayer systems (where we sampled multiple streams), stream-form fish are generally more genetically similar to each other than to lake fish, and this is a genome-wide effect that persists when outlier loci are removed. This indicates continued gene flow through the lake, perhaps facilitated by phenotype-dependent habitat preferences [37]. Why the ‘benthic’ stream-forms do not become established in these lakes is an intriguing question and remains a topic for future ecological investigations.

(b) Stream–lake outlier loci

Genome-wide scans identified several genomic regions subject to divergent selection in each watershed. Given the density of markers used, the outlier SNPs are probably linked to selected haplotypes, rather than representing causative adaptive mutations. The proportion of adaptive genetic variation detected is dependent on both the number of markers and the level of linkage disequilibrium across the genome. Overall, our genomic coverage is roughly one SNP every 600 kb (450 Mb/760 evenly spaced SNPs) with some areas of higher coverage owing to the candidate SNPs on the array (n = 280). So, while our coverage is substantially higher than many previous scans of sticklebacks [38,39], ours is almost certainly not an exhaustive catalogue of the genomic loci under divergent selection. Overall, we identified 21 stringently defined outlier regions; eight of these were from the evenly distributed SNP group (i.e. approx. 1% of these non-candidate SNPs were outliers). This is low compared with 2–10% of loci generally found to be under diversifying selection [7]; however, comparisons between studies are difficult owing to differences in markers, outlier detection methodologies and number of samples.

Given the parallels in phenotypic differentiation between the stream and lake stickleback and the numerous examples of parallel evolution at the genetic level in recent stickleback literature [12,15,16], one might expect that a substantial proportion of outliers would be shared among watersheds. This is not the case; only three out of 21 stringent outlier regions (14%) were identified in more than one system (using a less-stringent outlier threshold, it is 9 out of 73; 12% shared). Including the less-stringently defined outliers, only two genomic regions are differentiated in all three watersheds. While not prevalent, these repeated genetic contrasts provide strong evidence for habitat-specific selection driving adaptive evolution at these loci.

The Chr4–19.8 Mb outlier region was covered by a relatively high density of SNP markers on the array. A closer look at this area highlights issues that can complicate interpretation of data from lower coverage genome scans. First, within this outlier region, there is often an imperfect association between SNPs and the presumed adaptive variant. For example, within the region in the Mayer system stickleback, four of the 15 SNPs show weak differentiation between stream–lake habitats (figure 3b). This can happen when mutations accumulate within the outlier region, when recombination breaks down an ancestral region, or when selection acts on multiple ancestral alleles within the pool of standing genetic variation (i.e. a soft-sweep [40]). In the Spence system, we see evidence that recombination has disconnected the link between most SNPs in the Chr4–19.8 Mb region and the adaptive variant (only two SNPs are outliers). If we extrapolate these observations to the other SNPs in our dataset, it is apparent that some SNPs we have characterized may be within, or close to, a genome region under divergent selection, but are no longer diagnostically linked to the adaptive change.

Population-specific outliers are often discounted as potentially erroneous owing to non-repeatability [6,9]. However, in the current study, many of these are strongly supported and this divergence is being maintained despite the potential for gene flow. The occurrence of population-specific outliers makes sense for several reasons. First, ecological pressures are unlikely to be fully parallel, as demonstrated by slightly different morphological divergences seen between habitats in each stream–lake pair (see also [24]). Second, it is possible that for polygenic traits, different loci may be recruited. Third, particular adaptive alleles may be absent in some of the independently colonized watersheds. Finally, as discussed above, linkage between the adaptive variant and a specific outlier SNPs may have broken down in some cases. Given these possibilities, strongly supported population-specific outlier loci should not be discounted in subsequent studies characterizing adaptive genetic variation.

(c) Comparison with outlier loci in other stickleback divergences

In his initial description of the giant Mayer Lake stickleback, Moodie [19] pointed out that they are closer morphologically to the marine stickleback than to typical freshwater populations, but clarified that it was the ‘character complex’ that set this population apart rather than each particular character. This indicates that some characters (and associated genetic loci) distinguishing giant stickleback may be shared with ancestral marine populations and others with derived freshwater populations. We can explore this by comparing the stream–lake outliers we have identified with regions defined as outliers found in previous marine–freshwater [12] and benthic–limnetic comparisons [31].

Hohenlohe et al. [12] used a high-density genome scan to compare two marine and three freshwater Alaskan stickleback populations. They identified several candidate regions differentiating marine–freshwater stickleback and suggested this was a result of co-selection on multiple functionally related genomic regions. Many of these marine–freshwater outlier regions were also tagged by SNPs on our array, and eight out of 22 chromosomal regions we identified as stream–lake outliers are within candidate regions from marine–freshwater study [12] (table 2). This clearly shows that some of the genomic regions under divergent selection between marine and freshwater populations can be broken down and retained in some freshwater populations. These findings will focus the search for functionality of these particular genomic regions to traits that not only differ between marine and freshwater species pairs, but also between these stream–lake sticklebacks (e.g. osmoregulatory genes are unlikely to diverge between adjoining freshwater populations).

Since the phenotypic divergence seen in sympatric benthic–limnetic lake stickleback morphology mirrors that seen in stream versus lake in many ways [36], again we might expect common genomic regions to be involved. Based on a comparison with a genome scan on benthic–limnetic stickleback in three lakes using a similar set of SNP as used in the current study [31], we find seven genome regions that are outliers in at least one stream–lake pair and benthic–limnetic pair (table 2). None of the outlier regions that differentiated all stream–lake populations, or all benthic–limnetic pairs, is shared between studies. In fact, the maximum number of study systems that shared a common outlier region is three (out of the possible six). It may be possible to look for phenotypic commonalities between the stream–lake and benthic–limnetic pairs that do share outliers to suggest the possible underlying causes; however, with relatively low-density genome scans, much uncertainty remains.

(d) Candidate genes and QTLs in stream–lake genomic outlier regions

While having a reference genome sequence does allow outlier loci from genome scans to be connected with particular genomic regions, the identified outlier loci are based on an individual SNP (or small number of linked SNPs) potentially representing a region containing many genes. For example, within the Chr4–19.8 Mb outlier region, there are 15 outliers SNPs covering approximately 100 kb (approx. 19.81–19.90 Mb) in the Mayer and Drizzle systems, and the flanking SNPs are at 19.3 and 21.2 Mb. Therefore, this is potentially a 2 Mb region in which the adaptive variant could be found. In the Spence system, only two outlier SNPs were identified at the end of the region (at 19.89 and 19.90 Mb) narrowing the window to just over 1 Mb. Hohenlohe et al. [12] also identified marine–freshwater differentiation in this part of the genome (Chr4/LG IV Peak 2); a 1.1 Mb area centred at 20 Mb containing 31 protein coding genes (from which they listed two candidate genes possibly related to morphology: Wnt7B and FBLN1). These comparisons illustrate how data from multiple populations can help reduce the size of the candidate regions. However, further dedicated studies with full sequence will be required to examine fine-scale divergence in this region, and around other outliers, before strong candidate genes can be proposed (169 genes within 50 kb of all outlier loci identified are listed in the electronic supplementary material, table S3).

An alternative way to focus the search for genes under divergent selection is through comparison with quantitative trait loci (QTL) studies for known phenotypic differences. This has the advantage of allowing specific phenotypic traits to be linked to the genomic regions under divergent selection. Limitations are that mapping resolution is low in experimental crosses, the same loci may not always be used in different environments, and many interesting phenotypic differences have not yet been studied by QTL mapping. Nonetheless, several studies have been carried out in stickleback, which link markers to morphological traits that differ between the stream–lake sticklebacks. For example, in a cross between benthic and limnetic sticklebacks, Peichel et al. [41] mapped the location of QTLs influencing length of the pelvic spines (marker: Chr8 17.7 Mb) and first dorsal spine (marker: Chr1 18.9 Mb; Chr2 20.2 Mb). Although stream–lake fish in the current study differ in dorsal and pelvic spine lengths, none of the outlier regions we identified map near the previously described QTLs. By contrast, variation in body depth has been linked to a QTL (marker: Stn321) in another cross between marine and freshwater sticklebacks [42], and this marker has been shown to be highly differentiated between a stream–lake stickleback pair using a candidate marker approach [37]. The body depth Stn321 QTL marker is located on Chr7 at approximately 13.66 Mb [42], approximately 450 kb from an SNP, we identified as an outlier in the Mayer and Drizzle systems. It is plausible that these two markers are detecting signals from a common gene controlling body depth, providing a strong candidate region for further characterization.

5. Conclusions

The population genomics approach we used to examine parallel morphological adaptation of stream–lake stickleback allowed us to establish the independent origin of stream and giant lake stickleback, in three geographically proximate watersheds. The majority of genomic outlier regions identified through genome scans were watershed-specific. However, several were shared between watersheds, and interestingly, several of the stream–lake outliers match those previously identified in marine–freshwater and benthic–limnetic comparisons. Further characterization of the shared regions we have identified will clarify whether the same genetic variants are found in all systems, or if the same loci have been altered in different ways. Our results are a first step to delve further into the search for genomic regions involved in morphological differentiation and reproductive isolation between stream and lake sticklebacks in this model system of ecological speciation. The large amount of standing genetic variation present in stickleback may distinguish this species from others that have undergone adaptive radiations, especially those originating from a small number of founders. However, the patterns of genetic changes observed in stickleback are likely to be mirrored in many other species undergoing rapid adaptation to new environments [43]. The divergences we have observed also emphasize the importance of ecological boundaries to differentiation in a broader context, especially since sharp gradients are probably just as widespread in terrestrial ecosystems.


Stickleback sampling followed guidelines for scientific fish collection in British Columbia, Canada (Ministry of Environment permits: SM09-51584 and SM10-62059). Sampling in Naikoon Provincial Park and Drizzle Lake Ecological Reserve were carried out under park use permits: 103171, 103172, 104795 and 104796.

Research supported by NSERC Discovery grant NRC 2354 (TER), grant P50 HG002568 from US National Institutes of Health (D.M.A. and D.M.K.), a NSERC Postdoctoral Fellowship (B.E.D.) and a Stanford Affymetrix Bio-X Graduate Fellowship (Y.F.C.). D.M.K. is an investigator of the Howard Hughes Medical Institute. Shannon Brady and HudsonAlpha staff facilitated laboratory analysis. John Taylor provided laboratory space and useful discussion.

  • Received July 28, 2011.
  • Accepted September 14, 2011.


View Abstract