Despite the enormous economic and ecological importance of marine organisms, the spatial scales of adaptation and biocomplexity remain largely unknown. Yet, the preservation of local stocks that possess adaptive diversity is critical to the long-term maintenance of productive stable fisheries and ecosystems. Here, we document genomic evidence of range-wide adaptive differentiation in a broadcast spawning marine fish, Atlantic cod (Gadus morhua), using a genome survey of single nucleotide polymorphisms. Of 1641 gene-associated polymorphisms examined, 70 (4.2%) tested positive for signatures of selection using a Bayesian approach. We identify a subset of these loci (n = 40) for which allele frequencies show parallel temperature-associated clines (p < 0.001, r2 = 0.89) in the eastern and western north Atlantic. Temperature associations were robust to the statistical removal of geographic distance or latitude effects, and contrasted ‘neutral’ loci, which displayed no temperature association. Allele frequencies at temperature-associated loci were significantly correlated, spanned three linkage groups and several were successfully annotated supporting the involvement of multiple independent genes. Our results are consistent with the evolution and/or selective sweep of multiple genes in response to ocean temperature, and support the possibility of a new conservation paradigm for non-model marine organisms based on genomic approaches to resolving functional and adaptive diversity.
Marine biocomplexity and diversity have been associated with sustainability and persistence of marine fisheries in the face of climatic and anthropogenic threats (Hilborn et al. 2003; Ruzzante et al. 2006; Schindler et al. 2010). Accordingly, a better understanding of the spatial scale of population structuring and adaptation is paramount to successful conservation in the oceans (Hutchings 2004; Sale et al. 2005; Conover et al. 2006). Historically, the prevalence of intraspecific diversity, both neutral and adaptive, in many marine species has been unclear (Hellberg 2009). On the one hand, observations of weak genetic structure (Hauser & Carvalho 2008) and the ubiquity of a pelagic larval stage among marine taxa have fostered the postulate that gene flow is extensive (Levin 2006) and, by extension, that local adaptation is rare (Conover et al. 2006). On the other hand, very large population sizes typical of marine species should favour the predominance of natural selection over genetic drift, suggesting that the potential for adaptive diversity in marine populations requires re-evaluation (Hauser & Carvalho 2008), a need that has been reinforced by recent studies that challenge preconceptions of the spatial scales of gene flow and adaptation in the ocean (Levin 2006; Hutchings et al. 2007; Gaggiotti et al. 2009).
Atlantic cod, Gadus morhua, is historically one of the most exploited and studied marine species; however, the degrees of population (e.g. Fox et al. 2009) and adaptive structuring (Hutchings et al. 2007) remain contentious. Tagging and oceanographic studies indicate a high proportion of resident non-migratory stocks (Robichaud & Rose 2004) as well as discrete spawning locations within single management units (Fox et al. 2009); both observations support a hypothesis of unrecognized diversity. In contrast, studies of neutral microsatellite loci and mtDNA generally indicate weak spatial structuring over large spatial scales (Bentzen et al. 1996; O'Leary et al. 2007; Carr & Marshall 2008) consistent with a significant mismatch between the spatial scale of genetic differentiation and management (Reiss et al. 2009). Nonetheless, several loci have been identified that display elevated divergence or clinal structure (e.g. Sick 1965; Pogson 2001; Case et al. 2005; Nielsen et al. 2006), supporting a role for adaptation in the spatial structuring of cod populations. More recently, both Moen et al. (2008) and Nielsen et al. (2009) used single nucleotide polymorphisms (i.e. SNPs) to examine the presence of local adaptation primarily in the eastern Atlantic, and both identified several potential candidate genes, some of which are associated with ocean climate. In conjunction with molecular studies, common garden experiments suggest adaptive differences in growth and survival in the western Atlantic are also associated with temperature (e.g. Hutchings et al. 2007). Nonetheless, it remains unanswered just how pervasive the influence of selection is across the Atlantic cod genome, and the degree to which ocean temperature acts as a selective and population defining agent.
Here, we apply a population genomics approach that combines a genome-wide survey of genetic polymorphism in Atlantic cod with range-wide sampling. This study encompasses the scope of expected environmental and ecological influences in order to explore the geographical scale and magnitude of locus-specific molecular variation. The inclusion of samples from both sides of the North Atlantic that span a large latitudinal gradient provides natural replication of population response to environmental gradients (i.e. clines in allele frequency) and represents an effective approach to the identification of candidate genes associated with ocean climate (Schmidt et al. 2008). Such genomic-based understanding of adaptive divergence and the influence of selective forces, both natural and anthropogenic, will be of focal importance as future successful management identifies the appropriate spatial scales of conservation (Hilborn et al. 2003; Conover et al. 2006; Reiss et al. 2009).
2. Material and methods
(a) Sample collection and DNA extraction
In total, 14 locations were sampled during the period 1996–2007 (figure 1 and electronic supplementary material, figure S1), spanning the entire North Atlantic. On average, 22 fish (electronic supplementary material, table S1) were collected per location by handlines or bottom trawl. Fin clips or blood samples were taken and immediately placed in 95 per cent ethanol. DNA was extracted following the protocol of Elphinstone et al. (2003) modified to work with a 96-well filter plate and automated on a robotic liquid handling system (Perkin-Elmer). See electronic supplementary material for further details on sample collection and populations sampled.
(b) Expressed sequence tag libraries, SNP identification, annotation and linkage
Expressed sequence tag (EST) library creation and sequencing were performed as part of the Cod Genomics and Broodstock Development Project (CGP) and have been described by Bowman et al. (2007, 2010). Libraries used 884 individuals (Bowman et al. 2010) from Eastern Canada (Bay Bulls, Newfoundland and Cape Sable, Nova Scotia), for which cDNAs were sequenced from the 3′-end (UTR), and contigs were searched for SNPs using Polyphred (Nickerson et al. 1997). A total of 3072 potential biallelic SNPs with a minimum of 4× read coverage and a minor allele frequency greater than 25 per cent were selected for analysis. The minor allele frequency of greater than 0.25 was used to ensure that the putative SNPs were not sequencing errors, and were likely to be suitable for mapping purposes (see Hubert et al. 2010). Genotyping was performed by the Genome Quebec Innovation Center using the GoldenGate assay (Fan et al. 2003). Of the 3072 putative SNPs, 2284 (approx. 74%) were selected following screening, of which 1641 were informative (53%). Further details regarding SNP development, genotyping and linkage mapping are contained in the electronic supplementary material and available in Hubert et al. (2009, 2010) and Bowman et al. (2007, 2010). To ensure that our samples were representative and to estimate the frequency of rare alleles, two (temperature-associated, see below) SNPs (cgpGmo-S866 and cgpGmo-S419) were genotyped in an additional 500 individuals (adults and juveniles) from southern Newfoundland, using the KASpar genotyping system (KBiosciences).
(c) Data analysis
A wide variety of methods have been developed to identify regions of the genome that have been subject to natural selection (Nielsen 2005). We used a Bayesian approach (Beaumont & Balding 2004) that directly estimates the posterior probability of a given locus being under selection by defining two alternative models, one with and one without the effect of selection. The respective posterior probabilities of each model are estimated using a reversible jump Markov chain Monte Carlo approach as implemented in the software Bayescan (Foll & Gaggiotti 2008) and compared using the ratio of the two models (i.e. log Bayes Factor). Because geographic isolation or demographic structure may contribute to false-positives (Foll & Gaggiotti 2008), we ran the analysis for the entire dataset and then assessed population structure using principal coordinate analysis (PCA) with significant outliers removed. The Bayescan analysis was then repeated separately on each of the dominant clusters with all loci. To evaluate the potential for false-positives in tests for selection, simulated datasets were generated using Easypop (Balloux 2001) and analysed using Bayescan (see electronic supplementary material for details). We compared the Bayescan results to an alternate method for outlier identification using a hierarchical island model to generate the distribution of genetic within and among populations as implemented in Arelquin (Excoffier & Lischer 2010).
All SNPs identified as outliers using Bayescan were examined for spatial trends in allele frequency. However, given the natural replication of environmental gradients on either side of the Atlantic, and the opportunity this presents for the identification of climate-associated candidate genes (Schmidt et al. 2008), we use a conservative approach and focus primarily on loci which were identified as outliers and which display parallel clines in allele frequency on either side of the Atlantic. Multiple regression and simple Mantel tests were used to examine environmental and ecological associations with allele frequency (see electronic supplementary material). We have used mean annual environmental values (e.g. temperatures) as proxies for the regimes experienced by each of the populations analysed here. Notwithstanding its inherent uncertainties, this approach is one that has been used repeatedly in studies of Atlantic cod (Brander 1994; Myers et al. 2001; Hutchings et al. 2007). Because associations with environmental variables may be by a result of neutral or chance associations, we accounted for the influence of geography by using both partial Mantel tests and the residuals from the isolation by distance relationship (IBD). Linkage among outliers was examined using estimates of D′ and significance based on Fisher's exact test calculated in TASSEL 3.0 (Bradbury et al. 2007).
To mathematically describe clinal variation, we generated sigmoidal curves of cline data, and calculated clinal width using the inverse of the maximum slope. As allele frequencies were not fixed at 0 and 1 on either side of the cline, clinal width equalled Δp/slope, where Δp is the change in allele frequencies from one side of the cline to the other. We used the gradient model of environmental change (Endler 1977), in which dispersal distance is much smaller than the cline width. See the electronic supplementary material for details.
In total, 1641 informative loci were genotyped. On average 87.2 per cent of loci were polymorphic at each geographical location; however, the Baltic Sea sample was successfully genotyped for only 1405 loci owing to the failure of some assays. Locus-specific FST values were as high as 0.60 for some loci. To test the representative nature of our samples and to examine the frequency of rare alleles, we genotyped 500 individuals of two life-history stages (adult and juvenile young of year) from southern Newfoundland at two loci, cgpGmo-S866 and cgp-GmoS419. The results from these large samples were consistent with those obtained with the smaller samples used in our broad-scale survey: the frequency of rare alleles at cgpGmo-S419 and cgpGmo-S866 were 0.005 and 0.000, respectively.
Tests for selection identified multiple loci potentially experiencing directional selection. PCA with the preliminary outliers removed identified three clusters of individuals, two of which comprised multiple locations (electronic supplementary material, figure S2). Tests for selection using Bayescan were repeated using all loci (n = 1641) but for each side of the Atlantic separately, excluding the land-locked Ogac Lake populations owing to clear isolation (electronic supplementary material, figure S2) and the Baltic sample owing to reduction in the number of SNPs. The analysis identified 4.2 per cent (70) of the SNPs examined, as potentially influenced by directional selection (figure 2a,b), of which 94 per cent were outliers in both the east and west Atlantic. Simulations were used to evaluate the potential for false positives. Across all simulations, the portion of false positives was consistently less than 0.5 per cent. In comparison, the test based on the hierarchical island model identified the majority of outliers identified using Bayescan (90% at α = 0.05, or 75% at α = 0.01; electronic supplementary material, figure S3) including 80–90% of the clinal loci.
Individual examination of the allele frequencies at these 70 loci revealed 40 that displayed parallel clines in allele frequency on both sides of the North Atlantic (figure 3a and electronic supplementary material, figure S4a). In addition to loci displaying parallel clines, a further 14 of the outliers displayed clines in the west but were fixed in the eastern Atlantic (electronic supplementary material, figure S4b). Allele frequencies at the remaining SNPs highlighted the isolation of the Barents Sea sample (electronic supplementary material, figure S4c). The parallel clines revealed regions of clear transition in allele frequency (FST > 0.20), between the Scotian Shelf and Newfoundland in the west and between Ireland and Iceland in the east (figure 3a), where non-outlier loci were only weakly structured (FST ∼ 0.01). Clinal loci displayed no variation (FST ∼ 0.00) across large geographical areas in the north where non-outlier loci were highly structured (FST ∼ 0.10), such as between the isolated landlocked Ogac Lake and the Barents Sea, which are separated by a geographical distance of more than 5000 km (figure 3a and electronic supplementary material, figure S4a). Estimates of linkage among clinal loci based on D′ or Fisher's exact test revealed that most clinal loci were significantly linked (electronic supplementary material, figure S5).
Multiple regression of the allele frequencies at the parallel clinal loci with seven environmental and biological characters (see §2) identified average annual bottom temperature as the dominant explanatory factor (r2 = 0.90, p < 0.001; figure 3b and electronic supplementary material, table S3), which is consistent with a link between this subset of loci and ocean temperature. The association between genetic differentiation at these parallel clinal loci and temperature was further supported by simple Mantel tests (electronic supplementary material, table S4), which indicated a significant correlation (r = 0.84, p < 0.001; electronic supplementary material, table S4). We considered the possibility that the observed association with temperature might be influenced by covariates such as geography or latitude, which reflect colonization history or demography. Average bottom temperature was not associated with latitude directly (r2 = 0.02, p = 0.677), and allele frequency remained strongly associated with temperature even after the geographical effect was removed either by analysing the residuals of the IBD relationship or by using partial Mantel tests. Using the residuals of the IBD relationship, the association with temperature remained significant (r2 = 0.73, p < 0.0001; electronic supplementary material, figure S6a,b). When the geographical distance matrix was held constant with a partial Mantel test, the correlation between genetic differentiation and temperature remained strong and highly significant (r = 0.83, p < 0.001; electronic supplementary material, table S4).
Based on clinal width (Endler 1977) and dispersal distances of 100–1000 km, we estimated that the strength of selection acting on these SNPs (temperature-associated SNPs) in the western Atlantic ranges between 0.00054 and 0.04800 (electronic supplementary material, figure S7). We observed a positive association between pair-wise genetic differentiation (FST) and temperature difference for the clinal/temperature-associated loci (electronic supplementary material, figure S6a). Finally, we noted a temperature-associated decline in the ratio of the estimated gene flow at these loci to the neutral loci to the point that they reached a 90–95% reduction at the maximum temperature differential (electronic supplementary material, figure S6c).
Linkage mapping revealed 23 major linkage groups (see Hubert et al. 2010). Out of the 40, 38 parallel clinal temperature-associated loci were assigned to three linkage groups (figure 4). Multiple temperature-associated SNPs share a linkage group with the haemoglobin β1 gene, where the polymorphisms associated with the HbI-1/1, HbI-1/2 and HbI-2/2 phenotypes were suggested to occur (Andersen et al. 2009; Borza et al. 2009), though they are not tightly linked (figure 4). Genotypes of the temperature-associated loci were strongly correlated (electronic supplementary material, figure S5), resulting in three highly discrete classes of multi-locus genotypes that were easily observed using PCA of just the outlier loci (electronic supplementary material, figure S8) each of which are supported by the three linkage groups.
Despite the enormous economic and ecological importance of marine fish stocks, the appropriate spatial scales for their conservation remain poorly understood (e.g. Reiss et al. 2009). Understanding the geographical scale of population structuring and adaptation is paramount to successful conservation as natural populations face unprecedented rates of environmental change. This study supports previous work hypothesizing that natural selection shapes marine populations on smaller spatial scales than expected, given the high dispersal capability of most marine organisms (e.g. Ruzzante et al. 2006; Gaggiotti et al. 2009). We show that ocean temperature may be a strong evolutionary force across the genomes of cold-blooded organisms by demonstrating multi-gene patterns of temperature-related variation in Atlantic cod that occur in parallel on both sides of the Atlantic. We identify a subset of gene-associated polymorphisms for which allele frequencies show parallel temperature-associated clines in otherwise genetically distinct populations on both sides of the Atlantic and test positive for signatures of selection. These SNPs map to three linkage groups and several were successfully annotated consistent with a hypothesis of multiple independent genes. Taken together, these results suggest that discrete regions of the Atlantic cod genome are subject to directional selection and associated with temperature-related local adaptation.
Previous studies of cod have identified environmental correlations with phenotype or evidence of single molecular loci under selection such as pantophysin (Pan I, Pogson & Fevolden 2003), haemoglobin (haemoglobin βI, Andersen et al. 2009; Borza et al. 2009) and blood type E (Møller 1969) primarily in the eastern Atlantic. Notably, the clinal loci identified here were not associated with either PanI or haemoglobin β1, though several share a linkage group with the haemoglobin β1 gene. SNP clines recently identified by Moen et al. (2008) were similar in nature to PanI clines at four sites in the northeast Atlantic, and these SNPs were associated with several ribosomal proteins and other proteins involved in either immune response or muscle contraction (Moen et al. 2008). In addition, Nielsen et al. (2009) identified several SNPs (n = 8) displaying signatures of directional selection primarily in the eastern Atlantic, none of which overlapped with this study. The annotation of the temperature-associated outliers identified here (n = 7; see electronic supplementary material, table S5) suggests highly differing functions and supports a hypothesis that temperature is influencing a wide range of physiological processes. A significant influence of temperature is expected given the precedent set by previous studies and the regulatory role ambient temperature plays in all metabolic and enzymatic reactions of poikilothermic organisms. Nonetheless, the fact that a large portion of outlier SNPs identified here remain unannotated (electronic supplementary material, table S5) highlights the need for further evaluation of the nature of temperature-associated selection across the cod genome.
Many examples exist demonstrating how the thermal environment inhabited by natural populations may directly impose strong selection pressure and result in temperature-associated adaptations (see Angilletta 2009). In cod, temperature has been directly linked to polymorphism at the haemoglobin gene β1 that influences the binding properties of oxygen (Andersen et al. 2009), resulting in temperature-associated polymorphism clines in the Atlantic (Sick 1965; Brix et al. 1998; Andersen et al. 2009). Moreover, temperature is known to influence geographical variation in traits, such as antifreeze production (Goddard et al. 1999) and early life-history growth rate (Hutchings et al. 2007) in cod. The role of temperature-associated selection as the dominant force in shaping the clines observed here is supported by the positive tests for selection and the strong correlations with ocean climate. Our observation of parallel clines in allele frequencies of the same loci in both the eastern and western Atlantic, despite longstanding historical separation (Bigg et al. 2008; Carr & Marshall 2008), is consistent with the hypothesis of natural selection along a replicated, broad-scale environmental gradient (e.g. Schmidt et al. 2008). Our estimates of selection based on clinal width (0.0005–0.05), though dependent on the magnitude of gene flow, were comparable if not lower than those reported for other natural populations (e.g. Endler 1986).
Admittedly, clinal patterns may be formed either by selection along an environmental gradient, through secondary contact between previously isolated populations, or some combination of these (e.g. lactate dehydrogenase in Fundulus, Schulte 2001). Therefore, an alternate hypothesis for the formation of the temperature-associated clines observed here is that they were formed through secondary contact (see Case et al. 2005). During the last glacial cycle, the geographical distribution of cod was likely fragmented, and restricted to as little as 20 per cent of the present day range during the LGM, particularly in the western Atlantic where suitable shelf habitat became very limited as sea levels dropped (Bigg et al. 2008). Nonetheless, it is clear that cod have been continuously present on both sides of the Atlantic for at least 100 000 years, with estimates of divergence between the east and west Atlantic that predate the last glacial maximum (approx. 100–150 kyr, Bigg et al. 2008; Carr & Marshall 2008). However, there is no evidence to date of replicated mtDNA clines on either side of the Atlantic coincident with our clines (Árnason 2004; Carr & Marshall 2008). Recent cod whole mtDNA genome data spanning the SNP cline in the western Atlantic has revealed no evidence of clinal structure, and the major haplogroups are distributed across the region (S. M. Carr and H. D. Marshall 2009, personal communication). Moreover, our putatively neutral SNPs show little differentiation within the east and west (electronic supplementary material, figure S2). Although it is not possible to rule out an allopatric mechanism in this context, the replicated associations with temperature fields in the presence of strong east–west isolation support a selective/adaptive hypothesis. Moreover, the observation that contemporary geography (latitude or geographical distance) could not explain the clines on either side of the Atlantic supports the hypothesis that our clines and associations with temperature are robust.
The fact that many temperature-associated SNPs were highly correlated and clustered in several linkage groups leave the number of genes involved in question. Nonetheless, their distribution over several linkage groups, and the successful annotation of the genes encompassing several temperature-associated SNPs, two of which are non-synonymous, supports the hypothesis of coevolution of multiple genes in response to ocean temperature. Moreover, the SNPs were derived from sequencing of the 3′-end of cDNAs, and these expressed sequence tags show no sequence similarity to each other with the exception of S1039a and S1039b, which were identified from a single contig. The hypothesis of independent genes thus seems likely. It remains possible however that some of these loci are not independent and are outliers as a result of a selective sweep, where neighbouring loci display significant environment associations that result from close proximity to one advantageous variant (i.e. hitch-hiking). Although this scenario could explain the correlation of alleles and environmental associations at loci within tight clusters such as in LG7 (figure 4), it is unlikely to explain the correlation of alleles at loci in differing linkage groups (electronic supplementary material, figures S5 and S8). On the other hand, the low dispersion of temperature-associated SNPs across the genome is consistent with a tendency for temperature-associated genes to share regions of the genome and a model of divergent selection with gene flow (Nosil et al. 2009).
Because the SNPs in the present study were identified from libraries constructed using DNA from western Atlantic cod, some polymorphisms may post-date the isolation of eastern and western Atlantic cod populations (approx. 100 kyr based on studies of nuclear and mitochondrial DNA markers, Bigg et al. 2008), and therefore ascertainment bias could influence inferences made about other regions (e.g. eastern Atlantic). We did observe lower genetic diversity in the eastern Atlantic, which represents a likely effect of ascertainment bias. Moreover, the observation of several outliers displaying clines in one side of the Atlantic while being fixed on the other may also support the presence of ascertainment issues. However, trans-Atlantic differences in allele frequencies are expected and have been observed previously in mtDNA (Árnason 2004; Bigg et al. 2008; Carr & Marshall 2008) and nuclear DNA studies (Bentzen et al. 1996; Pogson 2001; O'Leary et al. 2007). More critically, the fact that the temperature-associated clines were replicated in the western and eastern Atlantic suggests that ascertainment has not biased our interpretation of temperature-associated SNPs as the major conclusion of this study. Furthermore, the clear clustering of individuals by genotype (based on temperature-associated SNPs; electronic supplementary material, figure S8) rather than locality strongly supports the hypothesis that the temperature-associated SNPs are robust to ascertainment bias. Finally, there is some similarity in the trends observed in the temperature-associated outlier SNPs and elsewhere at the microsatellite Gmo132. Previous studies have noted clinal structure at Gmo132 in the eastern Atlantic (Skarstein et al. 2007) and western Atlantic (Bentzen et al. 1996), which has been attributed to selection (Nielsen et al. 2006). There is also clear overlap in allele frequencies at Gmo132 between samples from the Baltic Sea and Georges Bank (Taggart & Cook 1996) as observed here at our clinal SNPs, despite strong isolation.
The conservation of marine species in the face of global climatic change will be dependent on our understanding of species and ecosystem responses to changing selection pressures. Though it remains unclear of how the nature and strength of temperature-dependent selection will change with shifting marine climates, change is nonetheless expected (Vermeij & Roopnarine 2008). Our results support the hypothesis that some warm water alleles are present and available to selection in cold water environments but at low frequencies. Alternatively, a response to ocean warming may involve a northward expansion of southern populations, noting that there is substantial evidence that marine fish move in response to climatic variation (Rose 2005). The genetic clines identified here at a large number of marker loci provide unprecedented insight into both the geographical and genome scale at which selection shapes the adaptive landscape of poikilothermic marine species. Further examinations of these environmental associations, likely through common garden experiments, are required to evaluate the associations observed and the mechanisms involved. Nonetheless, these results support the possibility of a new management and conservation paradigm based on functional diversity in marine systems, as has been established for Pacific salmonids (e.g. Waples 1995; Allendorf et al. 1997). Genomic approaches to resolving the spatial scale of functional and adaptive diversity could inform spatial management strategies, and reduce demographic and fisheries variability through the so-called portfolio effects (Hilborn et al. 2003; Schindler et al. 2010).
The authors thank all who assisted tissue collection. L. Barrett, M. Foll, N. Barton, assisted and advised on data analysis and interpretation. A. Hendry and D. Heath provided comments on an earlier version of this manuscript. I.R.B. was supported by a National Sciences Engineering Research Council (NSERC) PDF. Research was supported in part by Genome Canada, Genome Atlantic and the Atlantic Canada Opportunities Agency through the Atlantic Cod Genomics and Broodstock Development Project. A complete list of supporting partners can be found at www.codgene.ca/partners.php. Research funding and support was also provided by a NSERC Strategic Grant on Connectivity in Marine Fish to P.S. and NSERC Discovery Grants to P.B. and S.B.
↵† Present address: Dept. of Biological Sciences, University of Windsor, Windsor, Ontario, Canada N9B 3P4.
- Received May 8, 2010.
- Accepted June 9, 2010.
- © 2010 The Royal Society