Regional heterogeneity and gene flow maintain variance in a quantitative trait within populations of lodgepole pine

Sam Yeaman, Andy Jarvis

Abstract

Genetic variation is of fundamental importance to biological evolution, yet we still know very little about how it is maintained in nature. Because many species inhabit heterogeneous environments and have pronounced local adaptations, gene flow between differently adapted populations may be a persistent source of genetic variation within populations. If this migration–selection balance is biologically important then there should be strong correlations between genetic variance within populations and the amount of heterogeneity in the environment surrounding them. Here, we use data from a long-term study of 142 populations of lodgepole pine (Pinus contorta) to compare levels of genetic variation in growth response with measures of climatic heterogeneity in the surrounding region. We find that regional heterogeneity explains at least 20% of the variation in genetic variance, suggesting that gene flow and heterogeneous selection may play an important role in maintaining the high levels of genetic variation found within natural populations.


1. Introduction

The maintenance of genetic variation within populations remains a central question in the study of evolutionary biology (Johnson & Barton 2005). Strongly selected traits are expected to have low levels of genetic variance and low heritabilities, due to the purging of non-optimal alleles (Fisher 1930). Empirical evidence, however, shows that variation is ubiquitous. Both natural and experimental populations often have high heritabilities for many traits (Mousseau & Roff 1987; Roff & Mousseau 1987), and traits that are closely associated with fitness tend to have higher levels of genetic variance (but lower heritability) than weakly selected traits (Houle 1992). Understanding the population genetic processes that maintain such high levels of variation is central to the study of evolution.

Numerous theoretical approaches have been taken to address this problem, including models of mutation–selection balance (Lande 1975; Turelli 1984; Bulmer 1989), genotype–environment interactions (Via & Lande 1987; Gillespie & Turelli 1989; Turelli & Barton 2004), population structure and stabilizing selection (Goldstein & Holsinger 1992), temporally fluctuating selection pressures (Waxman & Peck 1999; Burger & Gimelfarb 2002) and pleiotropic overdominance (Bulmer 1973; Gillespie 1984; Turelli & Barton 2004). Despite extensive theoretical research, however, there is still no consensus about which processes are most important and only limited evidence to evaluate this question empirically (Johnson & Barton 2005).

One of the more intuitive mechanisms proposed to maintain genetic variation is through spatially varying selection and gene flow. When populations inhabit environments with different local optima, the reduction in variation caused by selection within each population can be opposed by gene flow between them. As long as gene flow is insufficient to overwhelm local adaptations and prevent divergence among populations, it will increase the standing genetic variance within populations. While numerous models have examined this migration–selection balance in single-locus traits (reviewed in Felsenstein 1976), there have been comparatively few treatments of polygenic traits. Slatkin (1978) showed how gene flow through an environmental gradient increases the additive genetic variance within populations inhabiting intermediate regions of the cline, due to the mixing of alleles from populations with different means. Although Slatkin suggested that this effect would only be important under strong selection, Barton (1999) has shown that gene flow through a cline can result in substantial increases in variance under weak selection as well. Other models based on single populations experiencing stabilizing selection and immigration of non-locally adapted alleles (Tufto 2000) or two populations inhabiting different environments connected by gene flow (Spichtig & Kawecki 2004) have reached similar qualitative conclusions, showing that gene flow can maintain genetic variation within populations, but that this effect will only occur under limited migration and strong selection.

Although these theoretical treatments have demonstrated the possibility that gene flow and spatially heterogeneous selection can maintain variation within populations, it is unclear whether this effect makes a substantial contribution to levels of variation in nature. Several studies have shown that gene flow in clines can maintain heterozygosity in single-locus traits (Barton & Hewitt 1985; Lenormand & Raymond 2000), but there have been fewer demonstrations of this effect in quantitative traits. Sgro & Blows (2003) examined genetic variance in a cline for development time in wild populations of Drosophila serrata and found that non-additive genetic variance was highest in the region with the sharpest change in trait mean. While this evidence conforms to the predictions of Barton's (1999) analysis of polygenic clines, it is unclear why additive genetic variance did not also increase in this transition zone, as predicted by theory. Other studies of quantitative genetic clines have reported ambiguous patterns in genetic variance (van't Land et al. 1999; Magiafoglou et al. 2002) or have focused on changes in mean rather than variance (e.g. James et al. 1995; Hoffmann et al. 2002; Palo et al. 2003). This lack of evidence may stem partly from the difficulty of testing predictions based on genetic variance; the theoretical expectations for patterns of genetic structure in polygenic traits depend on the strength of selection and the shape of the change in optimum. For example, under a gentle linear gradient, genetic variance should be relatively constant across the cline, whereas for a steep linear gradient, genetic variance should peak at the centre of the cline (Barton 1999). Because the shape of the environmental gradient is often unknown and because similar patterns could be expected under alternative processes such as localized mutation–selection balance, it is difficult to test hypotheses based on clines in polygenic traits.
In a slightly different context, empirical studies of maladaptation caused by gene flow (Riechert 1993; Bossart & Scriber 1995; King & Lawson 1995; Hendry & Taylor 2004) suggest that gene flow can be a persistent source of variance within populations, yet there is still considerable disagreement about the strength of this effect (Hendry & Taylor 2004). Thus, while theoretical research suggests that gene flow through heterogeneous environments may maintain genetic variance, there is little empirical evidence to evaluate the importance of this process.

A more easily tested general prediction about the effect of gene flow is that genetic variance within populations should be correlated with regional environmental heterogeneity. Assuming homogeneous migration rates and dispersal distances across the range of a species, the variance in trait values carried by alleles flowing into any given population should be proportional to the regional variance in trait means, which should in turn be positively related to the regional variance in environment. This prediction is a direct extrapolation from cline theory, as variance is expected to be higher in populations inhabiting steep clines with more heterogeneity within the region defined by dispersal distance (Slatkin 1978; Barton 1999). Importantly, this prediction will only hold when there is sufficient regional variation in environment to maintain local adaptations among populations (analogous to Slatkin's ‘characteristic length’ threshold in a linear cline).
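The prediction above rests on a standard identity: when a focal population receives a fraction m of its genes from a neighbour, the variance of the mixed pool is the weighted within-source variance plus a between-source term that grows with the squared difference in source means. The sketch below applies the law of total variance to show this; it is a one-generation illustration that ignores selection, recombination and drift, and all numbers are hypothetical.

```python
def mixture_variance(weights, means, variances):
    """Variance of a pool drawn from several sources (law of total variance).

    weights: proportion of the pool contributed by each source (must sum to 1)
    means, variances: trait mean and within-source variance of each source
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    grand_mean = sum(w * mu for w, mu in zip(weights, means))
    within = sum(w * v for w, v in zip(weights, variances))
    between = sum(w * (mu - grand_mean) ** 2 for w, mu in zip(weights, means))
    return within + between


# Hypothetical example: a focal population (mean 0) receives 10% of its
# genes from a neighbour; every source has within-population variance 1.
m = 0.1
homogeneous = mixture_variance([1 - m, m], [0.0, 0.0], [1.0, 1.0])    # same optimum
heterogeneous = mixture_variance([1 - m, m], [0.0, 2.0], [1.0, 1.0])  # divergent neighbour
print(homogeneous, heterogeneous)
```

With two sources the between-source term reduces to m(1 − m)Δ², here 0.1 × 0.9 × 2² = 0.36, so immigration from a more divergent neighbour raises the variance of the mixed pool from 1.0 to about 1.36 even though each source is equally variable.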

To test this prediction, we studied patterns of genetic variation within populations of lodgepole pine (Pinus contorta) using previously collected data from a long-term common garden experiment (Rehfeldt et al. 1999) and compared these with quantitative estimates of environmental heterogeneity. Lodgepole pine is a long-lived, highly outcrossed conifer (Yeh & Layton 1979) with an extensive range spanning climatically heterogeneous mountain environments and relatively homogeneous valleys and plateaus from Baja California to the Yukon and Alaska. Adaptation to the climatic variations across this range requires tradeoffs between maximizing growth rate and tailoring phenology to avoid exposing sensitive growing tissues to extreme environmental conditions (Howe et al. 2003). As in many tree species, neutral genetic variation in lodgepole pine is mostly partitioned within populations (Yang et al. 1996; Hamrick 2004). Studies of growth response, however, have found significant patterns of local adaptation and divergence in trait means among populations (Yang et al. 1996; Rehfeldt et al. 1999; Wu & Ying 2004). Because the environment is sufficiently heterogeneous to maintain divergence between populations in the presence of gene flow, lodgepole pine conforms to the assumptions of the gene flow/environmental heterogeneity hypothesis described above. Thus, if gene flow through heterogeneous environments is an important process maintaining variation, we should see significant correlations between genetic variance for growth response within populations and environmental heterogeneity in the surrounding region. We tested this prediction by comparing levels of genetic variance within populations with quantitative measures of climatic heterogeneity in their surrounding region.

2. Material and methods

(a) Genetic variance

Genetic variance was estimated using previously collected data from a common garden experiment testing growth response to climate in 142 populations of lodgepole pine. This experiment was established by Keith Illingworth and maintained by the British Columbia Forest Service; the details are described more thoroughly in other papers (Rehfeldt et al. 1999; Wu & Ying 2004). The basic experimental design was as follows: seeds were collected from ca 15 different trees in each population (over an area of ca 1 km²) and bulked according to their population of origin (Alvin Yanchuk 2005, personal communication). Seeds from each population were then planted in several common gardens across a range of climatic conditions, with two randomized blocks of nine individuals at each site (Rehfeldt et al. 1999). Because of the logistical limits of such an extensive study, populations were not planted in all 60 common gardens used for testing, but in ca 33 sites on average. After 20 years of growth, the heights of all surviving individuals were recorded.

For the analysis in this study, we excluded any trials with fewer than 10 surviving individuals (out of 18) to minimize the error in estimating variance in small samples. Similarly, we excluded populations that had been planted in fewer than five locations. Following these restrictions, 103 populations remained with an average of 28 planting sites per population (s.d.=13.9). For each population, we calculated the mean and variance in 20-year growth response using LSMEANS and VARCOMP in SAS with a restricted maximum-likelihood model to control for block effects in the planting design. These measures of variance showed a weak correlation with the population means (r=0.29, p<0.01, d.f.=97). In an attempt to correct for this correlation, we tried three different transformations: log-transforming the raw height data, standardizing the final variances by the population means and standardizing the standard deviations by the population means. Each of these transformations, however, yielded estimates of variance that had stronger, negative correlations with the population means (r_log=−0.68; r_σ²/μ=…; r_σ/μ=−0.87), so for all subsequent analyses we used the original untransformed measures of variance.
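The filtering rules and a block-adjusted variance can be sketched as follows. This is not the SAS REML analysis (LSMEANS/VARCOMP) used in the study; as a simpler method-of-moments stand-in, it pools squared deviations from each block mean across all blocks and sites, which removes block effects from the variance estimate. The data layout (each trial a list of blocks, each block a list of heights) and all names are hypothetical.

```python
from statistics import fmean

MIN_SURVIVORS = 10  # trials with fewer survivors (of 18 planted) are dropped
MIN_SITES = 5       # populations with fewer remaining trials are dropped


def pooled_within_block_variance(trials):
    """Pool squared deviations from each block mean over all blocks and sites.

    Subtracting block means removes block effects (and, because blocks are
    nested within sites, site effects) before the variance is estimated.
    """
    ss, n, k = 0.0, 0, 0
    for blocks in trials:
        for heights in blocks:
            if not heights:
                continue  # block with no survivors
            mu = fmean(heights)
            ss += sum((h - mu) ** 2 for h in heights)
            n += len(heights)
            k += 1
    return ss / (n - k)  # one degree of freedom lost per block mean


def summarize(populations):
    """Map population -> (unadjusted mean height, pooled variance) after filtering."""
    out = {}
    for name, trials in populations.items():
        kept = [t for t in trials if sum(len(b) for b in t) >= MIN_SURVIVORS]
        if len(kept) < MIN_SITES:
            continue  # too few usable planting sites
        heights = [h for t in kept for b in t for h in b]
        out[name] = (fmean(heights), pooled_within_block_variance(kept))
    return out
```

A REML fit weights unbalanced trials differently and returns adjusted (LSMEAN) rather than raw means, so the pooled estimator is best read as a rough cross-check.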

A major limitation facing studies investigating the maintenance of variation is the difficulty of estimating genetic variance for a large number of populations. Here, we use phenotypic variance (VP) measured from multiple common garden trials as a proxy for genetic variance (VG) within each population. While imperfect, this approach may provide a reasonable estimate of genetic variance, assuming that environmental variance (VE) is either positively correlated with VG or roughly constant among populations planted within the same test environment. The randomized planting design and the large number of replicate test sites across a broad range of environmental conditions (mean=33) should reduce the likelihood of consistent differences in VE due to experimental design and/or genotype×environment interactions.
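The proxy argument can be made explicit for the second of these assumptions. Writing V_P = V_G + V_E for each population, letting H denote its regional heterogeneity, and taking Var, Cov and Corr across populations (symbols introduced here for illustration, assuming no genotype–environment covariance, which randomized planting is meant to ensure):

```latex
\begin{aligned}
V_P &= V_G + V_E \\
\operatorname{Cov}(V_P, H) &= \operatorname{Cov}(V_G, H) + \operatorname{Cov}(V_E, H) \\
V_E = c \ (\text{constant}) &\;\Rightarrow\; \operatorname{Corr}(V_P, H) = \operatorname{Corr}(V_G, H) \\
V_E \text{ independent of } (V_G, H) &\;\Rightarrow\;
\operatorname{Corr}(V_P, H) = \operatorname{Corr}(V_G, H)\sqrt{\frac{\operatorname{Var}(V_G)}{\operatorname{Var}(V_G) + \operatorname{Var}(V_E)}}
\end{aligned}
```

In the last case, random variation in V_E among populations weakens the observed correlation rather than creating one, so it biases the test towards finding no relationship.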

(b) Environmental heterogeneity

To represent patterns of spatial environmental variation, we constructed geographical information system (GIS)-based maps of precipitation and temperature (table 1). Each of these datasets consists of a square lattice with a resolution of approximately 1 km² per cell, covering a total area of roughly 4000×4000 km. This spans the entire range of lodgepole pine, providing an approximate picture of the spatial variation in climatic conditions experienced by the species. All variables were created from 1 km GIS-based rasters in the WorldClim global climatic datasets (Hijmans et al. 2004). The original WorldClim dataset consists of maps showing mean monthly estimates of precipitation and minimum and maximum temperature, created by using thin-plate splines to interpolate from a global network of weather stations based on distance and altitude. Weather station data typically represent 30-year averages from the period 1960 to 1990. The WorldClim data do not account for rain shadow effects or other complicated meteorological phenomena, but the data should be sufficiently accurate to capture medium- and large-scale climatic patterns and approximate regional heterogeneity. As these climatic variables are continuous representations of the abiotic conditions rather than biologically based models of stress (e.g. incorporating the importance of the transition from freezing to non-freezing temperatures), they are only approximate representations of selection pressure. To increase the biological relevance of these maps and avoid the inclusion of climatic data from uninhabited alpine areas and lowland valleys, we eliminated all areas that are currently non-forested. We identified non-forested areas using maps of coniferous forest from 1992 (Defries et al. 2000), excluding any area with less than 10% cover.

Table 1

Climatic variables.

For any given cell in the climatic maps, we calculated heterogeneity in the surrounding region using a weighted measure of variance:

$$\sigma_w^2 = \frac{\sum_{i,j} m_{ij}\,(x_{ij} - \bar{x})^2}{\sum_{i,j} m_{ij}}, \qquad \bar{x} = \frac{\sum_{i,j} m_{ij}\,x_{ij}}{\sum_{i,j} m_{ij}}, \qquad (2.1)$$

where i, j are the geographical coordinates of the cells being considered, x_ij is the environmental condition at cell [i,j], x̄ is the weighted mean for the region and m_ij is the weighting at cell [i,j]. In this study, the region of analysis is a 201×201 km square surrounding the focal cell and the weights represent the relative probability of gene flow from the surrounding cells to the focal cell. In lodgepole pine, gene flow is mainly the product of wind-mediated pollen dispersal, with seeds typically moving only short distances (Ennos 1994). While we could find no quantitative data on pollen dispersal curves in lodgepole pine, studies of other wind-pollinated conifers have generally found that pollen flow decreases with distance and is typically leptokurtic (Schuster & Mitton 2000; Robledo-Arnuncio & Gil 2005). Following Austerlitz et al. (2004), we used a Weibull distribution to calculate the weights for each cell, as this was the best fitting of all distributions they tested and is also flexible in its shape:

$$m_{ij} = \frac{\beta}{\alpha}\left(\frac{d}{\alpha}\right)^{\beta-1}\exp\!\left[-\left(\frac{d}{\alpha}\right)^{\beta}\right], \qquad (2.2)$$

where d is the distance between the focal cell and cell [i,j], α is the scale parameter and β is the shape parameter. For this distribution, α sets the scale of gene-flow distance (it equals the mean distance when β=1; in general the mean is αΓ(1+1/β)) and β describes the degree of leptokurtosis (decreasing values of β result in increasing leptokurtosis). Note that, in this application, the mean distance of gene flow does not correspond exactly to the dispersal distance of pollen, because of the possibility of multi-generational stepping-stone migration. As an initial parameter set, we measured heterogeneity under α=0.5 km and β=1 (an exponential distribution with mean 0.5 km).
To examine the effect of different parameters describing the dispersal curve, we applied this method to calculate heterogeneity under a range of values of α (200 m–10 km) and β (0.2–2.0), using an analysis window of 201×201 km. This simple representation of gene flow will fail to account for the influence of physical and phenological barriers to pollen flow and may be less accurate in mountainous areas where differences in flowering time can present effective barriers to gene flow (Schuster et al. 1989); it should, however, provide a coarse approximation.
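Equations (2.1) and (2.2) can be combined into a single window calculation. The sketch below is an illustration under stated assumptions rather than the authors' GIS code: the grid is a plain nested list with None marking masked (non-forested) cells, distances are measured between cell centres in kilometres, and because the Weibull density is infinite at d = 0 when β < 1, the focal cell is evaluated at a hypothetical minimum distance MIN_DIST_KM = 0.5 km. The source does not say how the focal cell was treated.

```python
import math

MIN_DIST_KM = 0.5  # hypothetical: distance assigned to the focal cell itself


def weibull_weight(d, alpha, beta):
    """Weibull density (eq. 2.2) at distance d; alpha = scale (km), beta = shape.

    With beta = 1 this is an exponential density with mean alpha.
    """
    u = d / alpha
    return (beta / alpha) * u ** (beta - 1) * math.exp(-(u ** beta))


def regional_heterogeneity(grid, row, col, alpha, beta, half_width=100, cell_km=1.0):
    """Weighted variance (eq. 2.1) of climate values around grid[row][col].

    grid: 2-D list of values, with None marking masked (non-forested) cells.
    half_width=100 gives a 201 x 201 cell window (201 x 201 km for 1-km cells).
    """
    pairs = []  # (weight m_ij, value x_ij) for every unmasked cell in the window
    for i in range(max(0, row - half_width), min(len(grid), row + half_width + 1)):
        for j in range(max(0, col - half_width), min(len(grid[i]), col + half_width + 1)):
            x = grid[i][j]
            if x is None:
                continue  # non-forested cells do not contribute
            d = max(math.hypot(i - row, j - col) * cell_km, MIN_DIST_KM)
            pairs.append((weibull_weight(d, alpha, beta), x))
    total = sum(w for w, _ in pairs)
    mean = sum(w * x for w, x in pairs) / total  # weighted mean, x-bar
    return sum(w * (x - mean) ** 2 for w, x in pairs) / total
```

For the variables that were log-transformed, the grid would hold log values. Looping this function over α from 0.2 to 10 km and β from 0.2 to 2.0 gives the kind of parameter sweep described above; pure Python is slow for 201×201 windows but adequate for about a hundred focal cells.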

In all applications of the above methods, we calculated heterogeneity for the cell corresponding to the geo-referenced point of origin for each provenance. Four of the provenance geo-reference points corresponded to cells that lay more than 2 km outside of currently forested areas, and these were excluded from the analysis. Following these restrictions, 99 populations had sufficient data for estimates of both genetic diversity and environmental heterogeneity. All variables except drought were log-transformed prior to the calculation of weighted variance to reduce the effect of large ranges in magnitude. To facilitate comparisons among variables, all measurements of heterogeneity were standardized to an index between 0 and 1 by subtracting the minimum value from all scores and dividing by the range. To evaluate the relationship between variance within populations and regional heterogeneity, we performed multiple linear regressions and tested Pearson correlations using ‘lm’ and ‘cor.test’ in the statistical package R. As a test of an alternative hypothesis, we calculated correlations between genetic variance within populations and both latitude and distance from the centre of the range (defining the centre as the mean of all latitudinal and longitudinal coordinates).
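The standardization, correlation and distance steps are simple enough to sketch directly. The study used R (`lm` and `cor.test`); the Python below is an equivalent illustration, not the authors' script. Measuring distance from the centre as great-circle (haversine) distance in kilometres, with a spherical Earth of radius 6371 km, is an assumption: the source defines the centre as the mean coordinates but does not say how distance was computed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth


def min_max(values):
    """Rescale values to a 0-1 index: (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def distance_from_centre(lats, lons):
    """Haversine distance (km) from each point to the mean latitude/longitude."""
    clat, clon = sum(lats) / len(lats), sum(lons) / len(lons)
    out = []
    for lat, lon in zip(lats, lons):
        p1, p2 = math.radians(lat), math.radians(clat)
        dphi, dlam = p2 - p1, math.radians(clon - lon)
        a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
        out.append(2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a)))
    return out
```

Because min-max scaling is a positive linear transformation, it leaves Pearson correlations unchanged; it only puts the four heterogeneity measures on a common 0 to 1 scale for comparison.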

3. Results

Genetic variance in 20-year growth responses within populations was correlated with regional heterogeneity in all four environmental variables (figure 1). Surprisingly, these correlations were relatively insensitive to variations in the parameters defining the dispersal distance under the Weibull distribution (figure 2). Pearson correlations (r) calculated under all parameter combinations were significantly different from 0 (p<0.05, d.f.=97) but 95% confidence limits overlapped for all dispersal functions considered (not shown). Thus, although correlations tended to be somewhat lower under certain parameter combinations (e.g. α<1 and β>1 for drought, cold and temp), these were not statistically different from correlations under other combinations.
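Whether correlations under different parameter sets differ can be checked informally with Fisher's z-transformation, which gives an approximate confidence interval for r. The sketch assumes n = 99 populations (d.f. = 97) and a hypothetical r = 0.3. Because all correlations here are computed on the same 99 populations, comparing overlapping intervals is only a rough guide, not a formal test of difference.

```python
import math
from statistics import NormalDist


def r_confidence_interval(r, n, level=0.95):
    """Approximate CI for Pearson r via Fisher's z = atanh(r), SE = 1/sqrt(n - 3)."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


lo, hi = r_confidence_interval(0.3, 99)  # hypothetical r; n = 99 populations
print(f"{lo:.2f} to {hi:.2f}")
```

For r = 0.3 this gives an interval of roughly 0.11 to 0.47, wide enough that correlations of similar magnitude from different dispersal parameters will usually overlap.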

Figure 1

Relationship between variance within populations and regional heterogeneity in the four climatic variables (a) drought, (b) precip, (c) cold and (d) temp, calculated using α=500 m, β=1 for the Weibull distribution representing gene flow.

Figure 2

Pearson correlations between variance within populations and heterogeneity in the four climatic variables (a) drought, (b) precip, (c) cold and (d) temp, calculated under a range of values of α and β for the Weibull distribution representing gene flow.

Based on r², regional heterogeneity in drought explained 8–20% of the variation in genetic diversity, depending upon the values of α and β (figure 2). Heterogeneity in the other variables explained 7–14% (cold), 7–13% (temp) and 7–19% (precip) of the variation in genetic diversity within populations. Although some variables explained more of the variation than others and all were significantly different from zero (p≪0.05, d.f.=97), differences between them were not statistically significant (all estimates lie within 95% confidence limits of other estimates). Thus, it was not possible to evaluate the relative importance of the different climatic variables used here. Multiple linear regression of genetic variance on heterogeneity in all four variables showed that they could collectively explain between 17 and 28% of the variation in genetic diversity within populations, depending upon the values of α and β.
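For a single predictor, the proportion of variance explained is simply r², so 8–20% corresponds to r of about 0.28 to 0.45. With several predictors, R² comes from a multiple regression. The sketch below fits ordinary least squares with an intercept by solving the normal equations; it is a dependency-free illustration of what `lm` computes, not a replacement for it, and it reports unadjusted R².

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_r_squared(X, y):
    """Unadjusted R^2 of y regressed on the columns of X plus an intercept."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coef = solve(ata, aty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    my = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot
```

With four heterogeneity measures as the columns of X and within-population variance as y, this returns the collective proportion of variance explained; with one column it reduces to the squared Pearson correlation.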

To examine other possible explanations for patterns in genetic variance, we used linear regression to test whether variance within populations was correlated with either their latitude or their distance from the geographic centre of the species' range. Both variables showed significant inverse relationships, with latitude explaining 5% (p<0.05, d.f.=97) and distance explaining 10% of the variation (p<0.01, d.f.=97) in genetic variance. To test whether these two terms covaried with any of the heterogeneity measurements, we calculated the residuals of the single-factor regressions of diversity on latitude and distance and then calculated the correlations between these residuals and the measurements of heterogeneity. In almost all cases, the correlations between genetic variance and heterogeneity were still significant, with heterogeneity explaining 9–18% (drought), 4–12% (cold), 6–10% (temp) and 6–19% (precip) of the variation in residuals of the distance regression, and 7–16% (drought), 6–12% (cold), 7–11% (temp) and 4–18% (precip) of the variation in residuals of the latitude regression, depending upon the values of α and β. Of these, all correlations were significant (p<0.05, d.f.=97) except for 6 of the 120 parameter combinations in cold versus distance residuals and 3 of the 120 parameter combinations in precip versus latitude residuals. We then calculated the residuals from the multiple linear regression of diversity on both latitude and distance. Multiple linear regression of these combined residuals on heterogeneity in all four variables showed that heterogeneity could collectively explain ca 20% of the variation in these residuals (p<0.001, d.f.=97; for α=500 m and β=1). Thus, while latitude and distance from the centre of the range are significantly correlated with genetic variance, for the most part they explain different portions of the variance from regional environmental heterogeneity.
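The residual approach can be sketched in a few lines: regress variance on latitude (or distance), keep the residuals, and correlate them with heterogeneity. If heterogeneity still explains residual variation, its association with variance is not simply a by-product of the geographic trend. Function names and the toy data in the check are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def regression_residuals(x, y):
    """Residuals of the ordinary least-squares fit y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def residual_r_squared(variance, covariate, heterogeneity):
    """Share of the variation in residuals (variance ~ covariate) explained by heterogeneity."""
    res = regression_residuals(covariate, variance)
    return pearson_r(res, heterogeneity) ** 2
```

When heterogeneity is itself correlated with the covariate, part of its effect is absorbed by the first regression, so this approach is conservative with respect to heterogeneity.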

4. Discussion

Evolutionary theory suggests that quantitative genetic variation within populations can be maintained by spatially varying selection and gene flow (Slatkin 1978; Barton 1999; Tufto 2000; Spichtig & Kawecki 2004). Although local adaptation is pervasive (Hedrick et al. 1976; Linhart & Grant 1996), it is unclear whether gene flow is strong enough to maintain the high levels of genetic variation found in nature. Other processes, such as mutation–selection balance or population bottlenecks, could have a greater effect on levels of genetic variation within populations, obscuring any contribution by gene flow. This study found a significant correlation between regional heterogeneity and genetic variance in lodgepole pine (r²≈20%), suggesting that gene flow and heterogeneous selection are making significant contributions to levels of genetic variation within populations.

As this inference is based on correlation, however, there are two possible alternative explanations that should be addressed. First, it is possible that temporal heterogeneity is responsible for maintaining diversity within populations (as per Burger & Gimelfarb 2002). If areas that are more spatially heterogeneous are also more temporally heterogeneous, then it would be impossible to evaluate which of these factors is maintaining diversity with the correlation-based approach used here. There is good reason to suspect that temporal and spatial heterogeneity in climate would not be well correlated, because other factors such as continentality and oceanic currents can influence temporal variations in climate without correlation to spatial influences such as altitude. At the present time, however, there are no long-term climatic records available at a sufficiently fine spatial scale, so this alternative hypothesis cannot be conclusively rejected.

As a second explanation, it is possible that environments within populations are also heterogeneous and that variance within populations is the product of micro-environmental adaptation in their immediate environment. If areas that are regionally heterogeneous also tend to be more heterogeneous at this small scale within populations, then correlations with regional heterogeneity could be an artefact, as described above with respect to temporal heterogeneity. Because the datasets we used are interpolated on a coarse 1×1 km scale, they are inappropriate for estimating fine-scale heterogeneity and testing this alternative hypothesis. While local adaptation in Douglas fir (Pseudotsuga menziesii) has been found over changes in altitude of only a few hundred metres (Campbell 1979), sampling for the establishment of common gardens in this experiment was conducted over small areas (ca 1 km²) and care was taken to avoid sampling over obvious sources of environmental variation within a population. Thus, although we could not conclusively rule out micro-environmental adaptation within populations as a source of the variance, it seems an unlikely explanation for the correlations found in this study.

While theoretically possible, these alternative explanations are less likely than the suggestion that gene flow and regional heterogeneity maintain diversity within populations. Local adaptation has been demonstrated (Yang et al. 1996; Rehfeldt et al. 1999; Wu & Ying 2004), and gene flow in lodgepole pine is extensive (Perry 1978), so some effect of gene flow on levels of variance is expected. Assuming that our interpretation is correct, gene flow and regional heterogeneity explain approximately 20% of the variation in diversity within populations of lodgepole pine. In fact, due to the potential for errors in the heterogeneity modelling and estimation of genetic variance, the true correlation may be considerably stronger.
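The claim that measurement error may hide a stronger relationship follows from the classical attenuation result: random error in either variable shrinks the observed correlation by the square root of the product of the two reliabilities (the fraction of each variable's variance that is not error). The sketch applies Spearman's correction for attenuation; the reliabilities of 0.8 are purely hypothetical, since the study did not estimate them, and the correction assumes errors are random and uncorrelated with the true values.

```python
import math


def disattenuate(r_obs, rel_x, rel_y):
    """Spearman's correction: estimated correlation between error-free variables."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must be in (0, 1]")
    return r_obs / math.sqrt(rel_x * rel_y)


r_obs = math.sqrt(0.20)                 # observed r for r^2 = 0.20
r_true = disattenuate(r_obs, 0.8, 0.8)  # hypothetical reliabilities
print(round(r_true ** 2, 4))
```

Under these assumed reliabilities the corrected r² would be 0.20/0.64 ≈ 0.31, which illustrates how errors in both the heterogeneity model and the variance estimates could understate the true association.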

This evidence suggests that gene flow and heterogeneity play an important role in maintaining genetic variance, but how general is this effect? Under what conditions would we expect to see maintenance of variation by environmental heterogeneity and gene flow in other species? First, natural selection must be strong enough to maintain variation in trait means between populations, in spite of gene flow. Second, for the increase in variance to be significant, there must be considerable spatial variation in the optimum trait within the effective range of gene flow. Linear cline models show how the environment must change over a characteristic length defined by the strength of selection and distance of gene flow in order for selection to maintain localized adaptations (Slatkin 1978; Barton 1999). Assuming that there is some analogous threshold in more complex heterogeneous environments, there will be some scale of change in environment below which local adaptations are not maintained and genetic variance is unaffected by migration–selection balance. Lodgepole pine is an ideal species in which to detect such an effect, as it has extensive gene flow and inhabits both heterogeneous mountain environments and homogeneous plateaus. It remains to be seen whether migration–selection balance plays a significant role in other species, where gene flow is limited and/or environments are more uniform. Unfortunately, the method used here is only applicable when regional heterogeneity varies substantially among populations; tests based on correlation are powerless to detect an effect when all populations experience equal conditions. It is worth noting that if migration–selection balance plays a significant role in driving levels of diversity, the common statistical assumption of homogeneity of variances across populations may not be warranted when they inhabit environments with different levels of heterogeneity.

Most studies of the impact of gene flow on genetic structure within populations have focused on its maladaptive consequences. Empirical studies have described the impacts of migration load (e.g. Storfer & Sih 1998), gene swamping (e.g. Raymond & Marquine 1994) and outbreeding depression (e.g. Price & Waser 1979), while theoretical studies have noted the limits to local adaptation (e.g. Kirkpatrick & Barton 1997). Here, we have found evidence suggesting that variance within populations can be maintained by gene flow without homogenizing local adaptations and eliminating diversity between populations. While populations of lodgepole pine are not always perfectly locally adapted (Wu & Ying 2004), they are able to persist under this genetic load and even maintain ecological dominance. Although genetic load is always maladaptive in environments that do not change over time, genetic variance is necessary for any response to selection (Fisher 1930); thus, genetic load may be beneficial in times of rapid environmental change. While both mutation and migration can increase variance, migrant alleles from neighbouring populations with slightly different environments are more likely to be adaptive to slight temporal variations in environment than random mutations generated de novo. For example, if two populations inhabit wet and dry environments, respectively, and climatic change causes these habitats to reverse their characteristics, allelic variation maintained by gene flow would be inherently adaptive to the novel change, whereas alleles generated by mutation would often be maladaptive. If migration–selection balance proves to be a widespread and significant factor maintaining variation in quantitative traits, it will be important to consider environmental heterogeneity and gene flow when evaluating conservation and management options, especially when considering adaptation to climatic change. 
Generally speaking, conserving both heterogeneous landscapes and historical levels of gene flow should maintain diversity within and between populations.

Acknowledgments

We wish to thank Mike Whitlock for his comments and encouragement and Gerry Rehfeldt for providing the raw data used for the estimates of genetic variance. Throughout the research process, many more than we can list have offered comments and suggestions, but we extend special thanks to SOWD, the Aitken lab, S. Aitken, S. Cook, M. Slatkin, A. Yanchuk, A. Gerstein, and M. Vellend. This work was funded by strategic funds available in the International Centre for Tropical Agriculture (CIAT), the BMZ-funded Sustainable use of Forest Genetic Resources in the Americas project led by the International Plant Genetic Resources Institute (IPGRI) and an NSERC grant to M. Whitlock.

Footnotes

    • Received December 21, 2005.
    • Accepted January 25, 2006.

References
