## Abstract

Linkage disequilibrium (LD) is an association between genetic loci that is typically transient. Here, we identify a previously overlooked cause of stable LD that may be pervasive: sexual antagonism. This form of selection produces unequal allele frequencies in males and females each generation, which upon admixture at fertilization give rise to an excess of haplotypes that couple male-beneficial with male-beneficial and female-beneficial with female-beneficial alleles. Under sexual antagonism, LD is obtained for all recombination frequencies in the absence of epistasis. The extent of LD is highest at low recombination and for stronger selection. We provide a partition of the total LD into distinct components and compare our result for sexual antagonism with Li and Nei's model of LD owing to population subdivision. Given the frequent observation of sexually antagonistic selection in natural populations and the number of traits that are often involved, these results suggest a major contribution of sexual antagonism to genomic structure.

## 1. Introduction

Linkage disequilibrium (LD) is a covariance between genetic loci that develops for various reasons including epistatic selection, drift, mutation and population structure [1]. LD may be transient or permanent, depending on which of these causes is responsible. Most instances of LD are transient because recombination randomly assigns alleles into gametes each generation. Unless the cause of LD is persistent, the association between loci is expected to decline geometrically over successive generations [2].

There are three known causes of stable LD. The first is epistatic interaction among loci [2–4]. Under this scenario, the fitness of an allele or genotype at one locus depends on the state of the second locus. This favours particular allelic combinations, which generate a statistical association between loci. Provided that selection maintains allelic variation at both loci, epistasis ensures that LD persists. Second, if fitness is determined multiplicatively across two overdominant loci that are sufficiently tightly linked, then a polymorphic equilibrium is necessarily accompanied by LD [5]. Third, permanent LD is possible in a metapopulation owing to allele frequency differences between subpopulations [6]. In fact, LD is expected in subdivided populations because linkage equilibrium requires the restrictive condition of equal allele frequencies in all subpopulations at one or both loci. LD persists provided that genetic variation is maintained in at least one subpopulation (or different combinations of alleles are fixed in different subpopulations).

In this paper, we identify sexual antagonism as a previously overlooked cause of permanent LD. Sexually antagonistic selection occurs when males and females have different phenotypic optima [7,8] and causes genotype frequencies in the male and female population to differ from each other after selection. Theoretically, sexual antagonism can maintain polymorphism [9]; empirically, sexually antagonistic selection is common [10] and sexually antagonistic fitness variation is prevalent in natural and laboratory populations [11–17]. Thus, for sexual antagonism, not only are the elements to produce stable LD in place, they are in abundance.

## 2. Model and results

We consider a model of two genetic loci with two alleles segregating at each: *A*_{1} and *A*_{2}, *B*_{1} and *B*_{2}. Let *x*_{i} and *y*_{j} be the frequencies of the *i*th and *j*th haplotypes in eggs and sperm, respectively, such that: *x*_{1}, *y*_{1} are the frequencies of the *A*_{1}*B*_{1} haplotype; *x*_{2}, *y*_{2} the frequencies of the *A*_{1}*B*_{2} haplotype; *x*_{3}, *y*_{3} the frequencies of the *A*_{2}*B*_{1} haplotype; and *x*_{4}, *y*_{4} the frequencies of the *A*_{2}*B*_{2} haplotype. We arrange the haplotype frequencies in eggs and sperm in vectors **x** and **y**, respectively (henceforth, lower-case bold letters represent vectors). Let *p*_{x} and *p*_{y} be the frequency of the *A*_{1} allele in eggs and sperm and *q*_{x} and *q*_{y} be the frequency of the *B*_{1} allele in eggs and sperm, respectively, calculated as *p*_{ξ} = *ξ*_{1} + *ξ*_{2} and *q*_{ξ} = *ξ*_{1} + *ξ*_{3}, with *ξ* ∈ {*x*,*y*}.

Let genotypes *A*_{1}*A*_{1}, *A*_{1}*A*_{2} and *A*_{2}*A*_{2} have fitness *u*_{1f} = 1 − *s*_{f}, *u*_{2f} = 1 − *h*_{f}*s*_{f} and *u*_{3f} =1 in females, and *u*_{1m} = 1, *u*_{2m} = 1 − *h*_{m}*s*_{m} and *u*_{3m} = 1 − *s*_{m} in males, where 0 ≤ *s* ≤ 1. Fitness at the *B* locus, *v*_{l}_{χ} with *χ* ∈ {f, m}, is parametrized in an analogous manner (table 1). For simplicity we assume that the allelic effects are additive at both loci (*h*_{χ} = 1/2), which guarantees opposing directional selection in the two sexes. We assume that the fitness of a zygote results from the product of the fitness values at each locus. Thus, *w*_{ij}_{χ} = *u*_{k}_{χ} *v*_{l}_{χ}, where *k* and *l* ∈ {1, 2, 3}, is the fitness of a zygote of sex *χ* that results from the union of the *i*th egg and *j*th sperm haplotype (table 2). This assumption eliminates multiplicative epistasis within sexes by rendering the contributions of both loci independent from one another [18]. We arrange the fitness values for females and males in matrices **W**_{f} = (*w*_{ij}_{f}) and **W**_{m} = (*w*_{ij}_{m}). Given our assumptions, the number of different fitness values reduces to six (table 2*b*).
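As an illustration of how the fitness matrices can be assembled, the following Python sketch builds **W**_{f} and **W**_{m} under the additive (*h*_{χ} = 1/2) and multiplicative assumptions stated above. Several choices are assumptions made only for this illustration, because tables 1 and 2 are not reproduced here: the function names, the encoding of haplotypes as allele counts, the use of the same selection coefficient at both loci, and the treatment of *B*_{1} as male-beneficial in parallel with *A*_{1}.

```python
# Haplotype order follows the text: 1 = A1B1, 2 = A1B2, 3 = A2B1, 4 = A2B2.
# Each haplotype is stored as (copies of A2, copies of B2); this encoding is
# an illustrative choice, not notation from the paper.
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def locus_fitness(s, sex, h=0.5):
    """Fitness of genotypes 11, 12 and 22 at one locus.

    Allele 1 is assumed male-beneficial and allele 2 female-beneficial.
    """
    if sex == "f":
        return [1.0 - s, 1.0 - h * s, 1.0]
    return [1.0, 1.0 - h * s, 1.0 - s]


def fitness_matrix(s_a, s_b, sex):
    """4x4 matrix w[i][j] = u_k * v_l for the zygote from egg i and sperm j."""
    u = locus_fitness(s_a, sex)
    v = locus_fitness(s_b, sex)
    w = []
    for a_egg, b_egg in HAPLOTYPES:
        row = []
        for a_sperm, b_sperm in HAPLOTYPES:
            k = a_egg + a_sperm  # 0, 1 or 2 copies of A2 -> index into u
            l = b_egg + b_sperm  # 0, 1 or 2 copies of B2 -> index into v
            row.append(u[k] * v[l])
        w.append(row)
    return w


W_f = fitness_matrix(0.4, 0.4, "f")
W_m = fitness_matrix(0.4, 0.4, "m")
distinct_values = {round(value, 12) for row in W_f for value in row}
print(len(distinct_values))
```

With equal coefficients at both loci, the sixteen entries of each matrix collapse to six distinct values, in line with the count given above.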

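The frequency of haplotype *i* in eggs and sperm after one generation is

$$\bar{w}_{\mathrm{f}}\, x_i' = \tfrac{1}{2}\sum_{j=1}^{4} w_{ij\mathrm{f}}\,(x_i y_j + x_j y_i) - \varepsilon_i\, r\, w_{14\mathrm{f}}\, D_t \tag{2.1a}$$

and

$$\bar{w}_{\mathrm{m}}\, y_i' = \tfrac{1}{2}\sum_{j=1}^{4} w_{ij\mathrm{m}}\,(x_i y_j + x_j y_i) - \varepsilon_i\, r\, w_{14\mathrm{m}}\, D_t, \tag{2.1b}$$

where $\bar{w}_{\mathrm{f}}$ and $\bar{w}_{\mathrm{m}}$ are the mean fitness of females and males, respectively (defined as $\bar{w}_{\chi} = \mathbf{x}^{\mathrm{T}}\mathbf{W}_{\chi}\,\mathbf{y}$), *w*_{14χ} (= *w*_{23χ}) is the fitness of double heterozygotes, *r* is the recombination rate between loci, *ɛ*_{i} provides the sign of the LD factor (*ɛ*_{i} is equal to 1 when *i* = 1,4 and −1 when *i* = 2,3) and *D*_{t} is the LD in the population.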

When the selection regime is the same in both sexes, gametic haplotype frequencies are sex-independent (*x*_{i} = *y*_{i}) and *D*_{t} takes the familiar form *D*_{t} = *x*_{1}*x*_{4} − *x*_{2}*x*_{3}. However, with sex-specific selection, *D*_{t} must be calculated as
$$D_t = \tfrac{1}{2}\,(x_1 y_4 + x_4 y_1 - x_2 y_3 - x_3 y_2). \tag{2.2}$$
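One generation of the model can be sketched in Python as follows. This is a minimal sketch, assuming the standard two-locus viability recursion in which each zygote transmits each of its two haplotypes with probability one half and recombinants arise only from double heterozygotes, whose fitness is taken to be the same for the coupling and repulsion types. The function names are hypothetical, and *D*_{t} is computed as half the difference between coupling and repulsion double heterozygotes, the definition given in appendix A.

```python
EPS = [1, -1, -1, 1]  # sign of the LD term for haplotypes A1B1, A1B2, A2B1, A2B2


def zygotic_ld(x, y):
    """D_t: half the excess of coupling over repulsion double heterozygotes."""
    return 0.5 * (x[0] * y[3] + x[3] * y[0] - x[1] * y[2] - x[2] * y[1])


def gametes_after_selection(x, y, w, r):
    """Haplotype frequencies in the gametes of one sex after selection and meiosis.

    x, y : haplotype frequencies in eggs and sperm (length-4 sequences)
    w    : 4x4 fitness matrix of that sex, indexed by egg and sperm haplotype
    r    : recombination rate between the two loci
    """
    d_t = zygotic_ld(x, y)
    w_dh = w[0][3]  # fitness of double heterozygotes (assumed equal to w[1][2])
    raw = []
    for i in range(4):
        # Haplotype i transmitted without recombination, weighted by fitness.
        transmitted = 0.5 * sum(
            w[i][j] * (x[i] * y[j] + x[j] * y[i]) for j in range(4)
        )
        # Recombination in double heterozygotes erodes coupling haplotypes.
        raw.append(transmitted - EPS[i] * r * w_dh * d_t)
    mean_fitness = sum(raw)  # the recombination terms cancel in the sum
    return [value / mean_fitness for value in raw]


neutral = [[1.0] * 4 for _ in range(4)]
start = [0.4, 0.1, 0.1, 0.4]
after = gametes_after_selection(start, start, neutral, r=0.2)
print(zygotic_ld(start, start), after[0] * after[3] - after[1] * after[2])
```

In the neutral case shown in the usage lines (all fitnesses equal to 1 and identical egg and sperm frequencies), the step reduces LD by the factor 1 − *r*, which is the geometric decline described in §1.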

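When selection differs between the sexes, gametic haplotype frequencies depend on sex-specific allele frequencies at the two loci and the LD in eggs and sperm:

$$\begin{aligned} \xi_1 &= p_\xi q_\xi + D_\xi, & \xi_2 &= p_\xi (1 - q_\xi) - D_\xi,\\ \xi_3 &= (1 - p_\xi)\, q_\xi - D_\xi, & \xi_4 &= (1 - p_\xi)(1 - q_\xi) + D_\xi, \end{aligned} \tag{2.3}$$

where

$$D_\xi = \xi_1 \xi_4 - \xi_2 \xi_3 \tag{2.4}$$

with *ξ* ∈ {*x*,*y*} [2,17]. The flow of genes through populations is schematized in figure 1.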

Substituting equation (2.3) into equation (2.2) and simplifying yield
$$D_t = \tfrac{1}{2}\,(D_x + D_y) + \tfrac{1}{2}\,(p_x - p_y)(q_x - q_y), \tag{2.5}$$
where the second term on the right is equal to twice the covariance between *p* and *q*, 2 Cov(*p*,*q*); thus,
$$D_t = \tfrac{1}{2}\,(D_x + D_y) + 2\,\mathrm{Cov}(p,q), \tag{2.6}$$
which provides a neat partition of the total LD generated by sexually antagonistic selection into two components, namely the average LD in gametes and the covariance between the allele frequencies at each locus.
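The partition can be checked numerically for arbitrary egg and sperm haplotype frequencies. The sketch below is illustrative only: the function names and the example frequencies are made up, and the covariance is taken across the two gamete pools with equal weights. It compares the sum of the two components with *D*_{t} computed directly as half the difference between coupling and repulsion double heterozygotes.

```python
def allele_freqs(h):
    """Frequencies of A1 and B1 from haplotype frequencies (A1B1, A1B2, A2B1, A2B2)."""
    return h[0] + h[1], h[0] + h[2]


def gametic_ld(h):
    """LD within one gamete pool: coupling product minus repulsion product."""
    return h[0] * h[3] - h[1] * h[2]


def ld_partition(x, y):
    """Return (average gametic LD, 2 * Cov(p, q)) for eggs x and sperm y."""
    p_x, q_x = allele_freqs(x)
    p_y, q_y = allele_freqs(y)
    mean_gametic = 0.5 * (gametic_ld(x) + gametic_ld(y))
    # Covariance of p and q across the two equally weighted gamete pools.
    mean_p = 0.5 * (p_x + p_y)
    mean_q = 0.5 * (q_x + q_y)
    cov = 0.5 * (p_x * q_x + p_y * q_y) - mean_p * mean_q
    return mean_gametic, 2.0 * cov


# Hypothetical frequencies, chosen only so that the sexes differ at both loci.
eggs = [0.20, 0.15, 0.25, 0.40]
sperm = [0.45, 0.20, 0.10, 0.25]

mean_gametic, twice_cov = ld_partition(eggs, sperm)
d_t = 0.5 * (eggs[0] * sperm[3] + eggs[3] * sperm[0]
             - eggs[1] * sperm[2] - eggs[2] * sperm[1])
print(mean_gametic, twice_cov, d_t)
```

For these frequencies the average gametic LD is 0.0675 and twice the covariance is 0.015, which together account for the total of 0.0825 up to floating-point rounding.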

Equation (2.6) is similar in form to Nei & Li's [19] eqn (5) (*D*_{T} = 1/2 (*D*_{x} + *D*_{y}) + Cov(*p*,*q*)) for LD in a subdivided population. This formal similarity allows an interesting interpretation of our result. The two sexes, with their differing ecologies, physiologies and selection regimes [20], are effectively distinct subpopulations that freely exchange migrants from one generation to the next. The appendix explores the close analogy between our model and the model of Li & Nei ([6]; see also [19]).

We are interested in LD at a stable polymorphic equilibrium, i.e. a point at which the haplotype frequencies in eggs and sperm no longer change between generations and both loci remain polymorphic. Calculating equilibrium haplotype frequencies analytically, even with our simplifying assumptions, proved too complex. Instead, we carried out numerical analysis of the recursions in equation (2.1) to determine the equilibrium points for different coefficients of selection *s*_{f} and *s*_{m}. When the equilibrium was polymorphic, we calculated the LD value for each pair (*s*_{f}, *s*_{m}) and for different values of recombination, *r*.
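A numerical search of this kind can be sketched as a simple fixed-point iteration. The code below is an illustrative reimplementation, not the analysis used to produce the figures. The starting frequencies, the convergence tolerance and all function names are assumptions, the same selection coefficient is applied at both loci, and the recursion is the standard two-locus form with sex-specific viability selection and recombination confined to double heterozygotes.

```python
HAPS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (copies of A2, copies of B2) per haplotype
EPS = [1, -1, -1, 1]                     # sign of the LD term for each haplotype


def fitness_matrix(s, female):
    """Additive within loci, multiplicative across loci, same s at both loci (assumed)."""
    u = [1 - s, 1 - s / 2, 1] if female else [1, 1 - s / 2, 1 - s]
    return [[u[a1 + a2] * u[b1 + b2] for a2, b2 in HAPS] for a1, b1 in HAPS]


def zygotic_ld(x, y):
    """D_t: half the excess of coupling over repulsion double heterozygotes."""
    return 0.5 * (x[0] * y[3] + x[3] * y[0] - x[1] * y[2] - x[2] * y[1])


def step(x, y, w, r):
    """Gamete frequencies produced by one sex after selection and recombination."""
    d_t = zygotic_ld(x, y)
    raw = [0.5 * sum(w[i][j] * (x[i] * y[j] + x[j] * y[i]) for j in range(4))
           - EPS[i] * r * w[0][3] * d_t
           for i in range(4)]
    total = sum(raw)  # equals the mean fitness of that sex
    return [value / total for value in raw]


def equilibrium(s_f, s_m, r, tol=1e-10, max_gen=100_000):
    """Iterate from an arbitrary interior point until frequencies stop changing."""
    w_f, w_m = fitness_matrix(s_f, True), fitness_matrix(s_m, False)
    x = [0.4, 0.3, 0.2, 0.1]  # arbitrary starting frequencies (assumption)
    y = list(x)
    for _ in range(max_gen):
        new_x, new_y = step(x, y, w_f, r), step(x, y, w_m, r)
        change = max(abs(a - b) for a, b in zip(new_x + new_y, x + y))
        x, y = new_x, new_y
        if change < tol:
            break
    return x, y


for r in (0.05, 0.5):
    eggs, sperm = equilibrium(0.5, 0.5, r)
    print(r, zygotic_ld(eggs, sperm))
```

Iterating from a single starting point finds only one attracting equilibrium, so a fuller analysis would repeat the search from several starting points and confirm that both loci remain polymorphic before reporting LD.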

Figure 2 presents LD at polymorphic equilibria broken down into its components: 1/2(*D*_{x} + *D*_{y}) and 2 Cov(*p*,*q*). LD is present at equilibrium for any value of *r* including free recombination (*r* = 1/2) (figure 2*c*). While the LD in gametes is absent at equilibrium when there is free recombination (figure 2*a*), the covariance between allele frequencies remains positive regardless of the recombination rate (figure 2*b*). Therefore, *D*_{t} remains positive.

LD increases with decreasing *r* (figure 2*c*). The covariance between allele frequencies, however, is less sensitive to *r* than the LD in gametes (figure 2*a*,*b*). For low *r*, the LD in gametes is the major contributor to *D*_{t}, whereas the covariance between allele frequencies becomes the major contributor for high *r*. The transition from 1/2(*D*_{x} + *D*_{y}) being the major contributor to 2 Cov(*p*,*q*) being the major contributor happens in the range of values between *r* = 0.1 and *r* = 0.3 (figure 3).

Finally, LD increases with strength of selection (figures 2 and 4*c*). This is not always the case for both of its components. It is always true for the covariance between allele frequencies (figure 4*b*). It is also true for the LD in gametes when the recombination rate is high (*r* > 0.1 in the symmetric example provided in figure 4*a*). When the recombination rate is low (*r* < 0.01 in the symmetric example provided in figure 4*a*), however, LD in gametes initially increases with strength of selection but then decreases at higher selection strengths (figure 4*a*).

## 3. Discussion

The ultimate source of LD in our model is admixture between two gene pools with differing allele frequencies. Sexual antagonism results in a higher frequency of male beneficial alleles in sperm and of female beneficial alleles in eggs. Fertilization admixes these distinct gene pools and thereby generates LD in zygotes while homogenizing allele frequencies between the sexes (figure 1*a*). Whereas one-time admixture between geographically diverged populations produces transient LD that declines through subsequent generations, in our model, admixture between divergent subpopulations is recurrent because sexual antagonism alters the allele frequencies in opposing directions every generation. Therefore, permanent LD is maintained.

Equation (2.6) invites some further interpretation. This equation partitions *D*_{t} into two components: 2 Cov(*p*,*q*) represents the contribution to LD from sex differences in allele frequencies arising from sexually antagonistic selection in the immediately previous generation, whereas 1/2(*D*_{x} + *D*_{y}) represents accumulated LD from earlier generations. LD in gametes accumulates because of the correlated history of alleles at the two loci. When there is free recombination between maternally inherited and paternally inherited haplotypes, there is no correlation in the histories of the alleles at the two loci (whether they were present in the same or different sex) beyond the immediately preceding generation; further, the covariance between loci produced by admixture is erased by one round of free recombination; thus, *D*_{x}, *D*_{y} are expected to be zero at equilibrium. When recombination is less than 50 per cent, the histories of the alleles at the two loci in earlier generations are correlated and some of the LD produced by admixture in zygotes persists into the gametes, both of which make *D*_{x}, *D*_{y} ≠ 0 at equilibrium.

Sex-specific selection does not directly produce LD within a sex. The multiplicative fitness assumption precludes this. For instance, a population of zygotes in linkage equilibrium that then undergoes sexually antagonistic selection produces gametes with zero LD. Sexual antagonism is, however, indirectly responsible for the LD that results after admixture by virtue of its effect on the second term of equation (2.6). This term results from admixing sperm and egg gene pools, which are made unequal by sexual antagonism.

Selection that does not protect polymorphism cannot stabilize LD. Other selective causes of permanent LD considered in §1 (epistatic interactions, multiplicative fitness at two overdominant loci) require special—perhaps extraordinary—circumstances. For instance, fitness variance owing to epistatic interactions exists, but whether epistasis is responsible for the maintenance of this variation is unknown. In order for two overdominant loci to maintain LD, the recombination rate must be less than the marginal segregation load of the population [5], which for selection coefficients under 50 per cent requires that the recombination rate be less than approximately 0.0625. Allele frequency differences among subpopulations are degraded through time by migration unless there is subpopulation-specific selection. The subdivision of populations into two sexes and sexually antagonistic selection are both common [10]. Therefore, the prerequisites for permanent LD under sexual antagonism are present in many natural populations.

However, two theoretical considerations may lessen the occurrence of LD caused by sexual antagonism. The first is the possibility for conflict resolution—i.e. sexual dimorphism [21]. Should a sexually antagonistic locus evolve sex-limited expression, the allele that is favoured in the expressing sex is expected to fix, thus eliminating variation and precluding LD. The time scale over which intralocus conflict operates—and therefore the likelihood of developing permanent LD—is currently unknown: if conflicts are constantly arising but then are quickly (on an evolutionary time scale) resolved by the evolution of dimorphism, the build-up of LD should be minor; if polymorphisms are maintained for longer periods, then LD has more time to develop and should be more widespread. Dating the age of alleles at sexually antagonistic loci could potentially shed light on this question. Such loci have recently been identified by Innocenti & Morrow [22]. The second consideration draws on the work of Turelli & Barton [23]. They showed that at most two loci could remain polymorphic owing to sexually antagonistic selection. Their model makes assumptions that are relaxed in our model. Specifically, Turelli & Barton [23] considered a model in which loosely linked loci determined a single trait subject to weak sexually antagonistic selection. Our model is agnostic about the number of traits subject to sexually antagonistic selection, permits any value of recombination and considers strong as well as weak selection. Future analyses with more than two loci should explore the potential to maintain large amounts of LD.

LD will be created between any two loci that are polymorphic for sexually antagonistic alleles, even for genes that code for different traits. Multi-locus models may introduce still further higher order associations among loci that are not accounted for in our two-locus model [24]. Based on our theoretical results, we expect there to be wide-ranging multi-locus association in genomes that has not yet been examined. Such associations complicate any attempt to find the genes responsible for phenotypes of interest because the standard approach relies on there being a statistical association between the two. With the kind of LD we find accompanying sexual antagonism, we expect that all sexually antagonistic loci show a statistical association with all sexually antagonistic traits.

The division of the population into two sexes that are subject to sex-specific selection is just one way that a gene pool could be subdivided into groups that undergo differential selection. Regular admixture between such subgroups will generate persistent LD [6]. As an example, suppose that a population consists of a mixture of vegetarians and meat-eaters but the two groups sometimes intermarry and children have some independence in the diet they choose to adopt. If the different diets result in differential selection and allele frequency differences between reproductive vegetarians and reproductive meat-eaters, then admixture of these gene pools in their offspring will be a source of LD. This LD arises because of a correlation in the selective forces acting at different loci in the two subpopulations. This hypothetical example suggests that the existence of differential selection at two loci can be a source of persistent LD whenever the selective ‘environments’ at two loci are correlated.

## Acknowledgements

We thank R. C. Lewontin for comments on an early draft of this paper, and Richard Harrison and two anonymous referees for valuable comments.

## APPENDIX A

The discrepancy between our theoretical value of total LD, *D*_{t} = 1/2 (*D*_{x} + *D*_{y}) + 2 Cov(*p*,*q*), and Nei & Li's [19] value, *D*_{T} = 1/2 (*D*_{x} + *D*_{y}) + Cov(*p*,*q*), arises from the way we define LD. Standard two-locus models that lack sex differences obscure the fact that there are two possible choices. One interpretation is that LD is the quantity that must be added or subtracted to the products of allele frequencies to give the correct haplotype frequencies. An example of this can be seen in our equation (2.3). A second interpretation is that LD measures the difference in frequency between the different kinds of double heterozygotes. This difference is evolutionarily relevant because only in double heterozygotes does recombination have a chance to alter haplotype frequencies and break down the association between loci. Not surprisingly, this quantity appears in the recursion equations for haplotype frequencies in our model (equations (2.1) and (2.2)). In a model that lacks sexes or subpopulations, the two interpretations take the same value: *g*_{1}*g*_{4}−*g*_{2}*g*_{3}, where *g*_{i} is the frequency of the *i*th haplotype in the total population. With the population structure that our model and Nei & Li's [19] model introduce, these two interpretations are no longer equivalent. They are, however, mathematically connected as we show below.

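Nei & Li [19] average the *i*th haplotype frequencies across subpopulations to obtain the population-wide frequency, *g*_{i}. Using the terminology from our model, *g*_{i} = 1/2 (*x*_{i} + *y*_{i}). They then expand *g*_{1}, which produces a value, *D*_{T}, which must be added to the product of population-wide allele frequencies to produce the average haplotype frequencies ([19], eqn (5)):

$$\begin{aligned} g_1 &= \tfrac{1}{2}\,(x_1 + y_1)\\ &= \tfrac{1}{2}\,(p_x q_x + p_y q_y) + \tfrac{1}{2}\,(D_x + D_y)\\ &= \bar{p}\,\bar{q} + \tfrac{1}{2}\,(D_x + D_y) + \mathrm{Cov}(p,q)\\ &= \bar{p}\,\bar{q} + D_T, \end{aligned} \tag{A 1}$$

where $\bar{p} = \tfrac{1}{2}(p_x + p_y)$ and $\bar{q} = \tfrac{1}{2}(q_x + q_y)$ are the population-wide allele frequencies. Finally, $g_2 = \bar{p}\,(1 - \bar{q}) - D_T$, etc.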

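In our model, we take *D*_{t} to be half the difference between coupling and repulsion double heterozygotes, a quantity that appears in equation (2.1):

$$D_t = \tfrac{1}{2}\,(x_1 y_4 + x_4 y_1 - x_2 y_3 - x_3 y_2) = \tfrac{1}{2}\,(D_x + D_y) + 2\,\mathrm{Cov}(p,q) \tag{A 2}$$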
(equations (2.2) and (2.6)).

The relation between Nei & Li's [19] *D*_{T} and our *D*_{t} can be seen in the third step of equation (A 1). This can be expressed as:
$$D_t = D_T + \mathrm{Cov}(p,q). \tag{A 3}$$
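The relation between the two definitions can also be checked numerically. In the sketch below, which uses hypothetical frequencies and function names, *D*_{T} is computed from the pooled haplotype frequencies *g*_{i} = 1/2 (*x*_{i} + *y*_{i}) and *D*_{t} from the double heterozygote frequencies, with the covariance taken across the two equally weighted gamete pools.

```python
def ld(h):
    """Coupling product minus repulsion product for one set of haplotype frequencies."""
    return h[0] * h[3] - h[1] * h[2]


def compare_definitions(x, y):
    """Return (D_T, D_t, Cov(p, q)) for egg (x) and sperm (y) haplotype frequencies."""
    pooled = [0.5 * (a + b) for a, b in zip(x, y)]  # g_i = (x_i + y_i) / 2
    d_pooled = ld(pooled)                           # D_T: LD of the pooled gametes
    d_zygotic = 0.5 * (x[0] * y[3] + x[3] * y[0]    # D_t: half the excess of coupling
                       - x[1] * y[2] - x[2] * y[1])  # over repulsion heterozygotes
    p_x, q_x = x[0] + x[1], x[0] + x[2]
    p_y, q_y = y[0] + y[1], y[0] + y[2]
    cov = 0.25 * (p_x - p_y) * (q_x - q_y)          # covariance across the two pools
    return d_pooled, d_zygotic, cov


# Hypothetical frequencies, chosen only so that the sexes differ at both loci.
eggs = [0.20, 0.15, 0.25, 0.40]
sperm = [0.45, 0.20, 0.10, 0.25]

d_pooled, d_zygotic, cov = compare_definitions(eggs, sperm)
print(d_pooled, d_zygotic, cov)
```

For these frequencies *D*_{T} is 0.075 and *D*_{t} is 0.0825, so the two measures differ by the covariance of 0.0075, up to floating-point rounding.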

The two interpretations of LD, *D*_{T} and *D*_{t}, differ by the magnitude of Cov(*p*,*q*) but essentially capture the same phenomenon. Nei & Li's [19] interpretation sticks closer to what would probably be measured empirically and to the definition set out by the originators of the term: ‘The equations … imply that any time the gametic frequencies will deviate from the equilibrium frequencies by an amount *D* which is the product of the coupling gametic frequencies minus that of the repulsion gametic frequencies. *D*, thus defined, may be considered as a measure of *linkage disequilibrium*’ (italics in original; [2], p. 459).

However, our interpretation better captures the quantity that matters to the evolution of the haplotype frequencies [4] and follows the definition used by Crow & Kimura ([25], p. 197), namely ‘the difference in frequency between the two types of heterozygotes’.

- Received June 4, 2010.
- Accepted September 1, 2010.

- © 2010 The Royal Society