## Abstract

In the context of the finitely repeated Prisoner's Dilemma with the possibility of cooperating or defecting each time, the strategy tit-for-tat (TFT) consists in cooperating the first time and copying the strategy previously used by the opponent the next times. Assuming random pairwise interactions in a finite population of always defecting individuals, TFT can be favoured by selection to go to fixation following its introduction as a mutant strategy. We deduce the condition for this to be the case under weak selection in the framework of a general reproduction scheme in discrete time. In fact, we show when and why the one-third rule for the evolution of cooperation holds, and how it extends to a more general rule. The condition turns out to be more stringent when the numbers of descendants left by the individuals from one time-step to the next may substantially differ. This suggests that the evolution of cooperation is made more difficult in populations with a highly skewed distribution of family size. This is illustrated by two examples.

## 1. Introduction

Nowak *et al*. (2004) have specified the conditions required for natural selection to favour the emergence of cooperation in a finite population from a game-theoretic perspective (Maynard Smith & Price 1973; Maynard Smith 1974). In the Prisoner's Dilemma formulation of the problem (Trivers 1971; Axelrod & Hamilton 1981; Axelrod 1984), two players win if they both cooperate or lose if they both defect, while a defector wins more against a cooperator and a cooperator loses more against a defector. If the players are chosen at random in a large population of cooperators or defectors, then the mean pay-off to a defector is always larger than the mean pay-off to a cooperator whatever the frequency of the cooperators is. If the game between the two players is repeated a given number of times, then sequential strategies, such as cooperating the first time and then doing what the opponent did the previous time (tit-for-tat, TFT) or always defecting (AllD), are possible. With only these two pure strategies in use in the population and enough repetitions of the game, the mean pay-off to TFT is larger than the mean pay-off to AllD if and only if the frequency of TFT in the population exceeds some threshold value *x*^{*}, 0<*x*^{*}<1. This value corresponds to an unstable equilibrium for the replicator dynamics with the pay-off used as an additive change in fitness (e.g. Hofbauer & Sigmund 1998, ch. 7). According to this scheme, the frequency of TFT should go to fixation, and then, as a result, every individual in the population will cooperate, but only if the initial frequency of TFT is larger than *x*^{*}. On the other hand, the strategy TFT should go to extinction if its initial frequency is less than *x*^{*}. Even though *x*^{*} decreases to 0 as the number of repetitions of the game increases, this is an important barrier for the evolution of cooperation in natural populations from the first time it appears as a mutant strategy.

In a finite population, however, random drift can lead TFT to fixation whatever its initial frequency is. Assuming a discrete-time population of fixed size *N*, with one individual replaced at a time according to the Moran model (Moran 1958) and all individuals using AllD initially, but one using TFT, Nowak *et al*. (2004) have shown that selection favours TFT replacing AllD in a sufficiently large population and for sufficiently weak selection if *x*^{*}<1/3. This has been called the one-third law. Under this condition, the probability of fixation of TFT is larger than its initial frequency, 1/*N*, which would be the probability of fixation in the absence of selection. The result has been shown to hold also for a population with discrete non-overlapping generations, which follows the Wright–Fisher model (Fisher 1930; Wright 1931) under the same assumptions on population size and selection intensity (Lessard 2005; Imhof & Nowak 2006).

In this paper, we consider a more general model of reproduction and we show that the one-third law must be extended to take into account the possibility of a highly skewed distribution of family size as may be common in plants, fungi and marine organisms (e.g. Eldon & Wakeley 2006, and references therein). In such a case, not only an exact expression for the probability of fixation may be out of reach, but also diffusion approximations may not be available (Möhle 2001). We will resort to a direct approach for Markov chains to compute the first-order effect of selection on the probability of fixation of a single mutant. The method has been used by Rousset (2003) in a context of kin selection in subdivided populations with small differences in phenotypic values between individuals (Rousset & Billiard 2000). In this context, the first-order effect of selection can be expressed in terms of expected coalescence times under neutrality for pairs of individuals. In our context of a linear game with two pure strategies, the expected coalescence times for up to three individuals will have to be considered. Then, we will see that the one-third law holds only in the domain of application of Kingman's (1982) coalescent (Möhle 2000), and that the condition for TFT to be favoured by weak selection is generally more stringent.

## 2. Model

Consider two strategies, A and B, with the 2×2 game matrix

$$
\begin{pmatrix} a & b \\ c & d \end{pmatrix}, \tag{2.1}
$$

where *a* and *b* represent the pay-offs to A, and *c* and *d* the pay-offs to B, in interaction with A and B, respectively. Assume *a*>*c* and *d*>*b*, which means that A and B are the best replies to themselves, or evolutionarily stable strategies in Maynard Smith & Price's (1973) terminology.

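Suppose random pairwise interactions in a haploid population of constant size *N*. If the numbers of A and B players in the population are *i* and *N*−*i*, respectively, and an individual does not interact with itself, then the mean pay-offs to A and B are

$$
w_A(i) = a\left(\frac{i-1}{N-1}\right) + b\left(\frac{N-i}{N-1}\right) \tag{2.2}
$$

and

$$
w_B(i) = c\left(\frac{i}{N-1}\right) + d\left(\frac{N-i-1}{N-1}\right), \tag{2.3}
$$

respectively. Assume that these pay-offs have additive effects on the fitnesses of A and B, which are expressed as

$$
f_A(i) = 1 + s\,w_A(i) \tag{2.4}
$$

and

$$
f_B(i) = 1 + s\,w_B(i), \tag{2.5}
$$

respectively. These fitnesses measure the relative success in reproduction. The parameter *s* stands for the intensity of selection and it is assumed to be positive and small. The case *s*=0 corresponds to neutrality.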

Time is discrete and an expected fraction *γ* of the population is replaced from one time-step to the next. The case *γ*=1 corresponds to non-overlapping generations as in the Wright–Fisher model (Fisher 1930; Wright 1931). At the other extreme, we have *γ*=1/*N* in the case of single birth–death events as in the Moran model (Moran 1958).

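Let the frequency of A at a given time-step be *x*=*i*/*N*, and let $f_A$ and $f_B$ denote the fitnesses of A and B given in (2.4) and (2.5). At the next time-step, this frequency will have an expected value *x* in the fraction of the population that is not replaced and an expected value $x f_A/\bar{f}$ in the fraction that is replaced, where

$$
\bar{f} = x f_A + (1-x) f_B \tag{2.6}
$$

is the mean fitness. Then, the change in frequency of A, denoted by Δ*x*, will have a conditional expected value

$$
E_s(\Delta x \mid x) = \gamma x \left(\frac{f_A - \bar{f}}{\bar{f}}\right) = \frac{\gamma x (1-x)(f_A - f_B)}{\bar{f}}. \tag{2.7}
$$

Note that the whole conditional distribution of Δ*x* depends not only on the selection pressure but also on the reproduction scheme.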
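The quantities of this section can be evaluated directly. The following is a minimal sketch, assuming that mean pay-offs are averaged over the *N*−1 possible opponents (no self-interaction, as in Nowak *et al*. 2004) and that fitness is 1 plus *s* times the mean pay-off. The function names and the numerical pay-offs are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def mean_payoffs(i, N, a, b, c, d):
    """Mean pay-offs to A and B with i A-players among N (no self-interaction)."""
    w_A = Fraction(a * (i - 1) + b * (N - i), N - 1)
    w_B = Fraction(c * i + d * (N - i - 1), N - 1)
    return w_A, w_B

def expected_change(i, N, a, b, c, d, s, gamma):
    """Conditional expected change in the frequency of A over one time-step."""
    x = Fraction(i, N)
    w_A, w_B = mean_payoffs(i, N, a, b, c, d)
    f_A, f_B = 1 + s * w_A, 1 + s * w_B      # additive effect of pay-offs on fitness
    f_bar = x * f_A + (1 - x) * f_B          # mean fitness
    return gamma * x * (1 - x) * (f_A - f_B) / f_bar

# Hypothetical pay-offs satisfying a > c and d > b.
N, a, b, c, d = 10, 5, 0, 1, 1
s = Fraction(1, 100)
gamma = Fraction(1, N)                       # single birth-death events
changes = [expected_change(i, N, a, b, c, d, s, gamma) for i in range(1, N)]
for i, change in enumerate(changes, start=1):
    print(i, float(change))
```

With these values the expected change is negative when A is rare and positive once A is common enough, which is the bistable situation described by the assumptions *a*>*c* and *d*>*b*.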

## 3. Fixation probability under weak selection

Suppose that A is a mutant strategy represented once at time *t*=0 and let *x*_{t} be the frequency of A at time *t*=0, 1, 2, …. As a result of the combined effects of selection and drift, the frequency of A will converge to a random variable *x*_{∞} which will take the value 1 with some probability *p*(*s*), and 0 with the complementary probability 1−*p*(*s*), where *p*(*s*) is the probability of fixation of A as a function of the intensity of selection. Note that *p*(0)=1/*N*, which is the same as *x*_{0}, the initial frequency of A. As a matter of fact, one of the individuals at *t*=0 will be the ancestor of the whole population in the long run and, if no selection takes place, it will be one individual chosen at random at *t*=0 by symmetry.

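Following Rousset (2003), we write the limit frequency of A in the population as

$$
x_\infty = x_0 + \sum_{t \ge 0} \Delta x_t, \tag{3.1}
$$

where

$$
\Delta x_t = x_{t+1} - x_t \tag{3.2}
$$

is the change in frequency of A from time *t* to time *t*+1. Taking the expectation on both sides of the equality and using the fact that the expectation of a conditional expectation is the expected value, we get

$$
p(s) = E_s(x_\infty) = \frac{1}{N} + \sum_{t \ge 0} E_s\big(E_s(\Delta x_t \mid x_t)\big). \tag{3.3}
$$

For *s* small enough, equation (2.7) leads to the approximation

$$
E_s(\Delta x_t \mid x_t) \approx s\gamma\, x_t (1 - x_t)(a_0 + a_1 x_t), \tag{3.4}
$$

where, with mean pay-offs averaged over the *N*−1 possible opponents of an individual,

$$
a_0 = \frac{N(b-d) + d - a}{N-1} \tag{3.5}
$$

and

$$
a_1 = \frac{N(a-b-c+d)}{N-1}. \tag{3.6}
$$

Then, equation (3.3) yields as approximation for the probability of fixation

$$
p(s) \approx \frac{1}{N} + s\gamma\,(a_0 d_0 + a_1 d_1), \tag{3.7}
$$

where

$$
d_0 = \sum_{t \ge 0} E_0\big(x_t(1-x_t)\big) \tag{3.8}
$$

and

$$
d_1 = \sum_{t \ge 0} E_0\big(x_t^2(1-x_t)\big). \tag{3.9}
$$

Here, *E*_{0} is used for the expected value in the neutral model, which differs only by terms of order *s* from the expected value *E*_{s} in the selection model.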

The expected value of *x*_{t} under neutrality is simply the initial frequency of A, i.e.

$$
E_0(x_t) = \frac{1}{N}. \tag{3.10}
$$

On the other hand, the expected value of $x_t^2$ corresponds to the probability for two individuals chosen at random with replacement in the population at time *t* to be both of type A. With probability 1/*N*, the individuals are the same and they will be of type A with probability 1/*N* under neutrality for the same reason as described previously. With probability 1−(1/*N*), the individuals are different and they will be of type A, if they have a common ancestor at time *t*=0 and if this ancestor is of type A. The probability of the former event is the probability for the coalescence time of two lineages, denoted by *t*_{2}, to be less than or equal to *t*, while the probability of the latter event under neutrality is 1/*N*. Conditioning on the number of different individuals and using the identity *P*_{0}(*t*_{2}≤*t*)=1−*P*_{0}(*t*_{2}>*t*), where *P*_{0} is used for the probability under neutrality, we find that

$$
E_0(x_t^2) = \frac{1}{N} - \frac{1}{N}\left(1 - \frac{1}{N}\right) P_0(t_2 > t). \tag{3.11}
$$

Similarly, the expected value of $x_t^3$ corresponds to the probability for three individuals chosen at random with replacement in the population at time *t* to be all of type A. These will be 1, 2 or 3 different individuals with probability 1/*N*^{2}, (3/*N*)(1−(1/*N*)) or (1−(1/*N*))(1−(2/*N*)), respectively, and they will be of type A under neutrality with probability 1/*N* times the probability for the coalescence time of one, two or three lineages, represented by *t*_{1}=0, *t*_{2} or *t*_{3}, respectively, to be less than or equal to *t* under neutrality. This leads to

$$
E_0(x_t^3) = \frac{1}{N} - \frac{3}{N^2}\left(1 - \frac{1}{N}\right) P_0(t_2 > t) - \frac{1}{N}\left(1 - \frac{1}{N}\right)\left(1 - \frac{2}{N}\right) P_0(t_3 > t). \tag{3.12}
$$

Summing over *t*≥0, using the fact that $E_0\big(x_t^k(1-x_t)\big) = E_0(x_t^k) - E_0(x_t^{k+1})$ for *k*=1 and 2, together with $\sum_{t \ge 0} P_0(t_j > t) = E_0(t_j)$ for *j*=2 and 3, we deduce easily that

$$
d_0 = \frac{1}{N}\left(1 - \frac{1}{N}\right) E_0(t_2) \tag{3.13}
$$

and

$$
d_1 = \frac{1}{N}\left(1 - \frac{1}{N}\right)\left[\left(1 - \frac{2}{N}\right) E_0(t_3) - \left(1 - \frac{3}{N}\right) E_0(t_2)\right]. \tag{3.14}
$$

It remains to calculate the expected values of *t*_{2} and *t*_{3} under neutrality.
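The coalescent argument for the second and third moments can be checked numerically on a small example. The sketch below assumes a neutral Wright–Fisher population started with a single A, for which the one-step ancestral probabilities are *p*_{21}=1/*N*, *p*_{31}=1/*N*^{2} and *p*_{32}=3(*N*−1)/*N*^{2}. It compares the moments obtained by iterating the forward Markov chain with the expressions 1/*N*−(1/*N*)(1−1/*N*)*P*_{0}(*t*_{2}>*t*) and 1/*N*−(3/*N*^{2})(1−1/*N*)*P*_{0}(*t*_{2}>*t*)−(1/*N*)(1−1/*N*)(1−2/*N*)*P*_{0}(*t*_{3}>*t*) implied by the conditioning described above. The function names are hypothetical.

```python
from math import comb

def wright_fisher_matrix(N):
    """Neutral Wright-Fisher transition probabilities for the number of A."""
    return [[comb(N, j) * (i / N) ** j * (1 - i / N) ** (N - j)
             for j in range(N + 1)] for i in range(N + 1)]

def max_moment_error(N, t_max):
    """Largest gap between forward moments and their coalescent expressions."""
    P = wright_fisher_matrix(N)
    dist = [0.0] * (N + 1)
    dist[1] = 1.0                       # a single A at time 0
    p21 = 1 / N
    p31 = 1 / N**2
    p32 = 3 * (N - 1) / N**2
    p33 = 1 - p31 - p32
    surv2 = 1.0                         # P(t_2 > t)
    three, two = 1.0, 0.0               # P(3 ancestors), P(2 ancestors) t steps back
    worst = 0.0
    for _ in range(t_max + 1):
        m2 = sum(pr * (i / N) ** 2 for i, pr in enumerate(dist))
        m3 = sum(pr * (i / N) ** 3 for i, pr in enumerate(dist))
        surv3 = three + two             # P(t_3 > t)
        f2 = 1 / N - (1 / N) * (1 - 1 / N) * surv2
        f3 = (1 / N - (3 / N**2) * (1 - 1 / N) * surv2
              - (1 / N) * (1 - 1 / N) * (1 - 2 / N) * surv3)
        worst = max(worst, abs(m2 - f2), abs(m3 - f3))
        # advance the forward chain and the ancestral process by one step
        dist = [sum(dist[i] * P[i][j] for i in range(N + 1)) for j in range(N + 1)]
        surv2 *= 1 - p21
        three, two = three * p33, three * p32 + two * (1 - p21)
    return worst

print(max_moment_error(6, 8))
```

The printed discrepancy should be at the level of floating-point rounding, since both computations are exact under neutrality for this model.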

## 4. Expected coalescence times under neutrality

Let *p*_{ij} be the probability under neutrality that *i* individuals chosen at random without replacement at time *t*+1 come from *j* ancestors at time *t*. Of course, these probabilities for *j*=1, …, *i* sum up to 1 for every *i*. The expected time back it takes under neutrality to find the most recent common ancestor to two individuals is

$$
E_0(t_2) = \sum_{k \ge 1} k\, p_{21} (1 - p_{21})^{k-1} = \frac{1}{p_{21}}. \tag{4.1}
$$

On the other hand, conditioning on the first time back that the number of ancestors diminishes, the most recent common ancestor to three individuals under neutrality will be found after an expected time

$$
E_0(t_3) = \frac{1}{p_{31} + p_{32}} + \left(\frac{p_{32}}{p_{31} + p_{32}}\right) E_0(t_2). \tag{4.2}
$$

Note that we have the following relationship

$$
p_{31} + p_{32} = 3p_{21} - 3p_{31} + p_{31} = 3p_{21} - 2p_{31}. \tag{4.3}
$$

On the left-hand side, we have the probability that at least two individuals among three at time *t*+1 have a common ancestor at time *t*. Labelling the three individuals with 1, 2 and 3, this is the probability that 1 and 2, or 1 and 3, or 2 and 3 have a common ancestor. The probability of each of these events is *p*_{21}, while the probability of the intersection of any two is *p*_{31}, which is also the probability of the intersection of all the three. The expression on the right-hand side follows from a standard inclusion–exclusion argument for the probability of the union of three events.

Some algebraic manipulations lead to

$$
\frac{E_0(t_3)}{E_0(t_2)} = \frac{p_{21} + p_{32}}{p_{31} + p_{32}} = 1 + \frac{p_{32}}{3(p_{31} + p_{32})}. \tag{4.4}
$$

This is the expected value of *t*_{3} under neutrality with *E*_{0}(*t*_{2}) taken as the unit of time. Denoting this variable by *τ*_{3}, we have

$$
E_0(\tau_3) = 1 + \frac{q_{32}}{3}, \tag{4.5}
$$

where

$$
q_{32} = \frac{p_{32}}{p_{31} + p_{32}} \tag{4.6}
$$

is the probability under neutrality for the number of ancestors to three individuals to be two, the first time that this number diminishes backward in time. Note that, with the same time-scale, the coalescence time *τ*_{2} for two lineages has expected value *E*_{0}(*τ*_{2})=1. This time-scale is customary in exchangeable population models such as the Cannings (1974) neutral model, which includes both the Wright–Fisher model and the Moran model under neutrality (Möhle 2004).
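As an illustration, the scaled expected coalescence time for three lineages can be computed from the one-step probabilities *p*_{21}, *p*_{31} and *p*_{32}. The sketch below assumes the geometric waiting-time argument of this section, i.e. *E*_{0}(*t*_{2})=1/*p*_{21} and *E*_{0}(*t*_{3})=1/(*p*_{31}+*p*_{32})+[*p*_{32}/(*p*_{31}+*p*_{32})]*E*_{0}(*t*_{2}), and uses the standard Wright–Fisher model and a Moran model in which the offspring replaces an individual other than its parent. The function names are hypothetical.

```python
from fractions import Fraction

def scaled_three_lineage_time(p21, p31, p32):
    """E_0(t_3) / E_0(t_2) from the one-step ancestral probabilities."""
    e_t2 = 1 / p21
    e_t3 = 1 / (p31 + p32) + p32 / (p31 + p32) * e_t2
    return e_t3 / e_t2

def wright_fisher(N):
    """(p21, p31, p32) when each offspring picks its parent uniformly at random."""
    return Fraction(1, N), Fraction(1, N**2), Fraction(3 * (N - 1), N**2)

def moran(N):
    """(p21, p31, p32) when one offspring replaces one non-parent individual."""
    pair = Fraction(2, N * (N - 1))     # the sampled pair is {parent, offspring}
    return pair, Fraction(0), 3 * pair

for name, model in (("Wright-Fisher", wright_fisher), ("Moran", moran)):
    p21, p31, p32 = model(10)
    assert p31 + p32 == 3 * p21 - 2 * p31          # inclusion-exclusion relationship
    print(name, scaled_three_lineage_time(p21, p31, p32))
```

For the Moran model the ratio is exactly 4/3, while for the Wright–Fisher model it approaches 4/3 as *N* increases, in agreement with the Kingman coalescent.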

## 5. Replacement of strategies

Following Nowak *et al*. (2004), selection favours the mutant strategy A replacing the resident strategy B if the probability of fixation of A exceeds its initial frequency, i.e. *p*(*s*)>1/*N* for *s*>0. If selection is weak enough, this will be the case if and only if *a*_{0}*d*_{0}+*a*_{1}*d*_{1}>0. Since *a*_{1}>0 under our assumptions and *d*_{0}>0 by definition, the condition is equivalent to

$$
\frac{d_1}{d_0} > -\frac{a_0}{a_1}. \tag{5.1}
$$

With mean pay-offs averaged over the *N*−1 possible opponents of an individual, this becomes

$$
\frac{d-b}{a-b-c+d} + \frac{a-d}{N(a-b-c+d)} < \frac{1}{N} + \left(1 - \frac{2}{N}\right)\frac{q_{32}}{3}. \tag{5.2}
$$

Then, if the population size is large enough, this inequality reduces to

$$
x^{*} < \frac{1}{3}\lim_{N\to\infty} q_{32}, \tag{5.3}
$$

where

$$
x^{*} = \frac{d-b}{a-b-c+d} \tag{5.4}
$$

corresponds to the unstable equilibrium frequency of the pure strategy A for the replicator dynamics in an infinite population (e.g. Hofbauer & Sigmund 1998, ch. 7), and $\lim_{N\to\infty} q_{32}$ is the conditional probability *q*_{32} evaluated in the limit of a large population size. This probability may be strictly less than 1 if some individuals leave substantially more descendants than others from one time-step to the next.
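The first-order approximation can be compared with an exact result in the one case where a closed form is readily available. The sketch below assumes the standard Moran model with birth proportional to fitness and uniform death, for which the exact fixation probability of a single mutant is the classical birth-death formula used by Nowak *et al*. (2004). It further assumes mean pay-offs without self-interaction, *γ*=1/*N*, *E*_{0}(*t*_{2})=*N*^{2}/2 and *E*_{0}(*t*_{3})=(4/3)*E*_{0}(*t*_{2}). The function names and pay-off values are hypothetical.

```python
def exact_moran_fixation(N, a, b, c, d, s):
    """Exact fixation probability of a single A (birth-death chain formula)."""
    total, prod = 1.0, 1.0
    for i in range(1, N):
        f_A = 1 + s * (a * (i - 1) + b * (N - i)) / (N - 1)
        f_B = 1 + s * (c * i + d * (N - i - 1)) / (N - 1)
        prod *= f_B / f_A               # ratio of down-step to up-step probabilities
        total += prod
    return 1 / total

def first_order_fixation(N, a, b, c, d, s, gamma, e_t2, e_t3):
    """1/N + s*gamma*(a0*d0 + a1*d1) with d0, d1 from expected coalescence times."""
    a0 = (N * (b - d) + d - a) / (N - 1)
    a1 = N * (a - b - c + d) / (N - 1)
    scale = (1 / N) * (1 - 1 / N)
    d0 = scale * e_t2
    d1 = scale * ((1 - 2 / N) * e_t3 - (1 - 3 / N) * e_t2)
    return 1 / N + s * gamma * (a0 * d0 + a1 * d1)

N, s = 20, 1e-4
a, b, c, d = 5, 0, 1, 1                 # hypothetical pay-offs with a > c and d > b
e_t2 = N**2 / 2                         # neutral Moran with uniform death: p21 = 2/N^2
e_t3 = 4 * e_t2 / 3                     # only pairwise mergers
exact = exact_moran_fixation(N, a, b, c, d, s)
approx = first_order_fixation(N, a, b, c, d, s, 1 / N, e_t2, e_t3)
print(exact, approx, 1 / N)
```

In this example *x*^{*}=1/5 is below one-third, and both the exact and the approximate fixation probabilities exceed 1/*N*. The two values differ only by terms of second order in *s*.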

## 6. Examples

Eldon & Wakeley (2006) have extended both the Moran model and the Wright–Fisher model to allow for a highly skewed distribution of family size as observed in some marine organisms. This mode of reproduction might also be important in social interactions, if there are opinion leaders in small number who distribute their views with high enough efficiency.

A modified neutral Moran model assumes that, at each time-step, an individual chosen at random in the population of size *N* produces either *Nψ*−1 offspring with a probability *N*^{−β} or one offspring with the complementary probability 1−*N*^{−β}. Moreover, these offspring replace the same number of individuals in the population other than the parent. Note that the expected fraction of the population replaced is

$$
\gamma = \frac{1}{N} + \left(\psi - \frac{2}{N}\right) N^{-\beta}. \tag{6.1}
$$

It is assumed that 2/*N*≤*ψ*≤1 and *β*≥0. The case *ψ*=2/*N* corresponds to the standard Moran model (Moran 1958).

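Under the above assumptions, we have

$$
p_{31} = N^{-\beta}\, \frac{\psi (N\psi - 1)(N\psi - 2)}{(N-1)(N-2)} \tag{6.2}
$$

and

$$
p_{32} = 3N^{-\beta}\, \frac{N\psi(1-\psi)(N\psi - 1)}{(N-1)(N-2)} + \left(1 - N^{-\beta}\right) \frac{6}{N(N-1)}. \tag{6.3}
$$

The limit of *p*_{32}/(*p*_{31}+*p*_{32}) as *N* goes to infinity takes the value 1 if *β*>2, but the value

$$
\frac{3(1-\psi)}{3 - 2\psi} \tag{6.4}
$$

if *β*<2. This value decreases from 1 to 0 as *ψ* goes from 0 to 1. In the critical case *β*=2, we have

$$
\lim_{N\to\infty} \frac{p_{32}}{p_{31} + p_{32}} = \frac{6 + 3\psi^2(1-\psi)}{6 + 3\psi^2 - 2\psi^3}, \tag{6.5}
$$

which decreases from 1 to 6/7 as *ψ* goes from 0 to 1.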
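The three regimes can be illustrated numerically. The sketch below assumes that, at a given time-step, the parent and its offspring form a group of size *Nψ* with probability *N*^{−β} and of size 2 otherwise, so that *p*_{31} and *p*_{32} are the probabilities that three sampled individuals all belong to this group, or that exactly two of them do. The limits used for comparison are 1 for *β*>2, 3(1−*ψ*)/(3−2*ψ*) for *β*<2 and (6+3*ψ*^{2}(1−*ψ*))/(6+3*ψ*^{2}−2*ψ*^{3}) for *β*=2. The function names are hypothetical.

```python
def q32_modified_moran(N, psi, beta):
    """p_32 / (p_31 + p_32) for the modified neutral Moran model at finite N."""
    big = N ** (-beta)                  # probability of a large reproduction event
    m = N * psi                         # parent together with its N*psi - 1 offspring
    denom = N * (N - 1) * (N - 2)
    p31 = big * m * (m - 1) * (m - 2) / denom
    p32 = (big * 3 * m * (m - 1) * (N - m)
           + (1 - big) * 6 * (N - 2)) / denom
    return p32 / (p31 + p32)

def limit_q32(psi, beta):
    """Large-N limit of q_32 in the three regimes of beta."""
    if beta > 2:
        return 1.0
    if beta < 2:
        return 3 * (1 - psi) / (3 - 2 * psi)
    return (6 + 3 * psi**2 * (1 - psi)) / (6 + 3 * psi**2 - 2 * psi**3)

for beta in (1, 2, 3):
    print(beta, q32_modified_moran(10**6, 0.5, beta), limit_q32(0.5, beta))
```

With *ψ*=0.5, the finite-population values for *N*=10^{6} are already close to the corresponding limits.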

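A similar neutral model with non-overlapping generations, in which case *γ*=1, assumes that, at each generation with probability *N*^{−α}, a single individual chosen at random in the population has a probability *ψ* of being the parent of each individual in the next generation compared with (1−*ψ*)/(*N*−1) for each of the other individuals. Otherwise, this probability is 1/*N* for every individual. This is a modified Wright–Fisher model, which reduces to the standard Wright–Fisher model (Fisher 1930; Wright 1931) when *ψ*=1/*N*. With these assumptions, we get

$$
p_{31} = N^{-\alpha}\left[\psi^3 + \frac{(1-\psi)^3}{(N-1)^2}\right] + \left(1 - N^{-\alpha}\right)\frac{1}{N^2} \tag{6.6}
$$

and

$$
p_{32} = 3N^{-\alpha}\left[\psi^2(1-\psi) + \frac{(1-\psi)^2}{N-1}\left(1 - \frac{1-\psi}{N-1}\right)\right] + 3\left(1 - N^{-\alpha}\right)\frac{N-1}{N^2}. \tag{6.7}
$$

The previous result holds with *β*=*α*+1 if *α*≠1, while

$$
\lim_{N\to\infty} \frac{p_{32}}{p_{31} + p_{32}} = \frac{3 + 3\psi^2(1-\psi)}{3 + 3\psi^2 - 2\psi^3} \tag{6.8}
$$

in the critical case *α*=1.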
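A corresponding numerical illustration for the modified Wright–Fisher model is sketched below. It assumes that, given the parental probabilities *π*_{k} in a generation, three offspring share one parent with probability Σ*π*_{k}^{3} and exactly two of them share a parent with probability 3Σ*π*_{k}^{2}(1−*π*_{k}). The limits used for comparison are 3(1−*ψ*)/(3−2*ψ*) for *α*<1, 1 for *α*>1 and (3+3*ψ*^{2}(1−*ψ*))/(3+3*ψ*^{2}−2*ψ*^{3}) for *α*=1. The function names are hypothetical.

```python
def q32_modified_wf(N, psi, alpha):
    """p_32 / (p_31 + p_32) for the modified neutral Wright-Fisher model."""
    big = N ** (-alpha)                 # probability of a skewed generation
    rest = (1 - psi) / (N - 1)          # parental probability of each other individual
    p31 = big * (psi**3 + (N - 1) * rest**3) + (1 - big) / N**2
    p32 = (big * 3 * (psi**2 * (1 - psi) + (N - 1) * rest**2 * (1 - rest))
           + (1 - big) * 3 * (N - 1) / N**2)
    return p32 / (p31 + p32)

def limit_q32_wf(psi, alpha):
    """Large-N limit of q_32 in the three regimes of alpha."""
    if alpha > 1:
        return 1.0
    if alpha < 1:
        return 3 * (1 - psi) / (3 - 2 * psi)
    return (3 + 3 * psi**2 * (1 - psi)) / (3 + 3 * psi**2 - 2 * psi**3)

for alpha in (0.5, 1, 2):
    print(alpha, q32_modified_wf(10**12, 0.5, alpha), limit_q32_wf(0.5, alpha))
```

Convergence is slow when *α* is close to 1, which is why a very large *N* is used here. Setting *ψ*=1/*N* recovers the standard Wright–Fisher value 3(*N*−1)/(3*N*−2).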

## 7. Discussion

We have considered a linear game in a finite population with a fraction of the population replaced at discrete time-steps, which allows for any distribution of family size from one step to the next. We have shown that the one-third law proposed by Nowak *et al*. (2004) holds in the limit of large population size, when the number of ancestors to three individuals diminishes by one with certainty the first time it diminishes backward in time, i.e. $\lim_{N\to\infty} q_{32} = 1$. Simple algebraic manipulations lead to

$$
\lim_{N\to\infty} q_{32} = \frac{3\big(1 - \phi_1(3)\big)}{3 - 2\phi_1(3)}, \tag{7.1}
$$

where

$$
\phi_1(3) = \lim_{N\to\infty} \frac{p_{31}}{p_{21}}. \tag{7.2}
$$

Therefore, the one-third law holds when *ϕ*_{1}(3)=0. This is exactly the domain of application of the Kingman coalescent (Möhle 2000) as well as of the Wright–Fisher diffusion (Möhle 2001). In fact, the one-third law comes up when only two lineages can coalesce at a time, and the coalescence rate for three lineages is three times the coalescence rate for two lineages.

In general, weak selection will favour a mutant strategy that is the best reply to itself replacing a resident strategy that is also the best reply to itself if the domain of attraction of the mutant strategy in the replicator dynamics for an infinite population starts at a frequency $x^{*} < \tfrac{1}{3}\lim_{N\to\infty} q_{32}$. When $\lim_{N\to\infty} q_{32} < 1$, which may occur with a highly skewed distribution of family size, the condition on the mutant strategy becomes more stringent. In the case of TFT against AllD in the repeated Prisoner's Dilemma, this means more repetitions of the game.
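The effect on the number of repetitions can be made concrete with a small sketch. It assumes the usual pay-offs of the repeated game over *m* rounds, with one-round pay-offs *T*, *R*, *P* and *S* (temptation, reward, punishment and sucker's pay-off), and it takes the illustrative values *T*=5, *R*=3, *P*=1 and *S*=0, which are not taken from this paper. The argument `q` stands for the large-population limit of *q*_{32}, and the function names are hypothetical.

```python
from fractions import Fraction

def tft_vs_alld(m, T=5, R=3, P=1, S=0):
    """Pay-offs (a, b, c, d) for A = TFT and B = AllD over m rounds."""
    a = m * R                   # TFT against TFT: cooperation in every round
    b = S + (m - 1) * P         # TFT against AllD: exploited once, then defection
    c = T + (m - 1) * P         # AllD against TFT
    d = m * P                   # AllD against AllD
    return a, b, c, d

def threshold(a, b, c, d):
    """Unstable equilibrium frequency x* of the replicator dynamics."""
    return Fraction(d - b, a - b - c + d)

def min_rounds(q, max_rounds=10**6):
    """Smallest m such that a > c and x* < q/3 in a large population."""
    if q <= 0:
        raise ValueError("q must be positive")
    for m in range(2, max_rounds + 1):
        a, b, c, d = tft_vs_alld(m)
        if a > c and threshold(a, b, c, d) < Fraction(q) / 3:
            return m
    raise ValueError("no admissible number of rounds found")

print(min_rounds(1), min_rounds(Fraction(1, 2)))
```

With these pay-offs, *x*^{*}=1/(2*m*−3), so that four rounds suffice when the limit of *q*_{32} is 1, while five rounds are needed when it is 1/2.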

The direct approach used in this paper to find the first-order effect of frequency-dependent selection on the probability of fixation of a single mutant in a finite population is of general validity as long as the fitness functions are linear with respect to the frequency of the mutant as occurs with random pairwise interactions. The same approach was used by Rousset (2003) in a context formally equivalent to constant fitness functions applied to kin selection in subdivided populations (Rousset & Billiard 2000). Moreover, it can be extended to fitness functions in a polynomial form of any degree *k*−1 that would come into play, e.g. with random groups of *k* interacting individuals. Then, the expected coalescence times for up to *k*+1 individuals would have to be considered (Lessard & Ladret 2007). The approach is an alternative and a complement to diffusion approximations (Lessard 2005, and references therein) or more sophisticated tools of stochastic calculus (Lambert 2006).

Finally, the concepts of adaptive dynamics that are used to study long-term evolution of cooperation (Doebeli *et al*. 2004; Brännstrom & Dieckmann 2005; Hauert *et al*. 2006), such as evolutionarily stable strategy (Maynard Smith & Price 1973), continuously stable strategy (Eshel & Motro 1981), polymorphic evolutionarily attractive state trait or evolutionary branching singular point (Christiansen 1991; Metz *et al*. 1996), are based on a payment function to a mutant in a resident population. For an infinite population, the payment function is usually defined as the growth rate of the mutant when it is rare (Lessard 1990) and it corresponds to an invasion fitness. For a finite population, the payment can be defined as the probability of fixation of the mutant when it is represented once in the population (Rousset & Billiard 2000; Proulx & Day 2001; Nowak *et al*. 2004; Lessard 2005), and this corresponds to a replacement fitness. Both measures have been shown to be equivalent for evolutionary stability concepts in the context of a 2×2 matrix game in a finite population with mixed strategies allowed (Wild & Taylor 2004), but this might not be the case in general.

## Acknowledgments

This research was supported in part by the Natural Sciences and Engineering Research Council of Canada. This work was done in Vienna in December 2006 during the workshop ‘Causes of Ecological and Genetic Diversity’ organized by the Vienna Science and Technology Fund and the Erwin Schrödinger International Institute for Mathematical Physics.

## Footnotes

- Received March 15, 2007.
- Accepted April 27, 2007.

- © 2007 The Royal Society