## Abstract

For the first genomes to evolve, independent replicators had to act cooperatively, with some reducing their own replication rate to help copy others. It has been argued that limited diffusion explains this early cooperation. However, social evolution models have shown that limited diffusion on its own often does not favour cooperation. Here we model early replicators using social evolution tools. We show that: (i) replicators can be considered to be cooperating as a result of kin selection; (ii) limited diffusion on its own does not favour cooperation; and (iii) the addition of overlapping generations, probably a general trait of molecular replicators, promotes cooperation. These results identify key life-history features in the evolution of the genome and suggest that the same factors can favour cooperation across the entire tree of life.

## 1. Introduction

Genomes are made up of genes, or replicators, which work together to produce an organism. These genes specialize in different tasks—for example, some produce replication machinery to copy the other genes, while others focus on acquiring energy for this process. However, life began with independent replicators, whose sole purpose was to copy themselves [1–3]. Thus, at some point, before the last universal common ancestor, independent replicators came together to form a rudimentary genome. At least some of these replicators had to focus on the task of replicating others, reducing their own replication rate in the process. Thus, to get from the first replicators to the first genome, independent replicators must have acted cooperatively [2–4]. This poses a problem, because we expect natural selection to favour individuals that selfishly replicate themselves. For example, imagine a mutant replicator that, rather than help copy others, replicated itself instead, and therefore had a higher replication rate than its neighbours. All else being equal, this mutant should prevail. So why would early replicators cooperate?

It has been argued that limited diffusion or dispersal could explain cooperation between early replicators [5–12]. The existence of early replicators on surfaces, such as rocks, would have led to relatively limited diffusion [3,13]. A number of simulation studies have examined this possibility, assuming that replicators exist on connected nodes in a two-dimensional lattice. In these lattice models, when replicators replicate, their offspring can move one, two, or many nodes away. How far offspring can travel in a given step determines whether the system has limited diffusion or is well mixed. These simulations have shown that limited diffusion can favour the evolution of cooperative replicators, who help others replicate [5–12]. Limited diffusion keeps cooperators together, and so their cooperation is directed towards other cooperators, allowing them to outcompete non-cooperators.

However, this suggested role of limited diffusion raises two further questions. First, limited dispersal has previously been argued to explain cooperation in organisms ranging from bacteria to birds because it keeps relatives together [14–20]. In these cases, cooperation is favoured because it is directed at relatives who share the same genes, termed kin selection [14]. Can we think of this early cooperation between replicators as being favoured by kin selection, analogous to that in higher organisms? If so, we could make broad generalizations about the factors that have favoured cooperation, across different biological levels, as life on earth evolved.

Second, theoretical kin selection models have shown that limited dispersal, on its own, does not favour cooperation (reviewed by [21,22]). Although limited dispersal increases the likelihood that cooperation can be directed towards relatives, it also increases competition between relatives. Taylor [23,24] showed that in the simplest case, these effects exactly cancel, and that the rate of dispersal does not influence selection for cooperation. Since then, a number of models have shown that limited dispersal can favour cooperation, but only if additional factors are added, such as overlapping generations, or dispersal in groups (buds), which allow the benefits of increased relatedness to outweigh the extra competition (e.g. [25–27]). Consequently, we must ask how limited diffusion manages to favour cooperation in these replicator models.

We develop theoretical models to address these two questions. We focus on a specific example of replicator biology, termed the trans-replicase system, because it is one of the simplest forms of molecular cooperation. First, we develop a simple kin selection model to examine whether we can think of limited diffusion as favouring cooperation in trans-replicases by kin selection [28]. This model allows us to compare the evolution of cooperation in a simple replicator with models developed to explain cooperation across a range of other taxa. Second, we develop a more spatially explicit model, to ask how limited diffusion might favour cooperation among replicators. This second model is simple enough to solve analytically, while still capturing the key features of the previous simulations [5–12].

## 2. Heuristic overview

We start by developing the simplest possible, heuristic model. The purpose of this is to try to capture the biology of an early replicator using the tools of social evolution theory [29–31]. This kind of streamlined model aims to cut out all but the most essential biological and biochemical details, to capture a wider range of possible biologies, and identify key parameters. We sacrifice realism for generality and insight.

We model one possible system for cooperation in replicators: the trans-acting replicase or trans-replicase (figure 1) [12]. There is a variety of possible biologies for early replicators, but the trans-replicase model is one of the simplest. Trans-replicases are replicating molecules that, upon replicating, through mechanisms such as alternate folding, can express one of two phenotypes: (i) replicases, which are molecules that can copy other replicators, or (ii) templates, which can be copied by a replicase but do not act as replicases (figure 1). The replicase phenotype can be considered to be cooperative, because it reduces its own replication rate in order to increase the replication rate of others, and the template phenotype to be relatively selfish. An individual keeps its phenotype for life, so any given individual is either a replicase or a template.

We assume that when a replicator is copied, the new copy (offspring) folds to become a replicase with probability *x* and a template with probability 1 − *x*. Mutations could cause individuals to express the cooperative replicase phenotype with higher or lower probability, creating more or less cooperative strategies. We are envisaging phenotypic variation generated by alternate folding, but our model captures other ways to generate variable phenotypes. Although a trans-replicase capable of self-sustaining in a pre-biotic world has yet to be identified, the development of simple RNA molecules capable of template-directed synthesis suggests their plausibility [32–37].

Individual replicators have a baseline replication rate, or fitness, of 1. A replicase experiences a replication rate reduced by *c*, which can be considered the cost of helping or cooperating with templates. If replicases cannot replicate, then *c* = 1. The presence of replicases increases the replication rate of a template by an amount *by*, where *y* is the average proportion of replicators that are replicases over the scale at which replicators can interact (the social group). This increase in the template replication rate can be considered the benefit of the cooperative or helpful act. When there is a higher fraction of replicases in the social group (*y*), there is a greater likelihood of any template being helped. Replicases do not catalyse the replication of other replicases, and so do not provide a benefit to replicases.

We can write the expected fitness of a focal individual (*w*) as the summed fitness of its replicase (1 − *c*) and template (1 + *by*) offspring, multiplied by their relative frequency, which is *x* and 1 − *x*, respectively, giving
$$w = x(1 - c) + (1 - x)(1 + by). \tag{2.1}$$

We are considering an asexual population, and so ignoring mutation, the strategy or phenotype of a replicator and the copies (offspring) that it produces will be the same, *x*. Consequently, equation (2.1) can also be conceptualized as the sum of the likelihood that a replicator developed as a replicase multiplied by its fitness in that scenario, and the likelihood that a replicator developed as a template multiplied by its fitness in that scenario. More traditionally, equation (2.1) is conceptualized as the average reproductive value of the focal offspring [38].
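As a minimal illustration, the fitness expression described above can be written as a short Python function. The function name `expected_fitness` and the example parameter values are ours, chosen only for illustration.

```python
def expected_fitness(x, y, b, c):
    """Expected fitness of a focal replicator, following the verbal
    description of equation (2.1).

    x: probability that a copy of the focal lineage folds as a replicase
    y: average proportion of replicases in the social group
    b: benefit a template gains per unit of y
    c: cost of acting as a replicase
    """
    replicase_fitness = 1.0 - c        # baseline 1, reduced by the cost
    template_fitness = 1.0 + b * y     # baseline 1, raised by nearby replicases
    return x * replicase_fitness + (1.0 - x) * template_fitness


if __name__ == "__main__":
    # A pure template (x = 0) in a group that is half replicases.
    print(expected_fitness(0.0, 0.5, b=2.0, c=0.5))
    # A pure replicase (x = 1) gains nothing from its neighbours.
    print(expected_fitness(1.0, 0.5, b=2.0, c=0.5))
```

A pure template benefits fully from its neighbours, whereas a pure replicase only pays the cost, which is the tension the rest of the model resolves.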

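We seek the evolutionarily stable strategy (ESS), *x**, which cannot be beaten by any other strategy [39]. Taylor [29] and Frank [31] developed an approach for determining the ESS in social models. Assuming selection is weak, candidate ESSs occur where the derivative of fitness with respect to deviations in *x* (also known as the ‘inclusive fitness effect’) equals zero:
$$\frac{\mathrm{d}w}{\mathrm{d}x} = \frac{\partial w}{\partial x} + \frac{\partial w}{\partial y}\frac{\mathrm{d}y}{\mathrm{d}x} = -(c + by) + b(1 - x)\frac{\mathrm{d}y}{\mathrm{d}x} = 0. \tag{2.2}$$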

The d*y*/d*x* term is the slope of the regression of a random partner's phenotype on the focal individual's, and can be replaced with *r* [29], the standard coefficient of relatedness [40–42]. Relatedness, *r*, is a measure of genetic similarity between our focal individual and the other individuals on the patch. In our model, *r* is the likelihood that our focal individual shares the same gene at a given locus with another individual on their patch, relative to a random member of the population. Relatedness can arise for a number of reasons, and *r* represents a summary of all details about the population structure. This kind of approach has proved useful for linking data with theory, because a simple model can then be applied to a variety of different biological cases, where a positive relatedness arises for different reasons [43–47].

Replacing d*y*/d*x* with *r*, we calculate the ESS (*x**) to be
$$x^* = \frac{rb - c}{b(1 + r)}. \tag{2.3}$$

From the above equation, we can see that increasing relatedness increases the ESS value of *x*. Our model is agnostic to how relatedness between interacting individuals arises and therefore captures a variety of ways through which relatedness could be positive. One way to achieve higher *r* is through limited diffusion, because this leads to identical copies of the gene being more likely to find themselves near each other [14]. Thus, our result captures previous claims that limited diffusion would favour cooperation between replicators. Equation (2.3) is analogous to the ESS identified in Frank's [30] paired suicide model, but with an arbitrary cost of cooperation.

We found that cooperation evolves (*x** > 0) when *rb* − *c* > 0, which is the classic result known as Hamilton's [14] rule. Hamilton's rule is a relatively general result stating that a cooperative trait will evolve if the cost, *c*, is outweighed by the benefit, *b*, weighted by relatedness, *r* [48]. This analysis, therefore, confirms that we can think of kin selection as the reason limited diffusion favours cooperation between replicators, in exactly the same way as kin selection is usually applied to explain cooperation in other taxa, such as bacteria and animals.
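The link between the inclusive fitness effect and Hamilton's rule can be checked numerically. The sketch below assumes the fitness function *w* = *x*(1 − *c*) + (1 − *x*)(1 + *by*) described in the text, differentiates it by hand, and solves for the interior root; the function names and parameter values are ours, and the truncation at zero reflects only that *x* is a probability.

```python
def inclusive_fitness_effect(x, r, b, c):
    """dw/dx for w = x(1 - c) + (1 - x)(1 + b*y), evaluated at y = x,
    with dy/dx replaced by relatedness r."""
    direct = -c - b * x              # partial derivative with respect to x
    indirect = r * b * (1.0 - x)     # r times the partial derivative w.r.t. y
    return direct + indirect


def ess_cooperation(r, b, c):
    """Interior root of the inclusive fitness effect, truncated at 0."""
    x_star = (r * b - c) / (b * (1.0 + r))
    return max(0.0, x_star)


if __name__ == "__main__":
    for r in (0.1, 0.25, 0.5, 1.0):
        x_star = ess_cooperation(r, b=2.0, c=0.5)
        favoured = r * 2.0 - 0.5 > 0          # Hamilton's rule, rb - c > 0
        print(r, round(x_star, 4), favoured)
```

With *b* = 2 and *c* = 0.5, cooperation appears only once *r* exceeds *c*/*b* = 0.25, which is the threshold given by Hamilton's rule.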

## 3. Population structure

### (a) Island model

Our above model showed that high relatedness favours cooperation, but left open the mechanism by which high relatedness is generated. One possibility is through limited diffusion of offspring copies [14], as has been argued for a wide range of organisms, including trans-replicases [12]. We test this idea by explicitly modelling population structure in an infinite island model. This is a standard approach to modelling population structure in evolutionary biology and is slightly different from a lattice model. In a lattice model we explicitly track distance, such that individuals might be further or closer apart. In an island model, we do not track distance, but instead allow individuals to stay in one place or move arbitrarily far away. The island model has been shown to give similar results to more explicit lattice and stepping stone structures [49].

Our infinite population is now subdivided into an infinite number of patches, or islands. For example, we can imagine that groups of replicators are isolated in crevices, on separate rocks, or even in droplets [4,50]. These patches have limited resources (e.g. nucleotides), such that they contain *N* individuals. Individual replicators interact within these patches, and these interactions determine their fitnesses or the number of offspring copies they produce.

Offspring are produced in a single generation, or round, of replication, and offspring copies diffuse to a distant patch with probability 1 − *s*. Otherwise, with probability *s*, they stay on the same patch. Offspring that remain compete randomly among themselves and with new arrivals from other patches for the *N* available spots, and those that do not secure a spot die. Thus, an individual's fitness determines the chances that the next generation is made up of its offspring. Dispersers, similarly, compete on their new patch with other dispersers and residents for the *N* spots on that patch. ‘Dispersal’ and ‘diffusion’ are usually used in the kin selection and replicator literature, respectively, to mean the same thing (movement away from the natal patch, here with probability 1 − *s*); we use diffusion for replicators.

Biological models often assume that generations are discrete. This means that when offspring copies are produced, parent copies die, such that each new generation is made up exclusively of offspring. This may not be realistic for simple replicators. Thus, we allow some proportion, *k*, of parent individuals to survive into the next generation. Survivors maintain their spot on a patch (for example, because they are bound to one of the free binding sites). As a result, all offspring individuals are competing for the 1 − *k* fraction of free spots on a patch.

In some ways, the patches in our model are similar to cells, in that they allow associations to build up between individuals. However, they are distinct from cells in that offspring disperse independently. The diffusion rate is fixed in this model, which is justified if it is a function of an extrinsic factor (e.g. displacement by movement of the surrounding water), or a non-varying intrinsic factor (e.g. a chemical bond that is independent of mutation). The survival rate, *k*, is fixed for similar reasons.

An individual's fitness now depends on whether it diffuses (with probability 1 − *s*) or stays (with probability *s*), because this will determine the individual's competitive arena. If an individual disperses, its fitness is proportional to the population average fitness, which we assume to be 1 (the population is neither growing nor shrinking). If it stays, its fitness is relative to the average fitness on the patch. To determine the average fitness on the patch, it is necessary to take into account the average phenotype of the whole patch, including the focal individual, which is equal to (*y*(*N* − 1) + *x*)/*N* and which, for simplicity, we denote *Z*. After diffusion, the number of individuals on a patch is equal to the number of individuals produced on a patch that stay (with probability *s*) plus the number of individuals arriving from elsewhere ((1 − *s*)*N*). So the total number of offspring on a patch after diffusion is
$$sN\left[1 - cZ + bZ(1 - Z)\right] + (1 - s)N. \tag{3.1}$$

These offspring then compete for the available (1 − *k*) fraction of places on the patch. This allows us to write the fitness of an individual in terms of whether it stays or disperses, as a function of *x* and *y* (remember that *Z* is a function of *x* and *y*):
3.2

From this equation, we can use the Taylor–Frank approach to calculate the equilibrium strategy. However, the resulting equation is not analytically tractable. Instead, if we assume *b* and *c* are small, we can write a first-order approximation of this function that can be solved analytically. We also solved the Taylor–Frank equilibrium equation determined from equation (3.2) numerically, and found that relaxing the assumption of small *b* and *c* does not change the results (figures 2 and 3). The first-order approximation is
3.3

The fitness components in this equation have a simple biological interpretation. The terms on the left (inside the square brackets) capture the primary consequences of exhibiting the cooperative behaviour, as in the simpler model (equation (2.1)). Specifically, given an individual is cooperative, it incurs a cost, *c* (the term −*cx*), and given it is not cooperative, it receives a benefit, *b*, from cooperative partners (the term (1 − *x*)*by*). The terms on the right capture how cooperation leads to an increase in local competition. Specifically, the extra offspring produced on a patch with average trait value *Z* displace the focal individual's offspring when both the extra offspring and the focal individual's offspring remain on the patch, which happens with probability *s*^{2} (the term (*bZ*(1 − *Z*) − *cZ*)*s*^{2}). This model is analogous to a haploid asexual model of others-only cooperation like that found in Taylor & Irwin [25]. As replicases cannot receive benefits, we are modelling what has been called a negatively synergistic game [51].
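To make the small-*b*, small-*c* approximation concrete, the sketch below compares one possible exact fitness function with its first-order expansion. The exact form used here is our own reading of the life cycle described in the text (a parent survives with probability *k*; otherwise its philopatric and dispersing offspring compete for free sites), not necessarily the authors' equation (3.2). The overall factor (1 − *k*) and all function names are assumptions. The three fitness components in the approximation are those named in the paragraph above.

```python
def patch_mean(x, y, n):
    """Average phenotype Z on the patch, focal individual included."""
    return (y * (n - 1) + x) / n


def fecundity(x, y, b, c):
    """Offspring produced before competition: 1 - c*x + (1 - x)*b*y."""
    return 1.0 - c * x + (1.0 - x) * b * y


def exact_fitness(x, y, b, c, s, k, n):
    """Assumed exact fitness: survive with probability k, otherwise win
    free sites through offspring that stay or disperse."""
    z = patch_mean(x, y, n)
    local_output = fecundity(z, z, b, c)          # mean output on the focal patch
    stay = s / (s * local_output + (1.0 - s))     # success of offspring that stay
    leave = 1.0 - s                               # dispersers face average competition
    return k + (1.0 - k) * fecundity(x, y, b, c) * (stay + leave)


def approx_fitness(x, y, b, c, s, k, n):
    """First-order expansion of exact_fitness in small b and c."""
    z = patch_mean(x, y, n)
    primary = -c * x + (1.0 - x) * b * y          # cost paid and benefit received
    competition = (b * z * (1.0 - z) - c * z) * s ** 2
    return 1.0 + (1.0 - k) * (primary - competition)


if __name__ == "__main__":
    args = dict(x=0.3, y=0.4, b=0.02, c=0.01, s=0.5, k=0.2, n=5)
    print(exact_fitness(**args), approx_fitness(**args))
```

For small *b* and *c* the two functions agree closely, which is why the simpler expression can stand in for the exact one when locating the ESS.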

Using the Taylor–Frank approach, we can write the inclusive fitness effect as
3.4

The terms in the first set of square brackets are the direct effects of cooperation and the terms in the second set capture the indirect effects mediated through relatives. *G* = − *c*/*N* + *b*(1 − *Z*)/*N* − *bZ*/*N* and *H* = (*N* − 1)(*c* + *b*(2*Z* − 1))/*N* capture the secondary effects of extra offspring that stay on the natal patch, and are decreasing functions of *Z*. From this, we can calculate the ESS to be
3.5

This gives a solution in terms of relatedness (*r*) and diffusion rate (1 − *s*), but we expect *r* to depend on *s*. Limited diffusion (increasing *s*) should increase relatedness (*r*). Specifically, *r* is determined by the diffusion rate, the survival rate, *k* and the patch size, *N*. We can calculate *r* in terms of these parameter values, and plug this value for *r* back into equation (3.5) (see appendix A for details). This closes the model [23,26] to give the equilibrium value
3.6

### (b) Discrete generations

First, we consider the specific case of discrete generations (*k* = 0), which is the simplest possible case. When generations are discrete, we find that
$$x^* = -\frac{c}{b}. \tag{3.7}$$

This equation shows that, in the case of discrete generations, diffusion has no effect on the ESS value of cooperation: the parameter *s* does not appear in equation (3.7). Furthermore, cooperation cannot evolve under limited diffusion (*c* and *b* are positive, and the direction of selection at *x** = 0 is negative, making pure templates the stable boundary condition). This result echoes the classic result by Taylor [23,24], which showed that, while limited dispersal increases relatedness, this effect is exactly offset by the corresponding increase in local competition. This can be seen in our equation (3.4), by the two ways in which *s* determines the inclusive fitness effect of cooperation. Increasing *s* raises *r*, and therefore increases the indirect benefits gained by cooperating. However, increasing *s* also leads to the losses owing to *H* and *G* (extra offspring on the patch) being more heavily weighted. In the case of *k* = 0, these two effects exactly cancel.
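The cancellation can be illustrated numerically. The sketch below is our own construction: it differentiates the three first-order fitness components named in the text (−*cx*, (1 − *x*)*by* and (*bZ*(1 − *Z*) − *cZ*)*s*^{2}) and uses the discrete-generation relatedness *r* = *s*^{2}/(*N* − (*N* − 1)*s*^{2}), which follows from the recursion described in appendix A with *k* = 0. The function names and parameter values are assumptions made for illustration.

```python
def relatedness_discrete(s, n):
    """Others-only relatedness for discrete generations (k = 0)."""
    return s ** 2 / (n - (n - 1) * s ** 2)


def selection_gradient(x, r, s, b, c, n):
    """First-order inclusive fitness effect in a population fixed at x."""
    q = b * (1.0 - 2.0 * x) - c                    # d/dZ of b*Z*(1 - Z) - c*Z
    direct = -c - b * x - s ** 2 * q / n           # effect of the focal's own trait
    indirect = b * (1.0 - x) - s ** 2 * q * (n - 1) / n   # effect of patch mates' trait
    return direct + r * indirect


if __name__ == "__main__":
    b, c, n = 0.02, 0.01, 10
    for s in (0.0, 0.3, 0.6, 0.9):
        r = relatedness_discrete(s, n)
        # Selection on cooperation at x = 0 stays negative however limited diffusion is.
        print(s, round(r, 4), selection_gradient(0.0, r, s, b, c, n))
```

Under these assumptions the gradient simplifies to −(*c* + *bx*)(1 − *r*), so changing *s* rescales the strength of selection against cooperation but never reverses its sign.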

### (c) Overlapping generations

We now consider when there is some degree of overlapping generations (*k* > 0). In this case, the condition for cooperation to evolve becomes
3.8

If *k* and *s* are both greater than zero—that is, if there is some degree of overlapping generations and limited diffusion—cooperation can evolve. This is because increasing *k* raises *r*, relatedness, and therefore increases the indirect benefits of cooperation without increasing the competitive effects of extra offspring (equation (3.4)). Consequently, decreasing diffusion rate (1 − *s*) and increasing survivorship (*k*) tend to favour cooperation (figure 2). Increasing the benefit of cooperation, *b*, and decreasing the cost of cooperation, *c*, make it easier for cooperation to evolve.
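The effect of overlapping generations can be sketched in the same way. The code below assumes the equilibrium others-only relatedness obtained by solving the recursion described in appendix A, together with our own first-order selection gradient evaluated at *x* = 0. It is an illustration under those assumptions and is not the authors' condition (3.8); the function names and parameter values are ours.

```python
def relatedness(s, k, n):
    """Equilibrium others-only relatedness from the appendix A recursion."""
    a = s * (2.0 * k + (1.0 - k) * s)
    return a / (n * (1.0 + k) - (n - 1) * a)


def gradient_at_zero(s, k, n, b, c):
    """Sign of selection on cooperation in a population of pure templates."""
    r = relatedness(s, k, n)
    crowding = s ** 2 * (b - c) / n        # cost of extra offspring that stay local
    direct = -c - crowding
    indirect = b - (n - 1) * crowding
    return direct + r * indirect


if __name__ == "__main__":
    b, c, n, s = 0.05, 0.001, 5, 0.9
    for k in (0.0, 0.5, 0.9):
        print(k, round(relatedness(s, k, n), 4), gradient_at_zero(s, k, n, b, c))
```

In this example cooperation is disfavoured at *k* = 0 and favoured at *k* = 0.9, because survival raises relatedness without adding to local crowding.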

Decreasing patch size (*N*) makes it easier for cooperation to evolve. This is because the larger the patch size, the lower the average relatedness on a patch (equation (A 1)). One caveat is that we assume, deterministically, that each patch contains both templates and replicases. As *N* gets smaller, stochastic variation in the patch composition makes this less likely to hold. In the extreme, if *N* = 1, then the patch could only contain a template or a replicase, but not both. Consequently, replicases would be paying the cost of cooperation, when there are no templates to gain the benefit. This stochastic effect would be reduced or removed if cooperation is conditional upon being in a patch where there are templates. An analogous problem of stochasticity in small patch sizes has been considered with sex allocation in structured populations (local mate competition) [45].
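The size of this stochastic caveat can be gauged with a simple calculation. The sketch below assumes, purely for illustration, that each of the *N* individuals on a patch independently becomes a replicase with probability *x*; the model in the text does not specify this sampling scheme, and the function name is ours.

```python
def prob_both_types(n, x):
    """Probability that a patch of n individuals holds at least one
    replicase and at least one template, under independent folding."""
    all_replicases = x ** n
    all_templates = (1.0 - x) ** n
    return 1.0 - all_replicases - all_templates


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 50):
        print(n, round(prob_both_types(n, 0.2), 4))
```

With *N* = 1 the probability is zero, as noted above, and it approaches one as patches become large.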

In our above model, we assumed that survivors maintain their spot on a patch. This is reasonable if, for example, once a replicator finds a binding site it remains there until death. Alternatively, we might allow survivors to remain on the patch but to compete equally with offspring for a place. This would be reasonable if, for example, offspring can ‘bump’ adults from a patch. A third possibility is that survivors can disperse along with offspring, and compete globally—this might occur if during each replication event replicators are dislodged from their binding site. We show in appendix A that neither allowing for survivors to compete for sites nor allowing survivors to disperse qualitatively alters the results, although both changes make cooperation more difficult to evolve.

## 4. Discussion

We have used the analytical tools of social evolution theory to model a simple replicating molecule scenario: trans-acting replicases. We have shown that cooperation between replicators can be understood as evolving via the process of kin selection through limited diffusion. However, we have also shown that limited diffusion on its own does not favour cooperation (equation (3.7)). Instead, an additional life-history detail of simple replicators is needed: overlapping generations (figure 2).

Our social evolution model illustrates two points about replicators. First, we can view limited diffusion as favouring cooperation in simple replicators via kin selection. Consequently, the factor favouring cooperation in trans-replicases: (i) links to a large existing theoretical literature [14,21–23], and (ii) is the same factor that has been previously shown to favour cooperation in a range of organisms including birds, mammals, insects and microbes [15–20,52,53]. By clarifying these links across taxa, we can simplify our understanding of life, rather than having to provide different explanations for different cases. We are not saying cooperation in replicators has to be conceptualized via kin selection, just that it can be useful to do so.

Second, both limited diffusion and overlapping generations are required to favour cooperation. Limited diffusion leads to a build-up of relatedness, which favours cooperation [14]. But at the same time, limited diffusion leads to increased competition between relatives, which disfavours cooperation [23,31,54]. Overall, in the simplest possible scenario, these two effects exactly cancel (equation (3.7)). However, we found that the addition of overlapping generations allows limited diffusion to favour cooperation (figure 2). When generations overlap, this increases relatedness within patches, but without increasing competition between relatives, because offspring still diffuse to the same extent [25]. Specifically, increasing overlap (*k*) raises relatedness (*r*), and therefore increases the indirect benefits of cooperation without increasing the competitive effects of extra offspring (equation (3.4)). Consequently, when there is both limited diffusion and overlapping generations, the build-up of relatedness outweighs the increased competition between relatives, such that cooperation is favoured (figure 2).

There are many ways to model social behaviours. One decision is whether to assume discrete strategies, such as ‘cooperators’ and ‘cheats’, or to allow for continuous strategies, ranging, for example, from completely selfish to completely cooperative [29,39,55]. The assumption of continuous strategies is clearly valid for animals, where traits are determined by multiple genes, but simple replicators might only have a limited number of strategies open to them by mutation. Another decision is whether to develop explicit simulations or analytical models. Simulations allow greater detail to be incorporated, which can be especially useful when considering specific systems or species. By contrast, the analytical approach usually used in kin selection models tends to be more streamlined, sacrificing details and precision for clarity and generality [29,31]. Further, the kin selection approach offers a heuristic which non-mathematicians can apply across a range of organisms [45,47]. Rather pleasingly, in the replicator scenario examined here, the different approaches make the same qualitative prediction [12].

The route from independent replicators to the first genomes probably involved two kinds of cooperation. Early cooperation could have been between genetic relatives, or replicators of the same type. However, early genomes were probably too simple to copy themselves accurately, and yet inaccurate replication prevented genomes from getting large enough to improve their accuracy [1]. In order for the genome to overcome this ‘error threshold in replication’, it is probable that different types of replicators needed to cooperatively copy each other. Individual replicators could remain small enough to be copied accurately, but the collection of replicators could become large enough to produce the kind of enzyme machinery needed for accuracy [2]. We have modelled the first kind of cooperation—between replicators of the same ‘type’—and have shown that this can be understood as evolving via kin selection [28]. Cooperation between different types, however, would have required some factor to align the interests of unrelated replicators.

To conclude, although we have phrased our model in terms of a specific replicator system, the trans-replicase, we expect our predictions to hold more generally for other types of replicators. We do not yet know the actual biology of the earliest life forms. But, while many higher organisms may adopt a system of discrete generations, we would expect overlapping generations to be a feature of all simple replicators. Our results, then, would apply to various possible routes through which simple replicators could come together to cooperate.

## Data accessibility

This article has no additional data.

## Authors' contributions

S.R.L. and S.A.W. contributed to conception, modelling and write up of the manuscript. All authors gave final approval for publication.

## Competing interests

We have no competing interests.

## Funding

S.R.L. is funded by The Clarendon Fund, Hertford College and the Natural Environment Research Council.

## Acknowledgements

We thank Miguel dos Santos, Guy Cooper, Matishalin Patel, Tom Scott, Asher Leeks, Paul Higgs, Peter Taylor, Geoff Wild and one anonymous reviewer for very helpful discussion and/or comments; and Magdalen College for emergency housing. This paper was inspired by Shay *et al.* [12].

## Appendix A

**(a) Writing relatedness in terms of model parameters**

We start by determining the relatedness, at equilibrium, of a focal individual to a random member of its patch, drawn with replacement. This is known as whole-group relatedness (denoted by *R*), because it includes the focal individual, in contrast with others-only relatedness, which does not include the focal individual [42]. Note that in our model, we are dealing with *r*, others-only relatedness, because *y* is the average of the individuals on the patch, excluding the focal individual. We can write *R* (whole-group relatedness), the relatedness between two individuals drawn randomly from a patch with replacement, as the probability that those two individuals are the same individual (1/*N*), and thus have relatedness 1, plus the probability that those two individuals are not the same ((*N* − 1)/*N*), and thus have the relatedness of two random individuals drawn without replacement, or others-only relatedness, *r*:
$$R = \frac{1}{N} + \frac{N - 1}{N}\,r. \tag{A 1}$$

Now we take two individuals (without replacement) on the same patch with relatedness *r*, and determine the relatedness of their representatives in the previous generation. With chance *k*^{2} they are both survivors from the previous generation, in which case their relatedness is the same (*r*). With chance 2*k*(1 − *k*) one is a survivor and the other is a new offspring, which is native with probability *s*, in which case their relatedness is *R*. Else, with chance (1 − *k*)^{2} they are both new offspring, are both native with probability *s*^{2}, and thus have relatedness *R*. We can write others-only relatedness between two individuals in the current generation as equal to
$$r_t = k^2 r_{t-1} + 2k(1 - k)\,s\,R_{t-1} + (1 - k)^2 s^2 R_{t-1}. \tag{A 2}$$

Here *r*_{t} is relatedness in the current generation, or time step, and *r*_{t−1} and *R*_{t−1} are others-only and whole-group relatednesses, respectively, in the previous one. Setting *r*_{t} = *r*_{t−1} we find the equilibrium others-only relatedness. Plugging into equation (A 1), we find the equilibrium value of whole-group relatedness, *R**, to be
$$R^* = \frac{1 + k}{N(1 + k) - (N - 1)\,s\left[2k + (1 - k)s\right]}. \tag{A 3}$$

This equation for relatedness was identified by Taylor & Irwin [25]. However, here we are modelling an others-only trait, and thus require others-only relatedness, *r*. *RN* gives us the number of relatives on our patch. Subtracting the focal individual, and dividing by the total number of remaining individuals (*N* − 1), gives us *r**:
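$$r^* = \frac{R^* N - 1}{N - 1} = \frac{s\left[2k + (1 - k)s\right]}{N(1 + k) - (N - 1)\,s\left[2k + (1 - k)s\right]}. \tag{A 4}$$

Plugging into equation (3.5) gives us equation (3.6).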
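The equilibrium can be checked by iterating the recursion directly. The sketch below follows the verbal description of the recursion given above (both survivors with chance *k*^{2}, one survivor and one native offspring with chance 2*k*(1 − *k*)*s*, two native offspring with chance (1 − *k*)^{2}*s*^{2}) and compares the result with the closed form obtained by solving it at equilibrium. The function names and the number of iterations are our own choices.

```python
def iterate_relatedness(s, k, n, steps=2000):
    """Iterate the others-only relatedness recursion from r = 0."""
    r = 0.0
    for _ in range(steps):
        whole_group = 1.0 / n + (n - 1) / n * r        # R from r, sampling with replacement
        native_pairs = 2 * k * (1 - k) * s + (1 - k) ** 2 * s ** 2
        r = k ** 2 * r + native_pairs * whole_group
    return r


def equilibrium_relatedness(s, k, n):
    """Closed-form fixed point of the recursion."""
    a = s * (2 * k + (1 - k) * s)
    return a / (n * (1 + k) - (n - 1) * a)


if __name__ == "__main__":
    for s, k, n in ((0.5, 0.0, 5), (0.5, 0.5, 5), (0.9, 0.9, 5)):
        print(s, k, n, iterate_relatedness(s, k, n), equilibrium_relatedness(s, k, n))
```

The iterated and closed-form values coincide, and relatedness rises with both *s* and *k*, as the main text argues.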

**(b) Allowing survivors to remain and compete for patch sites or disperse globally**

Our original model assumed that surviving parents maintained their places on a patch, meaning offspring competed for the remaining 1 − *k* fraction of available sites. Here we relax this assumption. First, we allow survivors to remain on their patch, but compete equally with offspring for available sites. Using the relatedness recursion in equation (A 2), we calculate the ESS (assuming small *b* and *c*) to be
A 5

Next, we allow survivors to disperse along with offspring. This requires a new relatedness recursion, which we write as
A 6

We now determine the ESS to be
A 7

If *k* = 0 (discrete generations), both equations (A 5) and (A 7) revert to equation (3.7) in the text, and cooperation cannot evolve. However, given some degree of overlapping generations and limited diffusion, cooperation can evolve, although the condition is now more stringent.

- Received September 1, 2017.
- Accepted September 5, 2017.

- © 2017 The Author(s)

Published by the Royal Society. All rights reserved.