## Abstract

Understanding the emergence and evolution of multicellularity and cellular differentiation is a core problem in biology. We develop a quantitative model that shows that a multicellular form emerges from genetically identical unicellular ancestors when the compartmentalization of poorly compatible physiological processes into component cells of an aggregate produces a fitness advantage. This division of labour between the cells in the aggregate occurs spontaneously at the regulatory level owing to mechanisms present in unicellular ancestors and does not require any genetic predisposition for a particular role in the aggregate or any orchestrated cooperative behaviour of aggregate cells. Mathematically, aggregation implies an increase in the dimensionality of phenotype space that generates a fitness landscape with new fitness maxima, in which the unicellular states of optimized metabolism become fitness saddle points. Evolution of multicellularity is modelled as evolution of a hereditary parameter: the propensity of cells to stick together, which determines the fraction of time a cell spends in the aggregate form. Stickiness can increase evolutionarily owing to the fitness advantage generated by the division of labour between cells in an aggregate.

## 1. Introduction

Life on Earth takes a myriad of different forms, and understanding the evolution of this complexity is one of the core problems in all of science. The origin of species and the diversity of ecosystems are paradigmatic representatives of evolving complexity, but similarly fundamental questions arise when studying the evolution of multicellularity and cell differentiation. The evolutionary transition from unicellular to multicellular organisms is often referred to as one of the major transitions in evolution [1], even though many of the requirements for multicellularity probably evolved in unicellular ancestors, thus facilitating the transition [2]. The evolution of multicellularity is characterized by the integration of lower-level units into higher-level entities, and hence is associated with a transition in individuality [3–6]. Such transitions are thought to be based on cooperation between the lower-level units [1,3,6], and recent models for the evolution of multicellularity are based on the concept of division of labour [7–9], typically between soma and germ cells [6]. However, the existing models and explanations for the emergence of multicellularity provide only partial answers and raise further questions. In most models, some basic pre-existing differentiation is assumed, and the circumstances under which such differentiation can be enhanced and stably maintained are investigated. To study the evolution of cell differentiation, it is also often assumed that undifferentiated cells already occur in multicellular aggregates [7], facilitated by selection on size owing to environmental pressures [10], such as predation [11,12] or the need for cooperation [13].

In this paper, we consider the simultaneous evolution of multicellularity and cell differentiation in a population of identical and undifferentiated unicells, based on the idea that the emergence of multicellularity and subsequent cellular specialization are driven by the fitness advantages of a division of labour between cells. Such a division of labour need not take the form of soma and germ cells. Even simple unicellular organisms need to perform physiological tasks that the same cell cannot efficiently accomplish simultaneously. Examples include biochemical incompatibility between metabolic processes (such as between oxygenic photosynthesis and oxygen-sensitive nitrogen fixation in cyanobacteria [8,14–16]), motility and mitosis (processes that compete for the use of the same cellular machinery, the microtubule-organizing centre [17]), and, in general, reproduction and survival in a challenging environment [18–20]. Many unicellular organisms have overcome this problem by temporal segregation of incompatible activities, essentially cycling between phases dedicated solely to a single activity. These cycles can be regulated by endogenous rhythmic mechanisms as well as by external signals [14,15]. Other organisms have found alternative ways of limiting the detrimental effects of such incompatibility, such as intracellular segregation, restricting one activity to the minimum necessary for survival, or producing additional substances that chemically prevent the harmful interactions.

In a multicellular organism, such incompatible processes can take place simultaneously, but compartmentalized into separate cells. A first well-studied example of emerging intercellular separation of poorly compatible activities is the germ–soma specialization in *Volvox* [18,19]. Somatic cells gather nutrients from the environment, and provide germ cells with these nutrients [11,21,22]. The somatic cells are flagellated, and the flagella are important for motility and transport of nutrients to the cells [22,23]. Flagellation and cell division are incompatible, and this fact is probably one of the factors promoting differentiation between somatic and germ cells [21].

A second well-studied example is the incompatibility between photosynthesis and nitrogen fixation, and the resolution of this incompatibility in filamentous cyanobacteria [8,14–16]. The key enzyme for nitrogen fixation, nitrogenase, is sensitive to oxygen, and is thus inhibited by oxygenic photosynthesis [15]. In filamentous cyanobacteria, this conflict is resolved by a spatial segregation of the two processes. A small proportion of the cells differentiate into heterocysts that fix nitrogen and do not engage in photosynthesis [24], and these heterocysts exchange metabolites with the vegetative cells in the same filament.

The fitness advantage of such a division of labour is an important, if not crucial, factor in the emergence and evolution of multicellularity and cell specialization. In fact, the unicellular ancestors often already possess the prototypes of regulatory mechanisms that are needed to maintain cell specialization in multicellular forms. Consider two incompatible processes, *A* and *B*, that alternate in time in a single-cell organism, and assume that the cell has developed a regulatory mechanism that suppresses process *A* when the contrasting process *B* is active. When two or more such cells are in close and sufficiently long-lasting contact to exchange the products of these two processes, it may become more beneficial for each cell to stop cycling and settle into a steady state, with one cell specialized in *A* and the other in *B*. At the basic cellular-signalling level, the endogenous mechanism that drives unicellular cycling is often based on the accumulation of the products of *A* or *B* during the active phase and their subsequent depletion during the passive phase [14,18,19]. Hence, when a partner cell keeps producing the product of *A*, the cell that produces *B* never experiences the shortage of this product that would otherwise trigger a switch of phase, and the cycling comes to a halt. This principle applies equally to germ–soma specialization, where *A* and *B* can be interpreted as reproduction and motility: reproductive cells need not run out of nutrients if the soma cells of the colony move them to new feeding positions, while the soma cells are not lost because the reproductive cells keep producing genetically identical copies.

The assumption that incompatible cellular processes suppress each other is supported by empirical evidence. Experimental work with *Volvox carteri* has shown that cells that are destined to become reproductive suppress the expression of genes encoding somatic functions, and somatic cells suppress germ cell functions [25,26]. Also, gene-regulatory mechanisms in unicellular ancestors can readily be co-opted during the transition to multicellularity and contribute to differential gene expression in somatic and germ cells [19].

Even if a unicellular form had evolved to combine incompatible processes via other mechanisms (e.g. through the costly production of substances that mediate harmful interactions between metabolites), cells could often still benefit from the opportunities that emerge in the multicellular form (e.g. when compartmentalizing contrasting processes in separate cells allows them to stop producing the costly mediating metabolites). It is important to note that when single cells merge to form a two- or multicellular aggregate with permanently specialized cellular functions, no initial genetic specialization is required. Merging cells can be completely genetically identical, yet the symmetry of the initially unspecialized aggregated cells is broken spontaneously by regulatory mechanisms that function independently in each cell.

Mathematically, the evolutionary advantage of the division of labour in aggregate forms can be viewed as the emergence of new, higher fitness maxima when the dimensionality of phenotype space is increased. The new fitness maxima are not a direct consequence of aggregation, but are based on the interaction between aggregated individuals that engage in the division of labour. An increase in the dimensionality of phenotype space occurs when two or more cells couple their metabolism by exchanging metabolites. Because of this exchange, the fitness of an individual cell depends not only on its own metabolic state, but also on the metabolic states of its aggregation partners, and a maximum in low-dimensional space describing the physiology of a single cell may often become a saddle point (i.e. a point that is a fitness maximum in one direction in phenotype space, and a fitness minimum in another direction) when new dimensions (i.e. new cells) are added. However, the physiological state of each cell, such as the activity level of different processes, is regulated only by the cell itself based on the external and internal cues. Thus, the new fitness maxima have to be achievable by independent regulatory adjustments of each cell.

In this paper, we develop a model that turns the qualitative argument above into a quantitative one. For the case of two incompatible processes, we show that a sufficient condition for reaching a higher fitness state of cellular specialization in aggregates is the existence of unicellular regulatory mechanisms that suppress one process when the other is active, and *vice versa*. More specifically, we derive conditions on the fitness function that favour the existence of a saddle point, and hence the evolution of division of labour, in two-cell aggregates. We also show that the higher fitness maxima attainable in aggregate forms in turn select for adhesion mechanisms that allow cells to form aggregates in the first place. We illustrate the general arguments by choosing particular forms of fitness costs and benefits, and by considering a simplified scenario of metabolite exchange and evolution of adhesion. Thus, the results show that the emergence of multicellularity and cell differentiation can, in theory, result from the evolution of the propensity of cells to aggregate, driven by the fitness advantage of division of labour in the aggregate forms. This advantage arises because incompatible physiological processes are performed separately in dedicated, specialized cells, and the results of these processes are shared among the cells in the aggregate. The division of labour occurs spontaneously at the regulatory level owing to mechanisms already present in unicellular ancestors and does not require any genetic predisposition for a particular role in the aggregate.

## 2. Model definition

We envisage the simplest possible scenario of aggregation—the formation of a union of two cells—and we consider a population of cells that reproduce and die, and between birth and death can exist in unicellular or two-cell forms. The population densities of single cells and two-cell aggregates are denoted by *n*_{1} and *n*_{2}. We also assume that the transition between unicellular and two-cellular forms is reversible and may occur a number of times during a cell's lifespan. The binding constant, which determines the fraction of time a cell spends in the two-cell form, is controlled by a heritable (genetic) parameter, 0 ≤ *σ* ≤ 1, which we call the cell stickiness. The total rate of aggregation *A*_{i,j} between cells of stickiness types *σ*_{i} and *σ*_{j} is then given by
$$A_{i,j} = k_{+}\,\sigma_i\,\sigma_j\,n_1(\sigma_i)\,n_1(\sigma_j), \tag{2.1}$$

where *k*_{+} is an aggregation constant, which we assume to be identical for all cells, and *n*_{1}(*σ*_{l}) is the population density of single cells with stickiness *σ*_{l}. Also, for simplicity, we assume that the rate of dissociation of an aggregate is independent of the stickiness of its two constituents. The *per capita* dissociation rate is denoted as *k*_{−}. Together, these assumptions make the fraction of time a cell spends in the aggregate state an increasing function of its stickiness. If a cell in an aggregate dies, then the remaining cell becomes a single cell. If a cell in an aggregate divides, then the daughter cell is released as a free cell while the aggregate remains intact.

To measure ‘fitness’, we assume that cells can produce two metabolites *x* and *y*, and the production of these metabolites confers a benefit *B*(*x, y*) and a cost *C*(*x, y*). The rate of reproduction of a cell that has metabolic rates (*x, y*)—its fitness—is then determined by the functions *B* and *C*. For the single-cell state, a simple form of the benefit function reflects the requirement that both metabolites are essential for the normal functioning of the cell:
$$B(x, y) = x\,y. \tag{2.2}$$

Any realistic cost function *C* should satisfy the following constraints. First, because metabolic rates cannot increase indefinitely, the fitness must have a maximum or maxima at some intermediate metabolic rates and rapidly decrease for high metabolic rates. Hence, the cost function *C* should grow faster than *B* in any direction in the *x*–*y* plane. Second, to incorporate our assumption that the production of *x* and *y* are poorly compatible cellular processes, the cost function *C* should exhibit an ‘inefficiency penalty’ for producing both metabolites *x* and *y* in the same cell. To incorporate these assumptions, we consider cost functions in the form of
$$C(x, y) = c_x\,x^{3}\exp\!\left(y^{2}\right) + c_y\,y^{3}\exp\!\left(x^{2}\right). \tag{2.3}$$

Essentially, this form means that if only one metabolite is produced (for example, *y* = 0), then the cost grows fairly slowly (algebraically), ∼*x*^{3}, but when both metabolites are produced, *x* = *y*, then the cost grows much faster, ∼*x*^{3}exp(*x*^{2}). There are many forms of cost functions (e.g. algebraic ones, *C*(*x,y*) = *c*_{x}*x*^{3}(1 + *y*^{4}) + *c*_{y}*y*^{3}(1 + *x*^{4})) that satisfy these two constraints and would lead to very similar results as presented below.
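To make the "inefficiency penalty" concrete, the sketch below divides the cost of one cell producing both metabolites at rate *x* by the combined cost of two specialist cells producing the same amounts. It assumes the exponential form *C*(*x, y*) = *c* *x*^{3}exp(*y*^{2}) + *c* *y*^{3}exp(*x*^{2}) for equation (2.3), alongside the algebraic alternative quoted above; the ratio works out to exp(*x*^{2}) for the former and 1 + *x*^{4} for the latter, independent of *c*. The function names are illustrative only.

```python
import math

C_PAR = 1 / 25  # symmetric cost parameter c_x = c_y = c (value used for figure 1)


def cost_exp(x: float, y: float, c: float = C_PAR) -> float:
    """Assumed exponential form of eq. (2.3): c x^3 exp(y^2) + c y^3 exp(x^2)."""
    return c * x**3 * math.exp(y**2) + c * y**3 * math.exp(x**2)


def cost_alg(x: float, y: float, c: float = C_PAR) -> float:
    """Algebraic alternative quoted in the text."""
    return c * x**3 * (1 + y**4) + c * y**3 * (1 + x**4)


def penalty(cost, x: float) -> float:
    """Cost of one cell producing both metabolites at rate x, divided by the
    total cost of two specialists each producing one metabolite at rate x."""
    return cost(x, x) / (cost(x, 0.0) + cost(0.0, x))


for x in (0.5, 1.0, 1.5, 2.0):
    print(f"x = {x}: exponential {penalty(cost_exp, x):8.3f}, "
          f"algebraic {penalty(cost_alg, x):6.3f}")
```

Both forms leave single-metabolite production cheap (∼*x*^{3}) while making simultaneous production increasingly expensive; only the speed of that increase differs.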

We note that this cost function captures one aspect of an empirically measured cost function [27], namely that the metabolic cost of expressing a trait increases faster than linearly with the expression level. The 'inefficiency penalty' does not have a direct empirical basis, but represents the general principle of poorly compatible cellular processes. The importance of this will be explained in more detail below.

Our choice of *B* and *C* leaves only two free parameters (*c*_{x} and *c*_{y}), and to facilitate the visualization of the fitness landscapes (see below), we assume symmetry (i.e. *c*_{x} = *c*_{y} ≡ *c*). This assumption is not crucial for the results reported.

Metabolites produced by each cell are consumed by the cell itself when it is in the unicellular form, or are assumed to be equally shared for consumption within a two-cell form. Hence, the two-cell benefit function takes the following form:
$$B_2(x_1, y_1, x_2, y_2) = \frac{(x_1 + x_2)\,(y_1 + y_2)}{4}. \tag{2.4}$$

At the same time, each cell bears the costs of everything it produces either in a single- or multi-cell state. Overall fitness (i.e. the cellular rate of reproduction) is given by the difference between the benefits and the costs of metabolism. For a single cell with metabolic rates (*x, y*), fitness is therefore
$$R_1(x, y) = B(x, y) - C(x, y). \tag{2.5}$$

On the other hand, the individual fitness of a cell *i* (*i* = 1,2) in a two-cell aggregate with metabolic rates {*x*_{1}, *x*_{2}, *y*_{1}, *y*_{2}} is
$$R_{2,i}(x_1, y_1, x_2, y_2) = \frac{(x_1 + x_2)\,(y_1 + y_2)}{4} - C(x_i, y_i). \tag{2.6}$$

Note that *R*_{1} (*x, y*) = *R*_{2,i}(*x, y, x, y*); that is, the fitness of a unicellular form is equal to the fitness of each cell in the two-cell form when both cells are producing the same amount of metabolites *x* and *y*. (When analysing the model dynamics below, we assume a cost of stickiness. Cells in two-cell aggregates tend to have higher stickiness, and thus pay an additional cost.)

Owing to the symmetry of *R*_{1}, and because of the cost of producing high levels of metabolites, the fitness of a single cell will be maximized at some intermediate level of metabolite production (*x, y*) = (*x**, *x**). Under certain conditions, this single-cell maximum becomes a saddle point for the fitness function of cells in a two-cell aggregate. These conditions can be most easily seen by assuming complete symmetry between the two cells of a two-cell aggregate, i.e. by setting *x*_{1} = *y*_{2} and *x*_{2} = *y*_{1} in equation (2.6). The fitness function of a cell in a two-cell aggregate, which *a priori* is a function of four variables, then becomes
$$F(x, y) \equiv R_{2,i}(x, y, y, x) = \frac{(x + y)^{2}}{4} - C(x, y), \tag{2.7}$$

where *i* = 1,2 (i.e. a function of two variables). If the function *F* is restricted to the diagonal *x* = *y*, then the single-cell maximum (*x**, *x**) can be recovered as a maximum along this diagonal, and the question is under what conditions this point becomes a saddle point (i.e. a minimum along the anti-diagonal).

In appendix A, it is shown that the main criterion for (*x**, *x**) to be a saddle point is
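$$\left.\frac{\partial^{2} C}{\partial x\,\partial y}\right|_{(x^{*},\,x^{*})} > \frac{1}{2}\left.\left(\frac{\partial^{2} C}{\partial x^{2}} + \frac{\partial^{2} C}{\partial y^{2}}\right)\right|_{(x^{*},\,x^{*})}. \tag{2.8}$$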

This essentially means that the production of the two metabolites should be anti-synergistic in the vicinity of (*x**, *x**), so that the gain from producing more of both metabolites is less than linear. For example, this can occur with the inefficiency penalty assumed for the cost function (2.3). If this anti-synergy is strong enough, then the single-cell optimum (*x**, *x**) becomes a saddle point for the fitness function of cells in two-cell aggregates. Indeed, by construction of (2.5) and (2.6), there exists a range of parameter *c* for which the maximum (in {*x*_{1}, *x*_{2}, *y*_{1}, *y*_{2}} space) of the reproduction rate for a cell that is part of a two-cell complex is higher than that of a single cell. This is illustrated in figure 1 for *c* = 1/25. Figure 1*a* depicts the single-cell state fitness *R*_{1}(*x, y*), which has a maximum on the diagonal *x* = *y* = *x** with *R*_{1}^{max} ≈ 0.866. Figure 1*b* shows the fitness of a cell in a two-cell aggregate. Given the *x*–*y* symmetry of the cost function assumed above, the maxima of the two-cell-state fitness lie in the subspace where the two cells are in exactly anti-symmetric metabolic states, *x*_{1} = *y*_{2} ≡ *x* and *x*_{2} = *y*_{1} ≡ *y*; hence they are maxima of the fitness function, equation (2.7) (here *F*^{max} = 625/432 ≈ 1.45). One can compare figure 1*a*,*b* by noting that the diagonal *x* = *y* in both of them corresponds to the same function *F*(*x, x*). But while for the single-cell state the two-variable fitness *R*_{1}(*x, y*) has a global maximum on the diagonal at *x* = *y* = *x** (figure 1*a*), the two-cell state fitness *F*(*x, y*) has a saddle point at *x* = *y* = *x** on the diagonal (figure 1*b*). For sufficiently large *c* the saddle condition is no longer satisfied, so the two-cell state has the same diagonal fitness maximum as the single-cell form, and division of labour would not be expected.
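As a numerical cross-check of the values quoted above, the sketch below maximizes the single-cell fitness along the diagonal and the two-cell fitness *F* along the fully specialized edge *y* = 0 with a golden-section search. It assumes *B*(*x, y*) = *xy*, a shared two-cell benefit (*x*_{1} + *x*_{2})(*y*_{1} + *y*_{2})/4, and the cost *C*(*x, y*) = *c* *x*^{3}exp(*y*^{2}) + *c* *y*^{3}exp(*x*^{2}) with *c* = 1/25; the helper names are illustrative. Along *y* = 0 the two-cell fitness reduces to *x*^{2}/4 − *cx*^{3}, which peaks at *x* = 1/(6*c*) = 25/6 with value 625/432.

```python
import math

C_PAR = 1 / 25  # c_x = c_y = c, the value used for figure 1


def cost(x, y, c=C_PAR):
    # assumed form of eq. (2.3)
    return c * x**3 * math.exp(y**2) + c * y**3 * math.exp(x**2)


def r1(x, y):
    # single-cell fitness, eq. (2.5), with B(x, y) = x * y
    return x * y - cost(x, y)


def r2(i, x1, y1, x2, y2):
    # fitness of cell i in a pair, eq. (2.6): shared benefit minus own cost
    own = (x1, y1) if i == 1 else (x2, y2)
    return (x1 + x2) * (y1 + y2) / 4 - cost(*own)


def f_two_cell(x, y):
    # eq. (2.7): anti-symmetric pair with x1 = y2 = x and x2 = y1 = y
    return r2(1, x, y, y, x)


def golden_max(fun, lo, hi, tol=1e-10):
    """Golden-section search for the maximum of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c1, c2 = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if fun(c1) < fun(c2):      # maximum lies in [c1, b]
            a, c1 = c1, c2
            c2 = a + g * (b - a)
        else:                      # maximum lies in [a, c2]
            b, c2 = c2, c1
            c1 = b - g * (b - a)
    return (a + b) / 2


x_star = golden_max(lambda x: r1(x, x), 0.0, 3.0)              # single-cell optimum on the diagonal
x_spec = golden_max(lambda x: f_two_cell(x, 0.0), 0.0, 10.0)   # fully specialized pair
print(f"x* = {x_star:.4f}, R1 max = {r1(x_star, x_star):.4f}")
print(f"x_spec = {x_spec:.4f}, F = {f_two_cell(x_spec, 0.0):.4f}, 625/432 = {625 / 432:.4f}")
```

Because ∂*F*/∂*y* = *x*/2 > 0 at *y* = 0, a search that also allows *y* > 0 would place the maximum at a very small positive *y*, with a further gain of order 10^{-3}; this does not affect the quoted value *F*^{max} ≈ 1.45.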

It should be noted that equation (2.8) is a local condition near the single-cell maximum (*x**, *x**), and that the cost function (2.3) is of course only one of many such functions that lead to a fitness function that can satisfy equation (2.8). Thus, the cost (and benefit) function used here merely serves to illustrate a general principle.
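The saddle-point condition can also be checked directly. Assuming the same *B* and *C* as above with *c*_{x} = *c*_{y} = *c*, and writing *a* = *x**, the sketch below locates *a* by solving *c*(3*a* + 2*a*^{3})exp(*a*^{2}) = 1 (the stationarity condition for *F*(*a, a*) = *a*^{2} − 2*ca*^{3}exp(*a*^{2})) and estimates the curvature of *F* along the diagonal and the anti-diagonal by central finite differences. For this cost function the anti-diagonal curvature, 2∂^{2}*C*/∂*x*∂*y* − ∂^{2}*C*/∂*x*^{2} − ∂^{2}*C*/∂*y*^{2} at (*a, a*), works out to −4*ca* exp(*a*^{2})(2*a*^{2} − 3)(*a*^{2} − 1), so (*a, a*) is a saddle exactly when 1 < *a*^{2} < 3/2.

```python
import math


def cost(x, y, c):
    # assumed form of eq. (2.3) with c_x = c_y = c
    return c * x**3 * math.exp(y**2) + c * y**3 * math.exp(x**2)


def f_two_cell(x, y, c):
    # eq. (2.7): fitness of a cell in an anti-symmetric pair
    return (x + y) ** 2 / 4 - cost(x, y, c)


def x_star(c):
    """Single-cell optimum on the diagonal: root of c (3a + 2a^3) exp(a^2) = 1."""
    lo, hi = 0.0, 3.0
    for _ in range(100):  # bisection; the left-hand side increases with a
        mid = (lo + hi) / 2
        if c * (3 * mid + 2 * mid**3) * math.exp(mid**2) < 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def curvatures(c, h=1e-4):
    """Second derivatives of F at (x*, x*) along (1, 1) and (1, -1)."""
    a = x_star(c)
    f0 = f_two_cell(a, a, c)
    diag = (f_two_cell(a + h, a + h, c) - 2 * f0 + f_two_cell(a - h, a - h, c)) / h**2
    anti = (f_two_cell(a + h, a - h, c) - 2 * f0 + f_two_cell(a - h, a + h, c)) / h**2
    return a, diag, anti


def anti_exact(c):
    # closed form of 2 C_xy - C_xx - C_yy at (x*, x*) for this cost function
    a = x_star(c)
    return -4 * c * a * math.exp(a**2) * (2 * a**2 - 3) * (a**2 - 1)


for c in (1 / 25, 0.1):
    a, diag, anti = curvatures(c)
    if diag < 0 < anti:
        kind = "saddle (division of labour favoured)"
    elif diag < 0 and anti < 0:
        kind = "maximum (no division of labour)"
    else:
        kind = "other"
    print(f"c = {c:.3f}: x* = {a:.4f}, diagonal {diag:+.4f}, "
          f"anti-diagonal {anti:+.4f} (exact {anti_exact(c):+.4f}) -> {kind}")
```

With *c* = 1/25 the sketch should report a saddle (*a*^{2} ≈ 1.34), while *c* = 0.1 gives *a* < 1 and a genuine maximum. Mapping the window 1 < *a*^{2} < 3/2 back through the stationarity condition gives roughly 0.030 < *c* < 0.074 for this particular cost function.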

To complete the basic model description, we assume that cells, whether in one- or two-cell form, reproduce individually by periodically releasing unicellular offspring at a rate that is proportional to the fitness of a cell. The only cell property that is inherited during reproduction is the stickiness *σ*. A small random variation in *σ* occurring during reproduction corresponds to mutations in this genetically determined trait. Finally, we assume a logistic form of the per cell death rate *D*_{p}, which is independent of whether a cell is single or part of an aggregate:

$$D_p = \delta N, \tag{2.9}$$

where *δ* is a parameter and *N* is the total population.

## 3. Model dynamics

There are three biologically distinct timescales in our model: the fast regulatory metabolic adjustment, the intermediate rate of cellular aggregation and dissociation, and slow reproduction and concomitant evolution of the heritable trait *σ*. This natural timescale separation allows us to simplify the mathematical analysis, assuming steady states of the faster processes in the dynamics of the slower events.

### (a) Metabolic regulation

So far we have specified benefits and costs of producing two metabolites at rates *x* and *y*, but we have not specified what controls the dynamics of metabolic regulation of *x* and *y*. The basic assumption is that each cell adjusts the rates of production of the metabolites to maximize fitness for given conditions (unicellular or aggregate form) via a fast regulatory mechanism. The important part of this assumption is that each cell acts individually to maximize its fitness, without 'coordinating' its metabolic regulation with its partner, yet the conditions in which a cell operates do depend on whether or not the cell is in aggregate form and, if it is, on the metabolic state of the partner. We first assume that in the single-cell state, each cell has a naturally occurring mechanism for regulating its metabolite production to the optimum of the fitness landscape given by *B*(*x, y*) − *C*(*x, y*), with *B* and *C* the benefit and cost functions of equations (2.2) and (2.3). The fitness landscape in metabolic space, defined by equation (2.5) and illustrated in figure 1*a*, has a fairly simple form, so it seems reasonable to assume regulatory convergence to the unicellular fitness maximum.

For example, one can assume that the cells adjust the metabolic states following the gradient in the fitness landscape with some random noise *η*:
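$$\frac{\mathrm{d}z_i}{\mathrm{d}t} = \alpha\,\frac{\partial R}{\partial z_i} + \eta_{z_i}(t), \qquad \left\langle \eta_{z_i}(t)\,\eta_{z_j}(t')\right\rangle = 2\Gamma\,\delta_{ij}\,\delta(t - t'). \tag{3.1}$$

Here, *z*_{i} is one of the two metabolic coordinates {*x, y*}, *R* is the fitness of the cell (*R*_{1} in the unicellular state), and *η*_{zi}(*t*) is a random noise term with zero mean and intensity *Γ* that is assumed uncorrelated in time and between different coordinates (*δ*_{ij} and *δ*(·) denote the Kronecker and Dirac deltas). A process given by equation (3.1) converges to a steady-state distribution of the cell population in 'metabolic space' *n*(*z*) [28],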
$$n(z) \propto \exp\!\left[\frac{\alpha\,R_1(z)}{\Gamma}\right], \tag{3.2}$$

where *z* = (*x,y*) are again the metabolic coordinates in the unicellular state. We assume that the dynamics is fast (large *α*) and the noise is weak (small *Γ*), so that the population quickly becomes concentrated in the vicinity of the metabolic state conferring maximum fitness.
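A minimal sketch of the single-cell version of equation (3.1), integrated with the Euler–Maruyama scheme. It assumes *R*_{1} = *xy* − *c* *x*^{3}exp(*y*^{2}) − *c* *y*^{3}exp(*x*^{2}) with *c* = 1/25, and a noise correlation of 2*Γδ*(*t* − *t*′), the convention under which the stationary density is proportional to exp(*αR*_{1}/*Γ*); *α*, *Γ*, the time step and the starting state are arbitrary illustrative choices. Starting from an asymmetric state, the time-averaged metabolic state should settle near the diagonal optimum, roughly (1.157, 1.157), with fitness close to 0.866.

```python
import math
import random

C_PAR = 1 / 25  # c_x = c_y = c


def r1(x, y, c=C_PAR):
    # single-cell fitness with B = x*y and the assumed cost of eq. (2.3)
    return x * y - c * x**3 * math.exp(y**2) - c * y**3 * math.exp(x**2)


def grad_r1(x, y, c=C_PAR):
    # analytic partial derivatives of r1
    gx = y - c * (3 * x**2 * math.exp(y**2) + 2 * x * y**3 * math.exp(x**2))
    gy = x - c * (3 * y**2 * math.exp(x**2) + 2 * y * x**3 * math.exp(y**2))
    return gx, gy


def simulate(alpha=1.0, gamma=1e-4, dt=0.01, steps=6000, tail=1000, seed=1):
    """Euler-Maruyama integration of dz/dt = alpha * dR1/dz + eta with
    <eta(t) eta(t')> = 2 * gamma * delta(t - t'); returns the metabolic
    state averaged over the last `tail` steps."""
    rng = random.Random(seed)
    x, y = 0.3, 1.0                     # arbitrary asymmetric starting state
    noise = math.sqrt(2 * gamma * dt)   # standard deviation of one noise increment
    sx = sy = 0.0
    for k in range(steps):
        gx, gy = grad_r1(x, y)          # gradient at the current state
        x += alpha * gx * dt + noise * rng.gauss(0.0, 1.0)
        y += alpha * gy * dt + noise * rng.gauss(0.0, 1.0)
        if k >= steps - tail:
            sx += x
            sy += y
    return sx / tail, sy / tail


mx, my = simulate()
print(f"mean state ({mx:.3f}, {my:.3f}), fitness {r1(mx, my):.4f}")
```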

We assume that the same regulatory mechanisms that lead to metabolic fitness maximization in the unicellular state regulate the metabolic rates of two cells in aggregate form to a metabolic fitness maximum for the two-cell aggregate (essentially, the noise term in the metabolic dynamics of equation (3.1) leads to symmetry breaking, enabling two cells that just aggregated to diverge in their metabolic phenotypes). In fact, the dynamics given by equation (3.1), applied to *z*_{i} being one of the four coordinates {*x*_{1}, *y*_{1}, *x*_{2}, *y*_{2}}, also lead to a concentration of the metabolic rates (*x*_{1}, *y*_{1}, *x*_{2}, *y*_{2}) of two aggregated cells in the vicinity of a fitness maximum. The corresponding fitness landscape is illustrated in figure 1*b*. Figure 1 illustrates that the fitness maximum of a cell in a two-cell aggregate can be higher than the metabolic fitness maximum attained by a single cell. Essentially, this is because the fitness maximum of a single cell, which lies on the diagonal *x* = *y*, becomes a saddle point in the higher-dimensional metabolic fitness landscape of the aggregate form.

In the following, we therefore assume that each cell is at a metabolic state that maximizes its reproductive rate. This allows us to drop the metabolic coordinates from the notation for the cell concentration. For example, for *c*_{x} = *c*_{y} = 1/25, the (steady-state) reproduction rate of a single cell becomes *R*_{1} ≈ 0.866 (maximum on the diagonal of figure 1*a*) and the reproduction rate of a cell in a two-cell state becomes *R*_{2} = 625/432 ≈ 1.45 (corresponding to the maxima in figure 1*b*).

### (b) Transition from unicellular to multicellular states

The kinetics of the association of two cells with stickiness *σ*_{1} and *σ*_{2} into a two-cell complex with concentration *n*_{c}(*σ*_{1}, *σ*_{2}) is described by the rate equation
$$\frac{\partial n_c(\sigma_1, \sigma_2)}{\partial t} = k_{+}\,\sigma_1\sigma_2\,n_1(\sigma_1)\,n_1(\sigma_2) - k_{-}\,n_c(\sigma_1, \sigma_2) - 2\delta N\,n_c(\sigma_1, \sigma_2). \tag{3.3}$$

Here, the gain or association term has the form introduced in equation (2.1), the first loss term describes the dissociation of a two-cell complex into individual cells and the second loss term describes the loss of a two-cell complex owing to the death of one of its constituents, with the death rate defined in equation (2.9). The factor 2 reflects the fact that the death of either of two cells is sufficient for the elimination of a two-cell aggregate. Assuming sufficiently fast association and dissociation compared with the timescale of the evolution of stickiness, we calculate the steady-state concentrations of two-cell complexes for a given density of single cells:
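$$n_c^{*}(\sigma_1, \sigma_2) = \frac{k_{+}\,\sigma_1\sigma_2\,n_1(\sigma_1)\,n_1(\sigma_2)}{k_{-} + 2\delta N}. \tag{3.4}$$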
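With *n*_{1} and *N* held fixed, equation (3.3) is linear in *n*_{c}, so it relaxes exponentially, at rate *k*_{−} + 2*δN*, to the steady state (3.4). The sketch below integrates it with explicit Euler steps for one hypothetical pair of stickiness values, assuming the logistic death rate *δN*; all numbers are made up for illustration.

```python
# Hypothetical parameter values, chosen only for illustration.
k_plus, k_minus, delta = 0.5, 1.0, 0.01
sigma1, sigma2 = 0.3, 0.7                  # stickiness of the two partners
n1_a, n1_b, n_total = 20.0, 15.0, 100.0    # single-cell densities and total N, held fixed

gain = k_plus * sigma1 * sigma2 * n1_a * n1_b   # association term, eq. (2.1)
loss_rate = k_minus + 2 * delta * n_total       # dissociation plus death of either partner

n_c, dt = 0.0, 1e-3
for _ in range(20_000):                          # explicit Euler up to t = 20
    n_c += dt * (gain - loss_rate * n_c)

n_c_star = gain / loss_rate                      # steady state, eq. (3.4)
print(f"integrated {n_c:.6f}, steady state {n_c_star:.6f}")
```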

To determine the steady-state concentration *n*_{2}(*σ*) of individual cells in aggregates, the concentration of complexes *n*_{c}^{*}(*σ*_{1}, *σ*_{2}) has to be integrated over one of its coordinates:
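$$n_2(\sigma) = 2\int n_c^{*}(\sigma, \sigma')\,\mathrm{d}\sigma' = \chi\,\sigma\,n_1(\sigma), \qquad \chi = \frac{2k_{+}\int \sigma'\,n_1(\sigma')\,\mathrm{d}\sigma'}{k_{-} + 2\delta N}. \tag{3.5}$$

The factor 2 reflects the fact that either of two cells in an aggregate can have the stickiness *σ*. Note that *χ* needs to be determined self-consistently because the quantity *N* in the denominator depends on the total number of cells, *N* = *N*_{1} + *N*_{2}, *N*_{j} = ∫*n*_{j}(*σ*) d*σ*.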
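Assuming *χ* = 2*k*_{+}*S*/(*k*_{−} + 2*δN*) with *S* = ∫*σ*′*n*_{1}(*σ*′)d*σ*′, and noting that *N* = *N*_{1} + *χS*, the self-consistency condition reduces to the quadratic 2*δS* *χ*^{2} + (*k*_{−} + 2*δN*_{1})*χ* − 2*k*_{+}*S* = 0, which has a single positive root. The sketch below finds that root both by bisection and in closed form for a hypothetical Gaussian single-cell distribution and made-up rate constants, then checks that it is self-consistent.

```python
import math

# Hypothetical parameters and single-cell distribution n1(sigma), for illustration only.
K_PLUS, K_MINUS, DELTA = 0.1, 1.0, 0.01
H = 0.01
SIGMA = [i * H for i in range(101)]                                  # grid on [0, 1]
n1 = [50 * math.exp(-0.5 * ((s - 0.4) / 0.1) ** 2) for s in SIGMA]

S = H * sum(s * n for s, n in zip(SIGMA, n1))   # integral of sigma' n1(sigma')
N1 = H * sum(n1)                                 # total density of single cells


def residual(chi):
    # chi * (k_- + 2 delta N) - 2 k_+ S with N = N1 + chi * S;
    # strictly increasing for chi >= 0, so there is one positive root
    return chi * (K_MINUS + 2 * DELTA * (N1 + chi * S)) - 2 * K_PLUS * S


lo, hi = 0.0, 1.0
while residual(hi) < 0:          # bracket the root
    hi *= 2
for _ in range(100):             # bisection
    mid = (lo + hi) / 2
    if residual(mid) < 0:
        lo = mid
    else:
        hi = mid
chi_bisect = (lo + hi) / 2

# Positive root of the quadratic, written in a numerically stable form.
b = K_MINUS + 2 * DELTA * N1
q = 2 * K_PLUS * S
chi_closed = 2 * q / (b + math.sqrt(b * b + 8 * DELTA * S * q))

n2 = [chi_closed * s * n for s, n in zip(SIGMA, n1)]   # n2(sigma) = chi sigma n1(sigma)
N = N1 + H * sum(n2)
print(f"chi = {chi_closed:.6f} (bisection {chi_bisect:.6f}), N1 = {N1:.3f}, N = {N:.3f}")
```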

A rate equation that is analogous to equation (3.3) holds for the evolution of the single-cell population:
$$
\begin{aligned}
\frac{\partial n_1(\sigma)}{\partial t} ={}& -2k_{+}\,\sigma\,n_1(\sigma)\int \sigma'\,n_1(\sigma')\,\mathrm{d}\sigma' \\
&+ k_{-}\,n_2(\sigma) + \delta N\,n_2(\sigma) - \delta N\,n_1(\sigma) \\
&+ M(\sigma)\left[R_1\,n_1(\sigma) + R_2\,n_2(\sigma)\right].
\end{aligned}
\tag{3.6}
$$

Here, the first loss term denotes the association of a single cell *σ* into an aggregate with any other single cell (with the factor 2 describing that two single cells are lost), the first gain term describes the dissociation of an aggregate, the second gain term describes the appearance of a single cell when its partner in an aggregate dies and the second loss term corresponds to the loss of a single cell owing to its death. The gain term in the third line accounts for the appearance of a new single cell owing to cell division, which occurs within free and aggregated cells at the rates *R*_{1} and *R*_{2} defined by equations (2.5) and (2.6), respectively. The term *M*(*σ*), a decreasing function of *σ*, accounts for the fitness costs of maintaining stickiness. Because cells in aggregates tend to be stickier, this term effectively penalizes two-cell aggregates, so aggregation can only evolve if it provides a non-negligible advantage to the aggregating cells.

### (c) Evolution of stickiness

Substituting equation (3.5) into equation (3.6), we arrive at the equation for the evolution of the population density of cells with stickiness *σ*,
$$\frac{\partial n_1(\sigma)}{\partial t} = \Big\{M(\sigma)\left[R_1 + \chi\sigma R_2\right] - \delta N\,(1 + \chi\sigma)\Big\}\,n_1(\sigma) + D\,\frac{\partial^{2}}{\partial\sigma^{2}}\Big\{M(\sigma)\left[R_1 + \chi\sigma R_2\right] n_1(\sigma)\Big\}. \tag{3.7}$$

The diffusion term, proportional to a small constant *D* and added to the birth term, describes mutational variation in stickiness at birth. Equation (3.7) can be solved numerically. For suitable parameter combinations, the population evolves towards higher stickiness, resulting in cells spending most of their life in two-cell aggregates. This process is illustrated in figure 2, and is due to the higher birth rates and resulting evolutionary advantage of cells that are more sticky, and thus spend more time in a state that is metabolically superior owing to division of labour.
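A rough sketch of how equation (3.7) might be integrated numerically. It assumes the form obtained by substituting *n*_{2} = *χσn*_{1} into the single-cell rate equation, ∂*n*_{1}/∂*t* = {*M*(*σ*)[*R*_{1} + *χσR*_{2}] − *δN*(1 + *χσ*)}*n*_{1} + *D*∂^{2}/∂*σ*^{2}{*M*(*σ*)[*R*_{1} + *χσR*_{2}]*n*_{1}}, with *χ* recomputed self-consistently at every step. Only *R*_{1} ≈ 0.866 and *R*_{2} = 625/432 come from the text; the linear cost *M*(*σ*) = 1 − 0.05*σ*, the rate constants, the grid, the zero-flux boundaries and the initial distribution are all hypothetical.

```python
import math

# Hypothetical parameters: only R1 and R2 come from the text (c = 1/25).
R1, R2 = 0.866, 625 / 432
K_PLUS, K_MINUS, DELTA = 0.1, 1.0, 0.01
D_MUT = 1e-4          # mutational diffusion constant D
M_SLOPE = 0.05        # assumed stickiness cost: M(sigma) = 1 - M_SLOPE * sigma
H = 0.02
SIGMA = [i * H for i in range(51)]   # stickiness grid on [0, 1]


def chi_and_n(n1):
    """Self-consistent chi (positive root of the quadratic) and total N = N1 + chi * S."""
    s = H * sum(sig * n for sig, n in zip(SIGMA, n1))
    n_single = H * sum(n1)
    b = K_MINUS + 2 * DELTA * n_single
    q = 2 * K_PLUS * s
    chi = 2 * q / (b + math.sqrt(b * b + 8 * DELTA * s * q))
    return chi, n_single + chi * s


def step(n1, dt):
    """One explicit Euler step of the assumed eq. (3.7), zero-flux boundaries."""
    chi, n_tot = chi_and_n(n1)
    birth = [(1 - M_SLOPE * s) * (R1 + chi * s * R2) * n for s, n in zip(SIGMA, n1)]
    last = len(n1) - 1
    out = []
    for i, (s, n) in enumerate(zip(SIGMA, n1)):
        left, right = birth[max(i - 1, 0)], birth[min(i + 1, last)]
        mutation = D_MUT * (left - 2 * birth[i] + right) / H**2
        death = DELTA * n_tot * (1 + chi * s) * n
        out.append(n + dt * (birth[i] - death + mutation))
    return out


def mean_stickiness(n1):
    # average over all cells: singles n1 plus aggregated cells n2 = chi sigma n1
    chi, _ = chi_and_n(n1)
    w = [n * (1 + chi * s) for s, n in zip(SIGMA, n1)]
    return sum(s * wi for s, wi in zip(SIGMA, w)) / sum(w)


n1 = [10 * math.exp(-0.5 * ((s - 0.2) / 0.05) ** 2) for s in SIGMA]
start = mean_stickiness(n1)
for _ in range(4000):            # integrate to t = 200 with dt = 0.05
    n1 = step(n1, 0.05)
end = mean_stickiness(n1)
print(f"mean stickiness {start:.3f} -> {end:.3f}")
```

Under these illustrative settings the mean stickiness should drift upwards, because the extra birth rate from time spent in the aggregate outweighs the small assumed cost of stickiness; other parameter choices may change the outcome.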

## 4. Discussion

The evolution of cell differentiation in multicellular aggregates is an important transition in the history of life on Earth. Most existing models of this transition assume some pre-existing differentiation in the single cells and/or the pre-existence of some form of compartmentalization (i.e. multicellularity) [7–9]. The main result of the present study is a proof of principle that multicellularity and cell specialization can emerge in genetically and physiologically homogeneous populations via spontaneous breaking of cellular universality (or symmetry) by regulatory, non-hereditary metabolic mechanisms.

Such symmetry-breaking occurs if the division of labour between cells brings certain fitness advantages, and if regulatory mechanisms that allow cells to optimize their physiology exist in the ancestral unicellular form. It is important to note that with such mechanisms, the cells adjust their regulatory state individually, in a 'selfish' way; thus, no assumption about special cooperative interactions between the cells in an aggregate is necessary. Essentially, a suitable fitness landscape, exhibiting higher fitness for cells with differentiated functions in the aggregate form, determines the path of regulatory optimization towards cellular specialization. The prerequisite for this is that the four-dimensional fitness landscape of a two-cell aggregate has higher peaks than the two-dimensional fitness landscape of a single cell, reflecting the advantages of division of labour. In other words, the maximum of the two-dimensional landscape turns into a saddle point in the four-dimensional landscape. This qualitatively defines the general properties of the fitness function that promote (or inhibit) the transition to multicellularity. These properties can be formalized by considering second derivatives of fitness functions (see appendix A), which reveals that the main criterion for the maximum of the two-dimensional landscape to turn into a saddle point in the four-dimensional landscape is an anti-synergistic interaction between the two metabolites, such that increasing the production of both metabolites in a single cell incurs a sufficiently high inefficiency penalty. The cost functions used in this paper provide one example of such an inefficiency penalty.

As a consequence of metabolic inefficiency, an effective increase in the dimensionality of the physiological pathways owing to aggregation enables the cells to attain higher fitness in the aggregate form by dividing the labour of producing the two metabolites. If stickiness (i.e. the tendency to form aggregates) is a trait under selection, and if the costs of stickiness are not too large, then the increase in dimensionality of the fitness landscape and the concomitant increase in physiological fitness lead to the evolution of more sticky cells, resulting in the emergence of multicellularity and cellular specialization within the aggregates. We note that this mechanism for the evolution of aggregation of single cells into multicellular clusters is different from the classic hypothesis that such aggregation is driven by some form of selection for size [10,11] (e.g. owing to predation [12] or the need for cooperation [13]).

In virtually all models for cell differentiation, the basic underlying mechanism is a trade-off between different physiological functions. It is then usually assumed that, for unspecified reasons, there exist different cell types that either already occupy different locations on the trade-off curve [8,9], or have the genetic potential to do so [7]. Subsequently, the optimal composition of the different cell types in cell aggregates is studied. In contrast, in our model, all cells are in principle physiologically identical, and differentiation only manifests itself through a purely physiological regulatory mechanism once cells occur in aggregates. In essence, the symmetry-breaking regulatory mechanism generates a permanent spatial differentiation in cell aggregates. Such a physiological crystallization of potential temporal differentiation of single cells has been envisioned as one of the main routes to multicellular differentiation [29], and our model can serve as a basic metaphor for this process.

In most accounts, the basic physiological trade-off underlying the transition to differentiated multicellularity is between reproduction and viability, and hence between soma and germ cells [6,7,18,30]. The model presented here can also be viewed in that context. In that case, the physiological variables *x* and *y* become traits describing reproductive productivity (number of offspring) and viability (probability of surviving to reproduction). Viability depends on many factors, such as the ability of cells to move, which in some types of organisms is incompatible with mitosis [17]. In a germ–soma specialization scenario, the regulatory mechanisms relevant for symmetry-breaking may be based on a response to signals to stop growth that are emitted by fully developed, larger germ cells [18]. Such signals could arrest the development of pro-soma cells, rendering them sterile. Initial size differences, and hence the predisposition for subsequent germ–soma differentiation, would then stem from spontaneous asymmetric cell division rather than from genetic differences.

To critically evaluate the plausibility of our model for the evolution of multicellularity, it will be essential to test its main assumptions and predictions experimentally. The most critical assumptions of the model are that (i) some important cellular processes cannot be performed well in the same cell, (ii) cells can readily evolve increased levels of attachment, and (iii) attached cells can complement each other metabolically, and thus specialize in one of two poorly compatible processes. The first assumption, about trade-offs between cellular processes, is fundamental to most models of metabolic specialization [31]. A number of recent studies have established concrete molecular mechanisms that can lead to trade-offs [32–36], and it will be interesting to test whether such trade-offs are pervasive between different types of metabolic processes and across many different organisms. If they are, this would increase the plausibility of the evolutionary transition towards multicellularity proposed here.

Testing the predictions of our model is challenging, but seems possible in principle. The main prediction is that conditions in which important cellular processes are incompatible with each other will promote the evolution of increased levels of attachment between complementary cell types. It is worth considering whether this prediction can be tested with evolution experiments in the laboratory. As discussed above, oxygenic photosynthesis and nitrogen fixation are prime examples of incompatible processes [37], and unicellular cyanobacteria separate these processes temporally, performing photosynthesis during the day and nitrogen fixation at night. One possible direct test of our model would be to evolve unicellular cyanobacteria in the laboratory under conditions in which both processes are expected to be active (i.e. in continuous light in a medium without fixed nitrogen) and ask whether the bacteria evolve adhesion and the exchange of fixed compounds between cells that perform different processes. Notably, the evolution of stickiness (i.e. of unicellular organisms forming multicellular clusters) has recently been observed in yeast [38], although the role of incompatible metabolic processes in this phenomenon remains to be determined.

In conclusion, in this paper we present a model showing that multicellularity and cellular differentiation can develop when cells can form an aggregate that enables them to exchange chemical signals and metabolites. This aggregate essentially has a higher physiological dimension, so that when there are cellular processes that are incompatible in a single cell, segregation of these processes into separate cells is possible in the aggregate form. Regulatory mechanisms that can control such a division of labour within an aggregate can be expected in many ancestral unicellular forms and are based on signals coming either from the cell itself, or from partner cells in the aggregate environment. The resulting division of labour can generate fitness benefits that lead to selection on the propensity of cells to aggregate, and hence to form multicellular and differentiated organisms.

## Acknowledgements

M.D. acknowledges the support of NSERC (Canada) and of the Human Frontier Science Programme. I.I. acknowledges the support of FONDECYT (Chile). M.A. acknowledges the support of the Swiss National Science Foundation.

## Appendix A

As explained in the text, the symmetry between *x* and *y*, which follows from the form of the fitness functions (equation (2.6)), allows for a reduction of the dimension of metabolic space from four to two. Accordingly, we consider the symmetric fitness function $F(x, y) = R_{2,i}(x, y, y, x)$, $i = 1, 2$, where the $R_{2,i}$ are the fitness functions (equation (2.6)) of a single cell in a two-cell aggregate. Then, the restriction of $F$ to the diagonal $x = y$ is

$$f_{\mathrm{d}}(t) = F(t, t) \tag{A 1}$$

and the restriction of $F$ to the anti-diagonal through the point $(x^*, x^*)$ is

$$f_{\mathrm{a}}(s) = F(x^* + s,\ x^* - s). \tag{A 2}$$

Along the diagonal, $(x^*, x^*)$ is a fitness maximum by assumption, hence

$$f_{\mathrm{d}}''(x^*) = \left(\frac{\partial^2 F}{\partial x^2} + 2\,\frac{\partial^2 F}{\partial x\,\partial y} + \frac{\partial^2 F}{\partial y^2}\right)(x^*, x^*) < 0. \tag{A 3}$$

For $(x^*, x^*)$ to be a saddle point of $F$, it must be a fitness minimum along the anti-diagonal, hence we must have

$$f_{\mathrm{a}}''(0) = \left(\frac{\partial^2 F}{\partial x^2} - 2\,\frac{\partial^2 F}{\partial x\,\partial y} + \frac{\partial^2 F}{\partial y^2}\right)(x^*, x^*) > 0. \tag{A 4}$$

It is clear that this last inequality is satisfied if $(\partial^2 F/\partial x\,\partial y)(x^*, x^*)$ is sufficiently negative. We note that (A 4) also tends to be satisfied if the pure second derivatives of $F$ are positive at $(x^*, x^*)$, but this also tends to violate condition (A 3) for $(x^*, x^*)$ to be a maximum along the diagonal. If symmetry between *x* and *y* is not assumed, similar considerations lead to analogous criteria, in terms of second derivatives of the fitness functions, for a maximum in two-dimensional space to become a saddle point in four-dimensional space.
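The criterion (A 3)–(A 4) can be checked numerically for any smooth symmetric fitness function. The sketch below is our own illustration, not code from this study: it estimates the second derivatives at $(x^*, x^*)$ by central finite differences. The test function `toy_F` is a hypothetical reduced aggregate fitness (mirror-symmetric cells that pool metabolites, receive benefit $(x + y)/2$ and pay a private cost $\tfrac{1}{2}(x^2 + y^2) + c\,xy$), not equation (2.6); for it, $x^* = 1/(2(1 + c))$ is the optimum along the diagonal.

```python
from typing import Callable, Tuple

Fitness = Callable[[float, float], float]


def directional_curvatures(F: Fitness, x_star: float, h: float = 1e-4) -> Tuple[float, float]:
    """Second derivatives of F at (x_star, x_star) along the diagonal (1, 1) and the
    anti-diagonal (1, -1), estimated with central finite differences."""
    p = x_star
    f0 = F(p, p)
    f_xx = (F(p + h, p) - 2 * f0 + F(p - h, p)) / h**2
    f_yy = (F(p, p + h) - 2 * f0 + F(p, p - h)) / h**2
    f_xy = (F(p + h, p + h) - F(p + h, p - h) - F(p - h, p + h) + F(p - h, p - h)) / (4 * h**2)
    diagonal = f_xx + 2 * f_xy + f_yy       # (A 3) requires this to be negative
    anti_diagonal = f_xx - 2 * f_xy + f_yy  # (A 4) requires this to be positive
    return diagonal, anti_diagonal


def is_saddle(F: Fitness, x_star: float) -> bool:
    """True if (x_star, x_star) is a maximum along the diagonal and a minimum along the anti-diagonal."""
    diagonal, anti_diagonal = directional_curvatures(F, x_star)
    return diagonal < 0 < anti_diagonal


def toy_F(c: float) -> Fitness:
    """Hypothetical reduced aggregate fitness with anti-synergistic cost c*x*y (illustration only)."""
    return lambda x, y: 0.5 * (x + y) - 0.5 * (x * x + y * y) - c * x * y


for c in (0.5, 2.0):
    x_star = 1 / (2 * (1 + c))  # optimum of toy_F along the diagonal
    print(c, directional_curvatures(toy_F(c), x_star), is_saddle(toy_F(c), x_star))
```

For this toy function the mixed derivative equals $-c$, so the anti-diagonal curvature is $2(c - 1)$: the weak penalty $c = 0.5$ leaves $(x^*, x^*)$ a maximum, whereas $c = 2$ turns it into a saddle point, in line with the requirement that $\partial^2 F/\partial x\,\partial y$ be sufficiently negative.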

- Received September 23, 2011.
- Accepted November 18, 2011.

- This journal is © 2011 The Royal Society