## Abstract

Via strength in numbers, groups of cells can influence their environments in ways that individual cells cannot. Large-scale structural patterns and collective functions underpinning virulence, tumour growth and bacterial biofilm formation are emergent properties of coupled physical and biological processes within cell groups. Owing to the abundance of factors influencing cell group behaviour, deriving general principles about them is a daunting challenge. We argue that combining mechanistic theory with theoretical ecology and evolution provides a key strategy for clarifying how cell groups form, how they change in composition over time, and how they interact with their environments. Here, we review concepts that are critical for dissecting the complexity of cell collectives, including dimensionless parameter groups, individual-based modelling and evolutionary theory. We then use this hybrid modelling approach to provide an example analysis of the evolution of cooperative enzyme secretion in bacterial biofilms.

## 1. Introduction

Social interaction and collective behaviour are highly influential forces in biology. Living in groups allows individuals to evade predation, to forage more effectively and to exert a more powerful influence on their environments than individuals can when they act alone. Owing to their ubiquity and visibility, assemblies of metazoan organisms, such as insect swarms, fish schools, bird flocks and animal herds, have for many years drawn the attention of biologists, physicists and mathematicians [1–4]. Despite prescient early work [5], only recently have researchers broadly come to realize that most unicellular organisms are also social (figure 1). Bacteria, unicellular eukaryotes, and cancerous cells have group-level properties that are integral to how they live and, in the case of pathogens and cancer, how they cause disease [6–9].

A fundamental challenge for scientists studying cell collectives is to understand the emergence of group-level properties, such as spatial structure and behavioural coordination, from the interactions of individual cells with each other and with their surroundings [10]. Many biological processes—nutrient uptake, growth, motility and the secretion of extracellular compounds—interact with many physical processes—nutrient advection and diffusion, shear stress, physical shoving among cells and detachment—to yield cell group structure and collective behaviour. Additionally, cells alter their gene expression in response to each other and to the local microenvironment [11–14], and cell groups can evolve rapidly [7,15–19].

The numerous processes that contribute to cell group architecture and behaviour make it difficult to extract general principles about the origin of their emergent properties. Analytical methods for studying collective behaviour often focus either on heterogeneities in structure [4,20–22] or on heterogeneities in cell group composition [23–25], but rarely on both. Similarly, powerful theory has been developed for understanding the evolution of social interaction [26–30], but this theory is often difficult to apply directly to cell groups in realistic contexts (but see [31–33]). Computational individual-based modelling offers an alternative approach, implementing cells in two- or three-dimensional space that behave independently in response to their local microenvironments [34,35]. Such models allow subtle details of biology and physics to be considered and are excellent for studying cell group heterogeneity, but they are typically complex and sacrifice generality for realism.

In the present review, we argue that a combination of scaling analysis and individual-based modelling methods can be used to relate physical and biological mechanisms—and their associated, readily measurable parameters—to the more abstract principles of evolutionary theory. We briefly review these modelling approaches and highlight studies that have used them to analyse nutrient transport and consumption, selective sweeps and quorum sensing communication within cell groups. We then provide an example of this modelling approach that addresses cooperative enzyme secretion in bacterial groups. The analysis is described in detail to serve as a guide for others who may wish to use similar approaches. Our results provide a mapping between social evolution theory and the basic parameters of growth, enzyme secretion, solute diffusion and population structure.

## 2. Dimensionless numbers in physics and biology

Measurement units allow us to compare a quantity, such as length, with a standardized reference unit, such as the metre. In some circumstances, it is more useful to express lengths in terms of other quantities that constitute natural length scales, such as cell diameters. The dimensionless quantity Diameter_{cell-group}/Diameter_{cell} is thus an intuitive measure of cell group size. Such non-dimensional quantities have been widely employed in the physical sciences, as they reduce the complexity of a model under study and often allow the identification of critical ratios of a system's governing parameters [36,37].

In fluid mechanics and transport theory, many dimensionless numbers have been introduced [38]; some of these are relevant for biological processes at the cellular level and have been discussed in two recent and comprehensive reviews [39,40]. Two of the most widely used dimensionless numbers, which are critical for understanding cell group function and behaviour, are the Reynolds number and the Péclet number. The Reynolds number is defined as the ratio of inertial to viscous forces in fluid flow, and can be calculated as Re = *ρ*_{F}*UL*/*η*, where *U* is the characteristic fluid velocity scale of the system, *L* the characteristic length scale of the flow, and *ρ*_{F} and *η* the fluid density and dynamic viscosity, respectively. Very high values of Re thus correspond to turbulent flow, while small values of Re correspond to laminar flow. The Péclet number is the ratio of flow-mediated molecular transport and diffusion-mediated molecular transport; it can be calculated as Pe = *UL*/*D*, where *L* is a characteristic length scale of the system and *D* the molecular diffusion constant (e.g. of a particular nutrient). The Reynolds number is therefore a measure of the flow structure (turbulent versus laminar), while the Péclet number determines the dominant molecular transport mechanism (flow-mediated versus diffusive) for a given system.

The Reynolds and Péclet numbers often shift from values much less than unity for single cells, to values significantly above unity for cell groups. Therefore, viscous forces, laminar flow and diffusion often dominate the lives of solitary cells [41]. On the other hand, inertial forces, unsteady flows and flow-mediated solute transport become important for cell groups that have characteristic length scales much larger than those of single cells in isolation. In the following sections, we discuss other, less standard, dimensionless quantities and length scales that have been defined in various disciplines, including bioprocess engineering and evolutionary biology, to yield insights into the structure, function and evolution of cell groups.
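The shift from small to large Reynolds and Péclet numbers can be illustrated with a short calculation. The sketch below uses the definitions given above; the fluid properties, velocities, length scales and diffusivity are illustrative assumptions chosen for a water-like medium, not values taken from the studies cited here.

```python
def reynolds_number(rho_f, u, length, eta):
    """Re = rho_F * U * L / eta (ratio of inertial to viscous forces)."""
    return rho_f * u * length / eta


def peclet_number(u, length, diffusivity):
    """Pe = U * L / D (flow-mediated versus diffusive transport)."""
    return u * length / diffusivity


# Assumed, illustrative values in SI units (not from the source).
RHO_F = 1.0e3         # fluid density, kg m^-3
ETA = 1.0e-3          # dynamic viscosity, Pa s
D_NUTRIENT = 5.0e-10  # nutrient diffusivity, m^2 s^-1

scenarios = {
    "single cell": {"u": 1.0e-5, "length": 1.0e-6},  # ~1 um, slow local flow
    "cell group": {"u": 1.0e-2, "length": 1.0e-3},   # ~1 mm, faster bulk flow
}

for name, s in scenarios.items():
    re_value = reynolds_number(RHO_F, s["u"], s["length"], ETA)
    pe_value = peclet_number(s["u"], s["length"], D_NUTRIENT)
    print(f"{name}: Re = {re_value:.1e}, Pe = {pe_value:.1e}")
```

With these assumed inputs, both numbers fall below unity for the single cell and above unity for the millimetre-scale group, which is the qualitative transition described in the text.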

## 3. The balance of growth and nutrient transport

The consumption of soluble nutrients by biofilm-dwelling bacteria depletes the nutrient concentration close to the biofilm surface. Fluid flow also slows near the biofilm surface because of hydrodynamic constraints. Together these two effects create a boundary layer that separates the nutrient concentration in the bulk advective fluid above the biofilm from the nutrient concentration inside the biofilm (figure 2*a*) [40]. By Fick's Law, the nutrient flux into the biofilm is determined by the concentration gradient across the boundary layer. Once nutrients have entered the biofilm, they are further transported by diffusion in addition to being consumed by biofilm-dwelling cells.

The relative strengths of nutrient transport and of nutrient consumption are critical for biofilm formation. Picioreanu *et al.* [43] first introduced this ratio as a dimensionless number:

$$G = \frac{l^{2}\,\mu_{\max}\,\rho_{X}}{D_{S}\,S_{\mathrm{bulk}}},$$

where *l* is the biofilm thickness, *μ*_{max} the maximum bacterial growth rate, *ρ*_{X} the bacterial biomass density, *S*_{bulk} the concentration of growth substrate in the bulk fluid and *D*_{S} the diffusivity of growth substrate. The numerator describes the maximum bacterial biomass growth rate, and the denominator describes the maximum substrate transport rate through the biofilm. Picioreanu *et al.* [43] used their dimensional analysis of substrate transport and bacterial growth in conjunction with cellular automata models that simulate bacterial cells growing in two- or three-dimensional space. The authors found that changes in *G* are associated with dramatic changes in biofilm structure. Cell growth and division are uniform within biofilms when *G* is small, which results in biofilms that grow rapidly with smooth fronts. If *G* is large, however, cell growth is heterogeneous along the biofilm front, indicating that bacterial growth is limited by nutrient transport.

A modification of the *G* number was recently introduced to shift focus towards the proportion of cells in a biofilm that have sufficient access to substrate for growth, which is accomplished by considering the rate of molecular transport through the boundary layer, rather than the biofilm itself [42]. The modified dimensionless number, *δ*, is

$$\delta = \sqrt{\frac{S_{\mathrm{bulk}}\,D_{S}\,Y}{\mu_{\max}\,\rho_{X}\,h^{2}}},$$

where *μ*_{max}, *S*_{bulk}, *ρ*_{X} and *D*_{S} are defined as above, *Y* is the yield with which bacteria convert substrate to biomass and *h* the boundary layer thickness. The *δ* number determines the depth to which substrate diffuses into a biofilm before being depleted, and therefore represents the thickness of the actively growing cell population, in units of *h* (figure 2*a*) [42].
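The opposing behaviour of *G* and *δ* can be checked numerically. The following is a minimal sketch that assumes the forms *G* = *l*²*μ*_{max}*ρ*_{X}/(*D*_{S}*S*_{bulk}) and *δ* = √(*S*_{bulk}*D*_{S}*Y*/(*μ*_{max}*ρ*_{X}*h*²)), which are the dimensionally consistent combinations of the parameters defined above; all numerical values are hypothetical and serve only to show the trend.

```python
import math


def g_number(l, mu_max, rho_x, d_s, s_bulk):
    """Assumed form: maximum growth rate over maximum substrate transport."""
    return l**2 * mu_max * rho_x / (d_s * s_bulk)


def delta_number(s_bulk, d_s, y, mu_max, rho_x, h):
    """Assumed form: active-layer depth in units of boundary layer thickness h."""
    return math.sqrt(s_bulk * d_s * y / (mu_max * rho_x)) / h


# Hypothetical parameter values in SI units (illustration only).
L_BIOFILM = 1.0e-4  # biofilm thickness l, m
H_LAYER = 1.0e-4    # boundary layer thickness h, m
MU_MAX = 2.8e-4     # maximum growth rate, s^-1 (about 1 per hour)
RHO_X = 200.0       # biomass density, kg m^-3
D_S = 5.0e-10       # substrate diffusivity, m^2 s^-1
YIELD = 0.5         # biomass formed per substrate consumed

for s_bulk in (1.0e-3, 1.0):  # poor versus rich bulk substrate, kg m^-3
    g = g_number(L_BIOFILM, MU_MAX, RHO_X, D_S, s_bulk)
    delta = delta_number(s_bulk, D_S, YIELD, MU_MAX, RHO_X, H_LAYER)
    print(f"S_bulk = {s_bulk:g}: G = {g:.3g}, delta = {delta:.3g}")
```

Under these assumed forms the two numbers are linked by *Gδ*² = *Y*(*l*/*h*)², so raising the bulk substrate concentration lowers *G* and raises *δ* together, consistent with the low-*G*, high-*δ* growth-limited regime.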

When biofilms are limited by the maximum bacterial growth rate (low *G*, high *δ*), they expand into smooth-surfaced colonies as the majority of cells within them grow and divide, and different cell lineages remain spatially well-mixed (figure 2*b*). However, when biofilms are limited by nutrient transport (high *G*, low *δ*), the layer of actively growing cells is thin and restricted to the periphery of the cell group, generating instabilities along the advancing front. This effect amplifies surface irregularities into heterogeneous biofilm surface structures and leads to bottlenecks in the genetic composition within the layer of actively growing cells (figure 2*c,d*). As the biofilm grows, many cell lineages are cut off from access to nutrients due to chance alone, leading to an overall reduction in the number of cell lineages in the biofilm. The lineages that remain become spatially segregated as the cells within them grow and divide [44–46]. This effect has been demonstrated experimentally for bacteria [44,47,48], unicellular yeast [47] and social slime moulds [49], and it is conceptually linked to genetic drift of neutral variants and selective sweeps of strains that differ in their reproductive rates [47,48,50,51]. The *G* and *δ* numbers are also related to basic elements of social evolution theory, as spontaneous lineage segregation causes preferential interaction among cells of the same genotype [42,49]. We return to this point in our final section detailing a model of cooperative enzyme secretion in bacterial biofilms.

## 4. Lineage expansions and clonal interference

The pattern with which beneficial mutations spread through a population has been of central interest since the advent of population genetics in evolutionary biology. Classical theory primarily considers scenarios in which advantageous mutations are rare, and each one reaches fixation before any new beneficial mutations subsequently occur. However, experiments with bacteria have yielded measurements of beneficial mutation rates that are much higher than originally expected, which, given a sufficiently large population size, leads to competition among multiple favoured mutants in a single population. This phenomenon, known as clonal interference [52–54], strongly influences the rate of evolution within microbial communities and is generally expected to decrease the fixation rate of independent beneficial mutations.

When populations are constrained in space, beneficial mutations take considerably longer to reach fixation than they would in a well-mixed population of equivalent size [55–57]. As a result, clonal interference may be expected to occur more readily in systems with spatial constraints, as travelling waves of advantageous mutations collide with one another. The likelihood of interference depends on how often advantageous mutations occur and how long they take to sweep through a population, which in turn is a function of population size and the extent to which advantageous mutations provide a fitness benefit (i.e. the selection coefficient). If beneficial mutations are sufficiently rare and selection for them sufficiently strong, a regime of periodic selection is expected in which only one beneficial mutation sweeps through the population at a time (figure 3*a*). If advantageous mutations are relatively common and take longer to reach fixation, clonal interference will occur as advancing waves of selectively favoured mutants encounter one another and compete for access to space (figure 3*b*).

Noting the balance among multiple biological and spatial factors that contribute to clonal interference, Martens & Hallatschek [55] derived a characteristic length scale for the critical size of a cell group, above which clonal interference will occur

$$L_{c} = \left(\frac{\sigma}{s_{0}\,m}\right)^{1/(\mathrm{dim}+1)},$$

where *σ* is the speed with which the spreading wave of a beneficial mutant travels, *s*_{0} the mean selective advantage of beneficial mutations, *m* the rate of beneficial mutations per site and dim the dimension of the system under study (e.g. dim = 2 when cells are competing on a plane). Periodic selection is expected to occur for systems in which the characteristic length scale is *L* < *L*_{c} (figure 3*a*), while clonal interference is predicted for systems in which *L* > *L*_{c} (figure 3*b*).
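The threshold can be made concrete with a short calculation. This sketch assumes the scaling form *L*_{c} = (*σ*/(*s*_{0}*m*))^{1/(dim+1)}, which follows from equating the time a sweep needs to cross a habitat of size *L* (about *L*/*σ*) with the waiting time for the next successful mutation (about 1/(*m* *s*_{0} *L*^{dim})). The function names and parameter values are hypothetical, with lengths in units of lattice sites and time in generations.

```python
def critical_length(sigma, s0, m, dim):
    """Assumed scaling form of the critical size for clonal interference."""
    return (sigma / (s0 * m)) ** (1.0 / (dim + 1))


def selection_regime(system_length, sigma, s0, m, dim):
    """Classify a habitat of linear size system_length against L_c."""
    if system_length > critical_length(sigma, s0, m, dim):
        return "clonal interference"
    return "periodic selection"


# Hypothetical values (illustration only).
SIGMA = 0.1  # wave speed, sites per generation
S0 = 0.05    # mean selective advantage
M = 1.0e-6   # beneficial mutation rate per site per generation
DIM = 2      # cells competing on a plane

l_c = critical_length(SIGMA, S0, M, DIM)
print(f"L_c is roughly {l_c:.0f} sites")
for size in (10, 10_000):
    print(size, "->", selection_regime(size, SIGMA, S0, M, DIM))
```

Because the exponent is 1/(dim+1), *L*_{c} responds only weakly to changes in mutation rate or selective advantage, so under this assumed form large populations enter the interference regime over a wide range of parameters.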

The *L*_{c} length scale integrates physical and biological parameters to define a threshold for clonal interference. This analysis, along with many of those discussed in this article, highlights the importance of considering spatial structure for making inferences about evolution within cell groups. *L*_{c} is likely to be important for characterizing evolution within biofilm communities, which by definition are spatially constrained and transition from small initial colonies, in which periodic selection may be expected to prevail, to very large communities containing millions of cells, in which clonal interference is highly likely. The *L*_{c} length scale has also been explored in the context of cancer progression [58]; increased clonal interference is expected to delay the onset of invasive cancer by slowing the accumulation of mutations that allow neoplasms to attain high growth rates contributing to malignancy [17,59].

## 5. Quorum sensing and length scales of communication

Beyond living in close proximity to one another, cells in groups often release diffusible molecules to which they and other members of the group respond. In the context of bacterial biofilms, this process is termed quorum sensing [13,60]. Cells are thought to use signal concentration as a proxy for population density, altering their gene expression profiles after a sufficiently high concentration is reached.

In a recently published unifying analysis, Pai & You [61] note that quorum sensing systems ultimately allow cells to assess when a critical enclosure volume has been reached. The critical volume occurs due to the accumulation of many cells in a given space, or the enclosure of fewer cells in a small space. The authors define a dimensionless ratio *υ* = *V*_{e,c}/*V*_{c} as the sensing potential of a detector, where *V*_{e,c} is the critical enclosure volume and *V*_{c} the volume of a cell. The *υ* ratio is the threshold enclosure space, expressed in the number of cell volumes, below which cells transition from solitary state to social state for a given regulatory target function. The sensing potential can be calculated from the basic parameters of a quorum sensing circuit, including the signal synthesis rate, the threshold signal concentration for activation of a detector, the signal-degradation rate constant and the signal-transport rate constant.

The sensing potential formulated by Pai & You [61] clarifies how changes in specific properties of quorum sensing circuits allow bacteria to monitor local conditions and adjust their expression of group-oriented behaviours accordingly. The specific prediction of their study is that quorum sensing bacteria will evolve sensing potentials that result in the activation of behaviours that require participation from multiple cells at a sufficiently high population density in order to be effective. Such behaviours include the secretion of digestive enzymes and nutrient chelators involved in pathogenicity, as well as polymeric substances that contribute to the structural stability of biofilms [62]. It is also important to note that not all quorum sensing-regulated phenotypes are secreted compounds, and the sensing potential may also evolve to tune the regulation of metabolic genes and other individual cell properties whose optimal expression may nonetheless depend on population density.

Pai & You [61] do not explicitly consider population spatial structure in their model framework. Although their model assumes uniform interaction neighbourhoods, one may consider the quorum sensing process to operate within local patches belonging to a larger metapopulation, within which each patch is roughly uniform [63,64]. In a heterogeneous environment, quorum sensing allows bacteria to monitor conditions within their local patch and adjust their behaviour in response to enclosure volume. Future theoretical work that more directly addresses the combination of population spatial structure, quorum sensing regulation and evolutionary dynamics represents an exciting direction that will complement the growing number of experimental studies on quorum sensing evolution [65–69].

## 6. Spatial lineage mixing and the evolution of bacterial cooperation

The collective secretion of extracellular compounds lends cell groups the ability to consume complex growth substrates and to cause disease. Extracellular digestive enzymes and nutrient-sequestering molecules are common within bacterial biofilms, but they present a difficulty for evolutionary theory. Because such enzymes are secreted into the extracellular space, non-secreting cells that do not pay the cost of contributing to the public good may reap their benefits. And because they pay no cost of production, such cells can outcompete their enzyme-secreting counterparts [31,66,68,70–72].

A dominant factor allowing cooperation to evolve in many systems is the preferential interaction among cooperative individuals relative to their competitive neighbourhoods [30]. We might therefore expect the interaction between secreted enzyme transport and genetic lineage distribution to be critical for the evolution of cooperation within bacterial biofilms. In this section, we develop and analyse a general model for digestive enzyme secretion in biofilms and test it using a well-established individual-based simulation framework for biofilm growth. The analytical results derived below are similar to those of Driscoll & Pepper [73], who also study the evolution of diffusible public good secretion. Our approach differs in that we include more physiological detail by using parameters that can be measured in the laboratory; we implement cells as spheres in three-dimensional space; and we provide an explicit description of spatial clustering for multi-cell scenarios of competition between enzyme producers and non-producers. These modelling choices allow us to couple the analysis with computational simulations and to emphasize dimensional reduction. We aim to provide sufficient description to serve as a guide for other researchers who may wish to use similar approaches for their systems of interest.

We start by deriving the concentration profile of the digestive enzyme (*E*) around a single secreting cell that is stationary within a large body of still liquid. By Fick's Law, *E* obeys the diffusion equation in spherical coordinates,

$$\frac{\partial E}{\partial t} = \frac{D_{E}}{r^{2}}\,\frac{\partial}{\partial r}\!\left(r^{2}\,\frac{\partial E}{\partial r}\right), \tag{6.1}$$

where *D*_{E} is the diffusivity of the secreted enzyme, *t* represents time, and *r* the radial distance from the centre of the secreting cell. Diffusion of small molecules is typically such that the concentration profile of *E* reaches steady state quickly relative to cell growth and division. The steady-state profile of *E* is obtained by integrating equation (6.1) for ∂*E*/∂*t* = 0

$$\frac{D_{E}}{r^{2}}\,\frac{\mathrm{d}}{\mathrm{d}r}\!\left(r^{2}\,\frac{\mathrm{d}E}{\mathrm{d}r}\right) = 0. \tag{6.2}$$

This is a second-order differential equation and can be solved using two boundary conditions. The first boundary condition we use is that the enzyme concentration should vanish far away from the producing cell

$$E(r \to \infty) = 0. \tag{6.3}$$

The second boundary condition implements conservation of mass and states that the rate of enzyme passing through the surface of the cell matches the rate of enzyme production by the cell (*q*_{E})

$$\oint_{S} \left(-D_{E}\,\frac{\mathrm{d}E}{\mathrm{d}r}\right)\mathrm{d}S = q_{E}, \tag{6.4}$$

where the left-hand side of the equation represents the integral of the diffusive flux out of the cell over its entire surface (*S*). Solving equation (6.2) with boundary conditions (6.3) and (6.4) yields the following solution:

$$E(r) = \frac{q_{E}}{4\pi D_{E}\,r}, \tag{6.5}$$

which states that the concentration of a secreted digestive enzyme increases in direct proportion to the rate of enzyme production and is inversely proportional to the distance from the producing cell.
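For readers who want the intermediate steps, the integration leading from equation (6.2) to equation (6.5) can be sketched as follows. The integration constants *A* and *B* are introduced here for illustration only, and the cell surface is assumed to be a sphere of radius *r*_{cell} with area 4π*r*_{cell}².

$$
\begin{aligned}
\frac{\mathrm{d}}{\mathrm{d}r}\!\left(r^{2}\frac{\mathrm{d}E}{\mathrm{d}r}\right) = 0
\;&\Rightarrow\; r^{2}\frac{\mathrm{d}E}{\mathrm{d}r} = A
\;\Rightarrow\; E(r) = -\frac{A}{r} + B, \\
E(r \to \infty) = 0 \;&\Rightarrow\; B = 0, \\
-D_{E}\left.\frac{\mathrm{d}E}{\mathrm{d}r}\right|_{r = r_{\mathrm{cell}}} 4\pi r_{\mathrm{cell}}^{2}
= -4\pi D_{E} A = q_{E}
\;&\Rightarrow\; A = -\frac{q_{E}}{4\pi D_{E}}, \\
&\Rightarrow\; E(r) = \frac{q_{E}}{4\pi D_{E}\,r}.
\end{aligned}
$$

Note that the cell radius cancels in the flux condition, so the steady-state profile outside the cell depends on the cell only through its total production rate *q*_{E}.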

### (a) Conditions favouring public good secretion: two-cell scenario

We will now use the profile of the extracellular enzyme concentration, *E*, around a single cell to study competition between producing and non-producing cells (which we will term cheaters by convention) [74]. The rates of increase in mass per volume of a producing cell *P* and a cheating cell *C* are defined by

$$\frac{\mathrm{d}P}{\mathrm{d}t} = \left(\mu_{P} - c\right)P \tag{6.6}$$

and

$$\frac{\mathrm{d}C}{\mathrm{d}t} = \mu_{C}\,C, \tag{6.7}$$

where *μ* is the growth rate per unit mass, called the specific growth rate, with the subscript denoting the cell type. Equation (6.6) implements a metabolic cost of enzyme production (*c*), which is subtracted from the growth rate of producers and assumed to be an arbitrary function of the enzyme production rate (an explicit cost function will be defined for our simulations below). We assume that the specific growth rates are linear functions of the local concentration of the digestive enzyme:

$$\mu = \mu_{0}\left(1 + bE\right). \tag{6.8}$$

Here, *μ*_{0} is the basal specific growth rate and *b* a coefficient of growth increase per mass of enzyme, which implements the benefit of the secreted public good. In reality, cells benefit from nutrients released into the environment by enzymes as they break down complex substrates into smaller, importable nutrients; however, we only model diffusion of the secreted enzyme. We make this simplification for the sake of clarity and tractability, but note that this approach approximates the full description of any system in which the nutrients liberated by the enzyme diffuse much faster than the enzyme itself [75]. For example, extracellular chitinases of *Vibrio* spp. [76] have an approximate molecular weight of 90 300, which can be converted to a molecular diffusion constant of *D*_{chitinase} = 58 μm^{2} s^{−1} [77]. The product of chitinase activity, *N*-acetylglucosamine (GlcNAc), has a diffusion constant that is an order of magnitude larger: *D*_{GlcNAc} = 500 μm^{2} s^{−1} [78]. We expect such a difference in the diffusion constants of extracellular enzymes and their digested products often to be upheld, because digestive enzymes are typically much larger than the nutrient molecules they release into the environment.

We can use our model of growth rates together with the enzyme concentration profile from equation (6.5) to determine the conditions for which the producing cell outgrows a cheater cell in its vicinity. The producer has the advantage when its fitness (*w*_{P}) is higher than that of the cheater (*w*_{C}). Fitness is simply the net specific growth rate of each cell

$$w_{P} = \mu_{P} - c \tag{6.9}$$

and

$$w_{C} = \mu_{C}. \tag{6.10}$$

The producer therefore has the advantage when *w*_{P} > *w*_{C}. Using equations (6.8)–(6.10), this condition for the fitness advantage of a producer can be expressed as

$$b\left(E_{P} - E_{C}\right) > \frac{c}{\mu_{0}}, \tag{6.11}$$

where *E*_{P} and *E*_{C} are the concentrations of secreted enzyme experienced by the producer cell and the cheater cell, respectively. From equation (6.5), the values of *E*_{P} and *E*_{C} are

$$E_{P} = \frac{q_{E}}{4\pi D_{E}\,r_{\mathrm{cell}}} \tag{6.12}$$

and

$$E_{C} = \frac{q_{E}}{4\pi D_{E}\,d}, \tag{6.13}$$

where *r*_{cell} is the radius of a producer cell and *d* the distance between the producer and the cheater cells. *q*_{E} was defined above as the rate of enzyme production per producer cell. We now replace *q*_{E} by a term for the enzyme production rate per biomass of producer (*k*_{E}), which is more convenient for the analysis that follows. The conversion is *q*_{E} = *M*_{P}*k*_{E}, where *M*_{P} is the mass of a single producer cell. The mass of the cell is the product of its average density (*ρ*) and its volume, and we can therefore replace *M*_{P} by (4/3)π*r*_{cell}^{3}*ρ*. After substituting equations (6.12) and (6.13) into equation (6.11) and factoring 1/*r*_{cell} out of the distance terms, we can rewrite the condition for producer advantage as

$$\frac{b\,k_{E}\,\rho\,r_{\mathrm{cell}}^{2}}{3D_{E}}\left(1 - \frac{r_{\mathrm{cell}}}{d}\right) > \frac{c}{\mu_{0}}. \tag{6.14}$$

The first factor on the left-hand side of equation (6.14) is a dimensionless number, which we will call *B*_{L} (benefit localization)

$$B_{L} = \frac{b\,k_{E}\,\rho\,r_{\mathrm{cell}}^{2}}{3D_{E}}. \tag{6.15}$$

*B*_{L} compares the fitness increase afforded by accumulation of secreted enzyme (numerator) to the diffusion of enzyme away from the producing cell (denominator). The expression that results from this substitution is

$$B_{L}\left(1 - \frac{r_{\mathrm{cell}}}{d}\right) > \frac{c}{\mu_{0}}. \tag{6.16}$$

The ratio *c*/*μ*_{0} quantifies the cost of enzyme production, scaled to the basal cell growth rate. The expression (1−*r*_{cell}/*d*) is equal to zero when the producer and the cheater cells are directly adjacent, and approaches unity as the cheater cell is moved far away from the producer cell. Finally, the dimensionless number *B*_{L} captures to what extent the fitness benefit of secreted enzyme is localized around the producer cell. Small values of *B*_{L} correspond to rapid diffusion of enzyme away from the producer relative to its rate of production and thus a more homogeneous distribution of enzyme-mediated benefit in the environment. Large values of *B*_{L} correspond to steeper gradients of decreasing enzyme concentration around the producing cell and a resulting fitness benefit that is more tightly localized around the producer.

For a system containing one enzyme producer and one cheater, equation (6.16) describes whether the benefit of the secreted enzyme is sufficiently privatized by the producer for the secretion phenotype to be favoured [73,79]. The left-hand side describes the extent to which the enzyme-producing cell preferentially benefits itself due to localization of the secreted enzyme (*B*_{L}), and its distance from the cheater cell, captured by (1−*r*_{cell}/*d*). When the product of these factors outweighs the cost of enzyme production, the secretion phenotype is favoured.
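The two-cell condition lends itself to a quick numerical check. The sketch below assumes the steady-state profile *E*(*r*) = *q*_{E}/(4π*D*_{E}*r*), the linear growth law *μ* = *μ*_{0}(1 + *bE*) and the resulting form *B*_{L} = *b* *k*_{E} *ρ* *r*_{cell}²/(3*D*_{E}). Only the chitinase diffusivity of 58 μm² s⁻¹ comes from the text; the remaining values, units and function names are hypothetical.

```python
import math


def enzyme_concentration(q_e, d_e, r):
    """Steady-state enzyme concentration at distance r from one producer."""
    return q_e / (4.0 * math.pi * d_e * r)


def benefit_localization(b, k_e, rho, r_cell, d_e):
    """Assumed form of the dimensionless number B_L."""
    return b * k_e * rho * r_cell**2 / (3.0 * d_e)


def producer_favoured(b_l, r_cell, d, cost_ratio):
    """Two-cell condition: B_L * (1 - r_cell / d) > c / mu_0."""
    return b_l * (1.0 - r_cell / d) > cost_ratio


# Hypothetical parameters (lengths in um, time in s, arbitrary mass units).
B_COEFF = 10.0     # benefit coefficient b
K_E = 5.0          # enzyme production rate per producer biomass
RHO = 1.0          # cell density
R_CELL = 0.5       # cell radius, um
D_E = 58.0         # enzyme diffusivity, um^2 s^-1 (chitinase value from the text)
COST_RATIO = 0.05  # c / mu_0

b_l = benefit_localization(B_COEFF, K_E, RHO, R_CELL, D_E)
for distance in (1.0, 10.0):
    verdict = producer_favoured(b_l, R_CELL, distance, COST_RATIO)
    print(f"d = {distance} um: producer favoured = {verdict}")
```

With these assumed numbers the producer loses to a cheater that sits one cell diameter away but wins once the cheater is ten micrometres away, illustrating how the distance term and *B*_{L} act jointly against the scaled cost.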

For simplicity, we began with the two-cell scenario described above, which introduces the central importance of how enzyme distribution and cell–cell distance interact to control whether enzyme production is favoured. In the following section, we address the more general problem of social evolution within groups containing many cells.

### (b) Extension to a system of many cells

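We will now extend the two-cell scenario to one with an arbitrary number of producer (*n*_{P}) and cheater (*n*_{C}) cells. The fitness values of the producer and the cheater cell types are defined by averaging the growth of the cells in the two subpopulations:

$$w_{P} = \frac{1}{n_{P}}\sum_{i=1}^{n_{P}}\left(\mu_{P,i} - c\right) \tag{6.17}$$

and

$$w_{C} = \frac{1}{n_{C}}\sum_{j=1}^{n_{C}}\mu_{C,j}. \tag{6.18}$$

We are again interested in the condition *w*_{P} > *w*_{C}, which for the multi-cell scenario is

$$b\left(\frac{1}{n_{P}}\sum_{i=1}^{n_{P}}E_{P,i} - \frac{1}{n_{C}}\sum_{j=1}^{n_{C}}E_{C,j}\right) > \frac{c}{\mu_{0}}. \tag{6.19}$$

We will assume that all producer cells are the same size (this will be relaxed in our simulations below). The important distinction from the two-cell scenario is that the concentration of *E* now experienced by a focal cell, *α*, is the sum of the contributions from all producers in the system:

$$E_{\alpha} = \frac{q_{E}}{4\pi D_{E}}\sum_{\gamma=1}^{n_{P}}\frac{1}{d_{\alpha\gamma}}. \tag{6.20}$$

Here, *d*_{αγ} is the distance between the focal cell *α* and producer cell *γ*. If the focal cell is itself a producer, then we assume *d*_{αγ} = *r*_{cell} for *γ* = *α*. We can now determine the form for the sums in inequality (6.19):

$$\sum_{i=1}^{n_{P}}E_{P,i} = \frac{q_{E}}{4\pi D_{E}}\sum_{i=1}^{n_{P}}\sum_{\gamma=1}^{n_{P}}\frac{1}{d_{i\gamma}} \tag{6.21}$$

and

$$\sum_{j=1}^{n_{C}}E_{C,j} = \frac{q_{E}}{4\pi D_{E}}\sum_{j=1}^{n_{C}}\sum_{\gamma=1}^{n_{P}}\frac{1}{d_{j\gamma}}. \tag{6.22}$$

If we substitute these expressions into inequality (6.19) and re-scale the distance between two cells by the cell radius (such that $\hat{d}_{\alpha\gamma} = d_{\alpha\gamma}/r_{\mathrm{cell}}$), the multi-cell version of inequality (6.16) becomes

$$B_{L}\left(\frac{1}{n_{P}}\sum_{i=1}^{n_{P}}\sum_{\gamma=1}^{n_{P}}\frac{1}{\hat{d}_{i\gamma}} - \frac{1}{n_{C}}\sum_{j=1}^{n_{C}}\sum_{\gamma=1}^{n_{P}}\frac{1}{\hat{d}_{j\gamma}}\right) > \frac{c}{\mu_{0}}. \tag{6.23}$$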

Equation (6.23) is closely analogous to Hamilton's rule, BR > C, the canonical condition of inclusive fitness theory under which cooperation is selectively favoured [80]. Here, B is the fitness benefit of cooperative behaviour, C the cost of cooperative behaviour and R relatedness, the regression coefficient of recipient genotype on donor genotype across all cooperative interactions. Relatedness is often interpreted to signify common descent, but more generally the relatedness coefficient is a statistical description of the extent to which cooperative actor genotype predicts recipient genotype [33,81–86]. For social traits that influence neighbours in a distance-dependent manner, including the secretion of diffusible public goods, relatedness corresponds tightly to the spatial clustering of cooperative individuals with each other, relative to the clustering of cooperative individuals with cheaters [33,73,87]. In such scenarios, spatial segregation of different genotypes yields high relatedness coefficients, whereas even mixture of different genotypes yields relatedness coefficients near zero (assuming no discrimination mechanisms that allow cooperative individuals to preferentially target one another to receive benefits).

Equation (6.23) thus contains the same fundamental components as Hamilton's rule, expressed in terms of the parameters of this particular system. The left-hand term in parentheses,

$$\frac{1}{n_{P}}\sum_{i=1}^{n_{P}}\sum_{\gamma=1}^{n_{P}}\frac{1}{\hat{d}_{i\gamma}} - \frac{1}{n_{C}}\sum_{j=1}^{n_{C}}\sum_{\gamma=1}^{n_{P}}\frac{1}{\hat{d}_{j\gamma}}, \tag{6.24}$$

represents the degree of clustering among producer cells (left-side compound summation), minus the degree of clustering between producer and cheater cells (right-side compound summation). Together with *B*_{L}, this clustering differential captures the extent to which producer cells preferentially benefit their own kind via the secretion of the digestive enzyme. That is, the combined effects of the clustering differential and *B*_{L} determine the relatedness coefficient and total cooperative benefit associated with extracellular enzyme secretion for a given population structure and set of parameter values describing growth, enzyme production and enzyme transport. When the collective benefit provided by enzyme secretion is sufficiently biased towards enzyme-producing cells, such that the cost of enzyme production is offset, cooperation is selectively favoured. Importantly, equation (6.23) describes the instantaneous dynamics of a cell group; as a population grows and its structure changes, the balance of equation (6.23) may change as well.
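The clustering differential can be evaluated directly from cell coordinates. The sketch below assumes it takes the form of the mean over producers of Σ_γ *r*_{cell}/*d*_{iγ} minus the mean over cheaters of Σ_γ *r*_{cell}/*d*_{jγ}, with *d* = *r*_{cell} for a producer's own contribution. The cell layouts and function names are hypothetical, and equal-sized, non-overlapping cells are assumed.

```python
import math


def clustering_differential(producers, cheaters, r_cell):
    """Mean scaled enzyme exposure of producers minus that of cheaters."""

    def mean_exposure(cells, cells_are_producers):
        total = 0.0
        for i, cell in enumerate(cells):
            for g, source in enumerate(producers):
                if cells_are_producers and i == g:
                    distance = r_cell  # a producer senses its own surface
                else:
                    distance = math.dist(cell, source)
                total += r_cell / distance
        return total / len(cells)

    return mean_exposure(producers, True) - mean_exposure(cheaters, False)


def cooperation_favoured(b_l, differential, cost_ratio):
    """Multi-cell condition: B_L * differential > c / mu_0."""
    return b_l * differential > cost_ratio


R_CELL = 0.5  # cell radius; neighbouring cells touch at a spacing of 1.0

# Hypothetical layouts: eight cells in a row, segregated versus alternating.
segregated = clustering_differential(
    producers=[(x, 0.0) for x in (0, 1, 2, 3)],
    cheaters=[(x, 0.0) for x in (4, 5, 6, 7)],
    r_cell=R_CELL,
)
mixed = clustering_differential(
    producers=[(x, 0.0) for x in (0, 2, 4, 6)],
    cheaters=[(x, 0.0) for x in (1, 3, 5, 7)],
    r_cell=R_CELL,
)
print(f"segregated: {segregated:.3f}, mixed: {mixed:.3f}")
```

In this toy layout the segregated arrangement yields a larger differential than the alternating one, so a given *B*_{L} can offset a higher scaled cost when lineages are spatially segregated, in line with the relatedness interpretation above.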

### (c) Simulations with an agent-based model

The derivations above illustrate the basic links between the abstract evolutionary theory of cooperation and the core parameters of cell group growth and public good production. They also imply that the outcome of competition between producers and cheaters may be predicted if the population spatial structure, the *B*_{L} number and the cost *c*/*μ*_{0} to producers are known. The *B*_{L} number provides the additional important insight that it is not the values of the parameters *b*, *k*_{E}, *r*_{cell}, *D*_{E} and *ρ* in isolation but rather their compounded value according to equation (6.15) that is critical for the evolution of diffusible public good production in spatially structured environments.

We now test the predictive role of *B*_{L} with simulations of biofilms in two-dimensional space. The computational framework used to run these simulations has previously been described in detail and tested experimentally [34,46,88]. Our model relaxes some of the assumptions of the analytical derivations above by adding realistic detail to the cells and their interactions with each other. The transport of solutes is still assumed to occur by diffusion, but diffusion now occurs only within a boundary layer that extends a distance *h* above the biofilm. Cells are allowed to vary in size across the population as they grow and divide. We carried out simulations in which the producer and the cheater cells were inoculated at an initial 1 : 1 ratio (*n*_{P} = *n*_{C}), allowed the simulations to run until the biofilm reached a pre-defined maximum thickness, and then quantified the outcome of competition by computing the ratio of producer fitness to cheater fitness (*w*_{P}/*w*_{C}).

We first assumed that cell growth rate is exclusively a function of public good concentration at the cell's location, and we conducted an array of simulations in which *B*_{L} was altered by independently varying the values of the parameters that compose it (figure 4). As expected, our simulations showed that there is a threshold value of *B*_{L} above which enzyme-secreting cells are selectively favoured. Furthermore, the effect of varying *B*_{L} was identical regardless of which of its constituent parameters (*b*, *k*_{E}, *D*_{E} or *r*_{cell}) was altered.
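One way to locate such a threshold from a parameter sweep is to find where *w*_{P}/*w*_{C} first crosses 1 and to interpolate between the two bracketing *B*_{L} values. The sketch below assumes linear interpolation in log_{10}(*B*_{L}), which suits a sweep spaced over orders of magnitude; the function name `threshold_b_l` and the example values are hypothetical and are not taken from figure 4.

```python
import math


def threshold_b_l(b_l_values, fitness_ratios):
    """Estimate the B_L value at which w_P / w_C first rises through 1.

    Assumption: between neighbouring sweep points, the fitness ratio varies
    linearly with log10(B_L). Returns None if no upward crossing is found.
    """
    pairs = sorted(zip(b_l_values, fitness_ratios))  # order by B_L
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        if y0 < 1.0 <= y1:  # producers go from disfavoured to favoured
            frac = (1.0 - y0) / (y1 - y0)
            log_x = math.log10(x0) + frac * (math.log10(x1) - math.log10(x0))
            return 10.0 ** log_x
    return None


# Made-up sweep for illustration only; these are not simulation results.
sweep_b_l = [1e-3, 1e-1]
sweep_ratio = [0.8, 1.2]
print(threshold_b_l(sweep_b_l, sweep_ratio))
```

Because the sweep is sorted by *B*_{L} before the search, the same function can be applied whichever constituent parameter was varied to generate the *B*_{L} values.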

We next extended the preliminary analysis by allowing growth rate to vary as a function both of local secreted enzyme concentration (*E*, as above), and of available nutrient (*N*, assumed to diffuse into the biofilm from a bulk liquid). The scenario we implement is one in which bacteria can achieve a basal growth rate by consuming a readily accessible carbon source, the basic nutrient *N*, which diffuses into the biofilm from the bulk liquid. Bacterial growth may be augmented by the activity of the secreted enzyme, which liberates a different growth substrate [89]. The two nutrient sources are implemented separately to allow us to independently vary the effects of two distinct phenomena on the evolution of cooperation. The first is the extent of lineage segregation within growing biofilms, which is determined by the thickness of the actively growing layer along the advancing front. The active layer thickness is governed by a parameter group (*δ*, see §3 and figure 2) that includes bulk nutrient concentration. The second is the concentration profile of secreted enzyme, determined by *B*_{L}. As noted above, the benefit of a secreted enzyme becomes more privatized by cells producing it as the *B*_{L} number increases. A stoichiometric table detailing the exact growth dynamics of producer and cheater cells is provided in the electronic supplementary material, table S1.
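The two-source growth scenario can be illustrated with a toy growth-rate function. The exact kinetics are those given in the electronic supplementary material, table S1, and are not reproduced here; the sketch below simply assumes saturating (Monod-type) growth on the basic nutrient *N*, an additive linear benefit of the local enzyme concentration *E*, and a fixed growth cost paid only by producers. The function name, the functional form and all default parameter values are hypothetical.

```python
def growth_rate(nutrient, enzyme, is_producer,
                mu_max=1.0, k_n=1.0, b=1.0, c=0.1):
    """Toy per-cell growth rate with two independent nutrient sources.

    Assumptions (illustrative only, not the published kinetics):
      - basal growth on nutrient N follows Monod saturation,
      - enzyme-liberated substrate adds a benefit proportional to local E,
      - producers pay a constant cost c; cheaters pay none.
    """
    basal = mu_max * nutrient / (k_n + nutrient)  # growth on basic nutrient N
    benefit = b * enzyme                          # growth from enzyme activity
    cost = c if is_producer else 0.0              # cost of enzyme secretion
    return basal + benefit - cost


# At the same location, a producer and a cheater differ only by the cost c,
# so producers can win only if they experience higher local E on average.
local_n, local_e = 1.0, 0.2
print(growth_rate(local_n, local_e, is_producer=True))
print(growth_rate(local_n, local_e, is_producer=False))
```

Keeping the two terms separate mirrors the design choice in the text: the *N*-dependent term controls the active layer thickness (through *δ*), while the *E*-dependent term controls how localized the cooperative benefit is (through *B*_{L}).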

For *δ* > 10^{3}, which results in well-mixed biofilms, cooperative cells have higher fitness when *B*_{L} exceeds a critical threshold of approximately 10^{−2} (figure 5*a*). The threshold *B*_{L} above which cooperators are favoured corresponds to the length scale on which cell lineages cluster due solely to their immobility and limited dispersal (population viscosity). Note that diffusible enzyme production can still be favoured in relatively mixed environments, so long as the spatial range at which it provides a fitness benefit matches the spatial range along which cells tend to be of the same genotype.

For *δ* < 10^{3}, the threshold *B*_{L} at which producers are selectively favoured increases sharply. This result was somewhat counterintuitive, as decreasing *δ* leads to increasing spatial segregation among cell lineages, which in principle could allow for cooperative cells to be favoured even if the benefit of their secreted enzyme is distributed farther away from them (decreasing *B*_{L}). However, we see that the conditions favouring cooperation become more stringent as *δ* decreases because nutrient limitation creates a strong advantage for cell lineages that accumulate even marginally greater biovolume at the earliest time points during biofilm growth [42,62,90]. Such cells are able to deny their neighbours access to nutrients and in so doing dramatically reduce their neighbours' ability to grow. Under these conditions, cheater cells outgrow cooperative cells if the secreted enzyme is not strongly localized (figure 5*b*). However, if the secreted enzyme's effect is sufficiently localized around producing cells (high *B*_{L}), then producers outcompete cheater cells early during biofilm growth and remain dominant over the course of the competition (figure 5*c*). Indeed, for low *δ* and high *B*_{L}, cooperative cells outcompete cheater cells 10 times more strongly than they do with the same *B*_{L} value at high *δ*. Low *δ* leads to more globalized competition for nutrients and increased segregation among cell lineages (i.e. increased relatedness) [42,74], both of which increase the advantage of spatially localized cooperative behaviour [30].

Our simulation model makes a number of simplifying assumptions: inactive cells do not decay, biofilms are not subjected to shear stress, cells cannot disperse, biofilms are always initiated with a confluent monolayer of cells, and there is no plasticity in expression of the digestive enzyme. Relaxing some of these assumptions will certainly provide further insight into the evolution of enzyme production in biofilms. We expect that decreasing initial cell density will favour enzyme producers by allowing them to preferentially benefit themselves and their clonemates prior to experiencing competition with non-producers. Conditional enzyme secretion, for example in response to quorum sensing signals [62,91] or to nutrient conditions [92,93], may allow cooperative cells to avoid exploitation by non-producing cells early during biofilm formation, when growth deficits lead to a severe competitive disadvantage. The durability of secreted public good compounds [94], the shapes of their benefit and cost functions [95], and disturbance-dispersal dynamics [89] also play an important role in the evolution of cooperation, but for the sake of illustration and brevity we have omitted these molecular and ecological details from our study.

Our analytical model and simulations illustrate how scaling analysis can be applied to make predictions about the evolution of social behaviour in cell groups, and how one may relate the detailed parameters of cell growth, enzyme secretion and solute diffusion to the abstractions of evolutionary theory.

## 7. Conclusions

Small details of physics and physiology are often critical for understanding biological systems in general and cell groups in particular. To capture such details, realistic individual-based models of cell groups often contain dozens of parameters describing a variety of biological and physical processes. Deriving general principles about cell groups therefore presents a great challenge, one that we think can be effectively met by seeking and using dimensionless numbers that collate the parameters of a system into its major driving forces. This approach vastly reduces the size of parameter space to be swept when exploring a problem of interest; provides greater clarity in the interpretation of model results; and offers a straightforward route to experimental testing and consolidation with more abstract theories from collective behaviour and evolutionary biology.
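The reduction in parameter space can be made concrete with simple counting. The sketch below assumes a full factorial sweep with the same number of sampled values on every axis; the choice of 10 values per axis is arbitrary, and the helper name `sweep_size` is hypothetical. The five raw parameters are those that compose *B*_{L} (*b*, *k*_{E}, *r*_{cell}, *D*_{E} and *ρ*).

```python
def sweep_size(n_axes, values_per_axis):
    """Number of simulations in a full factorial sweep over n_axes parameters."""
    return values_per_axis ** n_axes


values_per_axis = 10  # arbitrary resolution, chosen for illustration

# Varying b, k_E, r_cell, D_E and rho independently: five axes.
raw_runs = sweep_size(5, values_per_axis)

# Varying only the dimensionless group B_L that collates them: one axis.
grouped_runs = sweep_size(1, values_per_axis)

print(raw_runs, grouped_runs, raw_runs // grouped_runs)
```

Under these assumptions, sweeping the single group requires four orders of magnitude fewer simulations than sweeping its five constituents separately, and the saving grows with the resolution of the sweep.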

## Acknowledgements

K.D. is supported by a Human Frontier Science Program post-doctoral fellowship; S.A.L. is supported by the Defense Advanced Research Projects Agency (DARPA) under grants HR0011-05-1-0057 and HR0011-09-1-055; B.L.B. is supported by the Howard Hughes Medical Institute, National Institutes of Health grant 5R01GM065859, and National Science Foundation grant MCB-0343821; and J.B.X. is supported by an NIH Director's New Innovator Award DP2OD008440, and the Integrative Cancer Biology Program 1-U54-CA148967-01. The raw biofilm simulation data generated for this paper are archived as text files in the Dryad database under the following doi:10.5061/dryad.5947m.

- Received November 21, 2012.
- Accepted January 3, 2013.

- © 2013 The Author(s) Published by the Royal Society. All rights reserved.