## Abstract

Sociality is primarily a coordination problem. However, the social (or communication) complexity hypothesis suggests that the kinds of information that can be acquired and processed may limit the size and/or complexity of social groups that a species can maintain. We use an agent-based model to test the hypothesis that the complexity of information processed influences the computational demands involved. We show that successive increases in the kinds of information processed allow organisms to break through the glass ceilings that otherwise limit the size of social groups: larger groups can only be achieved at the cost of more sophisticated kinds of information processing that are disadvantageous when optimal group size is small. These results simultaneously support both the social brain and the social complexity hypotheses.

## 1. Introduction

Social living can take one of two forms: animals can form loose aggregations (exemplified by insect swarms and antelope herds, which are typically based on short-term advantages and whose persistence depends on immediate costs and benefits) or they can form congregations (exemplified by the bonded social groups of primates and some other mammals, including cetaceans, elephants and equids among others) whose advantages derive from long-term association, and whose persistence mainly depends on the trade-off between short- and long-term benefits and costs [1]. It has been suggested that this second kind of sociality is a kind of large-scale coordination problem that depends on bonding processes underpinned by more sophisticated cognitive mechanisms (associated with larger brains [2]). Indeed, the social brain hypothesis [2–7] implies that the size of group that can coordinate its behaviour is limited by the cognitive capacity (essentially, the neural processing capacity) that organisms can bring to bear on the problem.

Although there is considerable neuroanatomical evidence (at the individual as well as the species level) to support the social brain hypothesis in both primates [8] and specifically humans [9–12], the role that cognition plays in this remains unspecified. The recent resurgence of interest in the relationship between communication complexity and social complexity (the social complexity hypothesis) [13,14] offers a possible mechanism by postulating that (i) the complexity of information processing limits social group size and (ii) these communication competences depend on computationally expensive cognitive capacities.

We here use an agent-based model to show that the cognitive demands of coordination impose an upper limit on the size of social group that a species can maintain, but that costly increases in information processing, while disadvantageous when optimal group size is small, can allow the evolution of much larger groups providing there is sufficient benefit from doing so. In our model, the objective function is the size of group that can achieve effective social coordination, and we use a central processing unit's time required to achieve this objective function as an estimate of the cognitive demands of different information-processing strategies. In this, we assume that the size of the brain affects the speed and volume, and hence complexity, of decisions that can be made.

We use a novel coordination task for this. A pure coordination task differs from the conventional public good tasks in that there is no optimal solution, the sole point being to converge on some common behaviour, with the payoff simply reflecting the extent to which the members of the group converge. Conventional public good games involve what amounts to a trading relationship, whereas a coordination task of the kind we use simply requires agents to converge on some common solution. In this respect, it mirrors the common human case in which individuals adopt a set of shared cultural values, where the cultural icon or marker may be arbitrary and the value of the icon itself secondary to the fact that individuals are bound together in a cultural community, which in turn allows them to solve collective action problems more effectively [15,16]. Coordination problems of this kind may have been especially important in the context of the evolution of human sociality [16]. One important consequence of this conception is that, since agents do not pay or withhold some of their capital, there is no payoff for free-riding.

While coordination tasks can, of course, have a direct functional outcome (e.g. agreeing direction of travel to some desirable resource [17]), many coordination tasks in humans simply involve agreeing on cultural markers (or ‘tags’) to identify group members so as to enable collective action in the future [15,16]. In many of these cases, the coordination task can be quite arbitrary (agreeing on a common cultural icon or belief about the world [18–20], or even a common dialect [21,22]). Cultural convergence of this kind is not a functional end in itself, but rather provides the psychological basis for subsequently achieving a functional end (in some cases—but not always—reinforced through costly signalling [23,24] or costly punishment [25]). Signing up to the same cultural marker may signal acceptance of a set of cultural or moral values that acts as a cue of trustworthiness and willingness to reciprocate.

An important feature of our model is that agents are not panmictic, but rather, as in the real world, are constrained in the number of individuals with whom they can interact by the structure of the social network in which they are embedded [26–28]. The social networks of these species are characterized by dyadic relationships that require expensive maintenance (for instance, in grooming time) leading to long-term edge stability. Although panmixia has often been assumed in models of the evolution of behaviour [29,30], natural human populations are invariably structured and network dynamics are radically different in structured populations [31–33]. Our models map agents onto a network, and so limit the range of agents with whom they can interact.

## 2. Material and methods

The basic structure of our approach focuses on a group of *n* agents that face a coordination problem which requires behavioural synchrony, such that each agent has to do her part at the right time and in the right way for the group to be able to act as one (for a general overview of the model, see [31]). We use synchronization on a dial to capture this problem. This is a simple but widely used [31,34] device that stands for almost any coordination problem, though it most obviously relates to how a group decides on direction of travel. (We stress that agreeing a direction of travel is only one of many possible coordination tasks, many of which—as in agreeing on an arbitrary cultural icon as a marker of group identity—are in themselves only indirectly functional. Agreeing the direction of travel has the advantage of being particularly easy to model.) Each agent is first assigned an initial value between 0° and 360°, and, for convenience, we refer to this as its *information value*. One of these vectors is defined as the ‘true information’ and is the property of just one agent, while all the other agents are assigned a randomly distributed value. The agents are arranged in a random *n*-node network in which each agent is linked to *k* others. The agents' task is to synchronize their information values, which they do by comparing their respective values and then finding a consensus. Synchronization takes place via a set of random dyadic meetings, in which the agents exchange and update their information values. (For detailed mathematical definitions and derivation, see appendices.) After each meeting (and exchange of information), each agent calculates a weighted average of three bits of information: her original information, the partner's information and the information the partner received in her previous meeting. These meetings are repeated until, on average, each agent has taken part in *τ* meetings. At this point, the average distance between the agents' individual information and the true information is
where *w* and *ω* are weight matrices for the weights the agent uses for the partners' and the third party information, respectively, while *ϕ _{T,i}* is the information held by agent

*i*at the end of the synchronization episode,

*ϕ*

_{TI}is the true information and

*T*is the total number of meetings in the group:

*T*=

*n*.

*τ*/2Agents are assumed to be trying to get as close to the true information as possible, though they are not necessarily aware of what this value actually is. To do this, each of them estimates an ‘optimal’ set of weights using a memory of past information exchanges and a simple least-squares optimization function (which is similar to the way humans optimize [35]). Note that requiring the agents to converge on a single value (the true information) is not in itself a defining feature of the model: it is simply a heuristic device to force coordination while at the same time minimizing computational demand. Allowing the agents to find their own equilibrium leads to convergence in just the same way [31], but it invariably takes longer. Our concern is with the constraints that different communication and cognitive strategies place on how fast convergence (synchronization) occurs, and the limits that this imposes on the size of social groups.

In our analysis, agents have three possible ways of estimating the optimal weights, which correspond to increasing levels of cognitive demand. Model *F* = 1 is the simplest: agents ignore both the third party information and the differences among their partners. In model *F* = 2, the agents ignore third party information, but recognize that there are differences (e.g. in reliability) among their partners. In model *F* = 3, the agents use both types of information. In calculating these weights, we varied the size of memory sample that the agents could use in their optimization. We used the processor time associated with each optimization act as an index of the cognitive demand of a strategy, and used this as a proxy for the amount of brain tissue needed in managing a strategy. This allowed us to introduce an implicit distance function
where is the measured processor time, *F* is the index of the method used by the agents, and and are the ‘optimal’ weight matrices as estimated by the agents.

We assume that there is some threshold of synchronization efficiency, above which the group is deemed unable to perform the communal action, and below which the group is in sufficient behavioural synchrony to be able to act as one. Using this threshold, we can define the largest group that can be in synchrony as follows: where λ is the synchrony threshold.

## 3. Results

As might be expected, our simulations show that the maximum group size increases as calculation capacity increases for all the three models (figure 1). More importantly, however, figure 1 shows that for both models *F* = 2 and *F* = 3 there is a *c**(*F*) such that *n**(*F* − 1,*c*) > *n**(*F*,*c*), for *c* < *c** and *n**(*F* − 1,*c*) < *n**(*F*,*c*), for *c* > *c**. In other words, the simplest model with the lowest computational demand allows groups to form, but these are constrained to relatively small sizes. There is some possibility of increasing group size by increasing computational capacity, but this option is capped. To achieve a significant further increase in group size, the agents must switch to a more complex information-processing strategy (model *F* = 2) that allows them to differentiate among their partners. Note that while this method is uneconomical for small groups, it yields an increase in group size if the calculation capacity is large enough. Just as in the simple model, a further increase in calculation capacity allows a further increase in group size, but it too hits an upper limit (albeit at a higher level). To move beyond this limit, the agents not only need larger computational capacities, but also have to add an additional information stream (model *F* = 3). This is the least economical method for small or middle-sized groups, but, by permitting third party information to be exchanged, it allows the group to cut through the glass ceiling imposed by the other two models and significantly raises the limit on group size.

## 4. Discussion

It is important to note that the more complex strategies are highly disadvantageous when group size is small: indeed, the more complex the strategy, the more disadvantageous it is. Thus, the evolution of communicative and cognitive complexity is explicitly dependent on an ecological demand for large social groups: it is only when there is a need for large groups that the selection pressure will be sufficient to warrant the costs involved. This is driven by the demands of social coordination. If there is no requirement for coordination (in terms of the present model, optimal group size *n**≈ 1 because organisms do not need to cooperate), then there is insufficient selection pressure to motivate either complex communication or investment in the large brains required to support the requisite cognitive abilities.

This suggests that complex communicative abilities, large social groups that involve social coordination and large brains will all be equally rare, as indeed seems to be the case [13]. Although we have used a very simple (and computationally easy to implement) objective function in the model (achieving synchrony in compass direction), it is important to be clear that our model is not limited to this particular context. Rather, our device stands as an abstraction for any behaviour that requires synchrony or coordination in order to maximize biological fitness, whether the fitness payoff is a direct or an indirect consequence of coordination along stable, expansive dyadic network edges. Direct fitness payoffs may arise from coordinated foraging or hunting (as in some social carnivores [36]), cooperative defence against predators or rival conspecifics (e.g. group territorial defence; as in many primates) or any other ecologically relevant behaviour that requires group members to synchronize or coordinate their behaviour in some way. However, in species characterized by multi-level social systems (e.g. elephants, some cetaceans, most primates [37] and humans [27,28]) where the higher level of organization functions to solve a collective action problem, the payoff may be indirect and mechanisms are needed to facilitate cooperation at group level. In these cases, solving a collective action problem is a two-step process: willingness to collaborate is established before the need to collaborate [38], and is often (but not necessarily always) signalled by some marker (or ‘tag’) of group membership.

Our model not only shows how cognition (processing power) could limit group size, but also sheds light on why some other species (for instance, herding mammals with panmictic group structure) use entirely different—and much simpler—coordination strategies. This, of course, does not preclude the possibility that bonded species such as humans might use simple heuristics to solve coordination tasks when the task demands are simple [39,40]. However, our analysis does raise the question as to why some species choose simple coordination methods associated with panmictic structure, while others resort to using more complex and costly cognition.

In sum, our analysis provides a formal mechanism for both the social brain hypothesis [3] and the social (or communicative) complexity hypothesis [13] by demonstrating that greater information-processing demands are reflected in greater cognitive (computational) costs, but that bringing these on stream can allow organisms to break through glass ceilings to significantly increase social group size. If size matters (large groups offer greater protection against predators [41], are more efficient for foraging [42] or win more territorial fights [43,44]), then there will be selection pressure to pay these costs. But the significant finding is that these costs are considerable and, when optimal group size is small, make the costs prohibitive. In these circumstances, simpler cognitive strategies are more profitable. In the limiting case, when there is no requirement for bonded relationships to ensure group stability through time, it is not worth paying the costs of complex cognition. This suggests that these kinds of more complex sociality will be relatively rare. Broadly speaking, this is what we see in the natural world [2].

## Funding statement

This research is supported by a European Research Council Advanced grant to R.I.M.D.

## Appendix A. The formal model

Let us define a group as a set of *n* agents that are interlinked in a connected network with each agent having a degree of *k*. Let *G*(*n*) denote the set of all possible such networks:
where *e _{i}* denotes the set of agents that agent

*i*is connected to, and hence #

*e*denotes the length of this set and

_{i}*ρ*(

*i*,

*j*) denotes the network distance between nodes

*i*and

*j*.

Let *f* denote the basic information-updating function, defined in the following way:
Thus, the function *f* calculates a weighted average of two values on a dial, *A* and *B*, with the weights being, respectively, 1 and *x*. Thus, this function is a simple weighted average, and the only reason for the complication above is that it is on a dial. (It is possible to define *f* using trigonometric functions; however, the form is less intuitive that way.)

Let *h* denote the information-updating function with two information sources, defined as a nested *f* function:
That is, the way the function *h* works is that first the function *f* calculates the weighted average of dial values *A* and *B* using the weight *x*, and dial values *A* and *C* using the weight *y*; and then the weight of the resulting two values is calculated using the weight *y*/*x*. (Note that due to the construction of the *y*/*x* weight, the parameters *x* and *y* do not have symmetric effect. This reflects the fact that *B* and *C* play different roles in the model, with *B* being the information the agent's meeting partner holds, and *C* being the third party information.)

Let *ϕ _{t,i}* denote the value of the information variable held by agent

*i*at time

*t*. Then, let TI ∼

*U*

*{*1, 2, … ,

*n*} denote a randomly selected agent that receives the true information,

*ϕ*

_{TI}∼

*U*(0°,360°). The initial information values are defined the following way: That is, all agents receive a random initial value on a dial, except for the agent that was chosen as TI, which receives the true information,

*ϕ*

_{TI}.

Let *π _{t,i}* denote the third party information, and let

*p*denote the index of the agent from which the third party information originates, both held by agent

_{t,i}*i*at time

*t*. The initial values for these two variables are set the following way: That is, the first third party information the agent receives from herself.

Let us define a synchronization event on a network *g*(*n*) ∈ *G*(*n*) as a series of *T* meetings among the agents, where a ‘meeting’ between two agents, *a* and *b*, is defined in the following way:
That is, first, the agents *a* and *b* are chosen such that they are linked to each other. Second, the agents exchange and update their respective information, third party information, and third party agent index the following way:
where *w _{a,b}* and are the weights associated, respectively, with the information agent

*a*receives from agent

*b*and the third party information she receives from agent

*b*.

These random meetings among the agents are repeated until *t* = *T*, where *T* = *τ n*/2. That is, on average each agent takes part in *τ* meetings.

Given the definition of the synchronization as a series of information exchanges on a graph, we can define a function that measures the average distance of the agent from the true information at the end of the synchronization process. Let *d* denote this measure in the following way:
where *w* = {*w _{i}*

_{,j}} and

*ω*= {

*ω*

_{i}_{,j}} are the respective weight matrices.

(Note that *d*(*n*,*w*,*ω*) is independent of the arbitrary parameter *ϕ*_{TI}. And thus, this structure is why we employ the cumbersome use of a dial, rather than a one-dimensional range for the information variable. At the same time, if the weight parameters of the agents and the network structure are well behaved, the average distance, *d*, converges to zero as *τ* goes to infinity.)

Let us assume that a set of *n* agents, linked up in the network *g*(*n*) ∈ *G*(*n*), goes through a series of *S* synchronization events in a such a way that both the agent selected to be TI and the true information stays the same throughout, while the information variables of all non-TI agents acquire a new random initial value in each of the synchronization events. (All elements of the synchronization event are as defined earlier.) Then, let *r _{s,t,i}* denote a memory record that agent

*i*collects at meeting number

*t*in the synchronization event

*s*, defined the following way: That is, if an agent is party to a meeting, she records all the relevant information in her memory.

Then let *R _{i}* denote the entire memory of agent

*i*, containing all her records for all of

*T*meetings in all of

*S*synchronization events: (For the initial values of

*w*

_{i}_{,j}and

*ω*

_{i}_{,j}used in the memory build-up, see below.)

The agents will select a sample from their memory, and choose a weight that will minimize their distance from the true information. Before we describe this process, however, we introduce a generic weight-optimization function. Let *z*(*Q*) denote the optimal weight given the sample *Q*, defined the following way:
A1where E is the expectations operator. That is, *z* is the coefficient that minimizes the distance of the weighted value from the true information. (We used optimization on a linear range rather than on a dial, as the latter results in impractically long calculation time. Robustness checks showed, however, that there is no significant difference between the two measures in practice.)

Given the definition of optimization in (A 1) let us consider three different models of how the agents sample their memory, and calculate the optimal weights. Let *F* serve as an index of these models.

In the first model, *F* = 1, the agents ignore all third party information and do not differentiate among the other agents they encounter. Formally,
and
and
where *q* denotes the size of the sample. That is, the agents only focus on primary information, and only differentiate self from not-self.

In the second model, *F* = 2, the agents still ignore the third party information, but they do differentiate among their partners. Formally,
and
and (as before)
That is, the agents still only focus on the primary information, but also use the information that lies in the differentiation between their partners. That is, by recognizing that their partners are different agents, they allow the weights they assign to their partners to reflect the difference in the quality of the information.

In the third model, *F* = 3, the agents use the third party information and differentiate among the other agents. Formally,
and (as before)
and

Let *c*(*F*,*q*,*n*,*g*(*n*)) denote the processor time, which is the time it takes for the simulation computer to perform the optimization defined in (A 1). The processor time varies with the model index *F*, the sample size *q*, the group size *n* and the graph (network) that connects the agents.

Using this concept of processor time, let us define an implicit distance function *δ* the following way:
A2where is the average measured processor time for all *g*(*n*) ∈ *G*(*n*). (That is, is a parameter that is measured during the simulation process.) Note that and *g*(*n*) allows the comparative measure, and thus the definition of the implicit distance function in (A 2). Also note that due to the construction of (A 2) the processor time *c* scales with the sample size *q* (although not linearly), allowing for a wider interpretation of computational complexity.

Using the *δ* function, we can define a maximum group size, denoted by *n**, as follows:
where *λ* is an arbitrary synchrony threshold. That is, *n** is the largest number of agents that can perform a group synchronization (on an average *k*-degree network) in a way that the group reaches a threshold of synchrony, given the calculation capacity and method of optimization of the agents.

### Proposition A.1.

*for all F* = 1, 2, 3.

### Proposition A.2.

*For both F* = 2,3 *there is a c**(*F*) *such that n**(*F* − 1, *c*) > *n**(*F*, *c*) *for* *c* < *c** *and* *n**(*F* − 1, *c*) < *n**(*F*, *c*) *for* *c* > *c**.

There is no algebraic solution to this problem, to our knowledge. However, a simulation result is obtained with *τ* = 20, *k* = 4 and *λ* = 11°. (Further computational parameters used were estimation limits in (A 2): −*L* ≤ *β* ≤ *L*, where we used *L* = 1000, and the number of synchronization events in the calculated memory *S* = 100. The initial values for the simulations’ memory build up were *w _{i}*

_{,j}= 1 and

*ω*

_{i}_{,}

*k*= 0 for

*F*= 1,2, and

*w*

_{i}_{,j}= 1 and

*ω*

_{i}_{,}

*k*= 1 for

*F*= 3, for all

*i*,

*j*, and

*k*. For robustness checks, see the electronic supplementary material.)

For the calculated evidence for both propositions A.1 and A.2, see figure 1. (Note that due to the nature of numerical simulation, the inequalities of propositions A.1 and A.2 hold only weakly for some parameter ranges.)

## Appendix B. A note on evolutionary dynamics

The model we present in this paper shows the group size limit of any network using the particular mode of information updating. One of the important questions concerning our model is whether there is a plausible evolutionary mechanism that would favour higher processing capacity on the individual level. Although our model is primarily concerned with constraints on the group size that would serve as limiting factors of any evolutionary process, we recognize that there might be a collective action problem (or public good problem) underlying the phenomenon we are modelling [45]. If the increased processing power is costly for the individual while aiding the group-level synchronization efficiency only marginally, then there might be an incentive for the individual to ‘cheat’ on the others by reducing its processing power, and thus free-ride on the others' higher capacities. If that was evolutionarily beneficial, there would be no group-level action at all [46].

Let us consider the set of graphs *G*(10) with *k* = 5, a group of 10 agents linked up in a random graph in such a way that each of them is connected to five others. Let us, for simplicity, assume that they are going through a series of synchronization events without any optimization, without differentiating among their partners and without third party information (i.e. *w* = 1 and *ω* = 0).

However, let us assume that the information reception of the agents contains noise in the following way:
where
and where
That is, all agents apart from agent 1 receive the information from others within a bandwidth of ±30. For agent 1, the error level can vary in the range *ɛ* ∈ (0,180).

Let us assume that, just as in the main model of the paper, there is true information received by one randomly selected agent. After *τ* = 30 information exchanges per agent, we measure the average distance of the true information from agent 1, as well as from all the other agents. We found that both agent 1's distance and the other agents' distance from the true information increases with the error level of agent 1; however, the former relationship is more steep (figure 2*a*).

Now let us consider a payoff function for agent 1 that has the following form:
B1where *p* is the total payoff, *a* is the payoff that agent 1 receives as a result of the collective action, *β* is a cost parameter and *n* = 10. Thus, the payoff function is of the following form: (payoff to focal individual) = (benefit of collective action) – (average group deviance) – (cost of cognitive investment) – (focal individual's deviation).

That is, the total payoff that the agent receives is dependent on the group being able to perform the collective action with efficiency, where the agent faces a cost for reducing her information's noise level, and she pays a penalty for being far from the true information. (Note that this is a particular example for a payoff function, where all the functional forms are linear, and the cost associated with group and individual efficiency is 1 in both cases. Trivially, the payoff from the collective action, *a*, can be any value, but unless *a* is big enough, *p* will not be positive and hence no action takes place.)

This formulation of the payoff function allows us to search for the optimal error level given the cost parameter:
Simulation results show that—as expected—the distance from the true information increases with the error level of the focus agent for both the focus agent and the rest of the group, albeit with a different slope (figure 2*a*). This results in the effect that agent 1's optimal level of error increases as the cost of noise reduction increases (see figure 2*b*). Importantly, in some parameter ranges, the optimal error is smaller than that of the rest of the group. This suggests that if the focus agent's error level was the same as the rest of the group initially, reducing the error level would be beneficial for this individual, providing the route to evolutionary dynamics.

Note that there are two reasons why the public good problem does not arise in the context of a coordination game, in line with the observation that public good problems are essentially two-trait in nature [47]. First, the public good in a coordination game is non-rival, which extends the range of the cost parameter (*β*), under which the public goods problem does not emerge. Second, and most importantly, investment in the public good and exploitation of the public good are one and the same thing. This means that the focus agent must invest in the public good (increase coordination efficiency) in order to gain from it at all (increase the agent's alignment). Hence, there is no possibility of acting as a freerider: there is no benefit to withdrawing from the public good, and the value of the public good is diminished as a direct consequence.

Although the functional form of the payoff function (equation (B 1)) is entirely arbitrary, this simple example illustrates an interesting property of the behavioural synchrony model: it can bypass (or perhaps even resolve) the public good problem, adding to the existing literature on the relationship between cognition and cooperation [48,49].

- Received May 7, 2013.
- Accepted May 28, 2013.

- © 2013 The Author(s) Published by the Royal Society. All rights reserved.