Bentley *et al.* [1] present a spirited defence of the use of neutral models at the population level, and a demonstration that the three linguistic phenomena considered in our original article [2] can be captured through a different modification to the Wright–Fisher model than the one we considered. Their demonstration helps to reinforce our argument that these phenomena can be explained without the need to appeal to selection, showing that another simple neutral model can produce these effects. We see the key issues raised by their commentary as being whether there is any value in the novel connection that we identified between the Wright–Fisher model and cultural transmission by Bayesian agents, and whether this interpretation ‘runs the risk of obscuring the advances made both through careful modifications and wider applications of this powerful model’ (p. 1). We address these issues in turn.

To recapitulate our basic result, we showed that transmission of a probability distribution over a discrete set of alternatives by a sequence of Bayesian agents could be mathematically equivalent to the Wright–Fisher model [3,4], a classic model used in population genetics. Formally, each learner receives *n* observations of a set of variants (such as words or linguistic constructions), forms an estimate of the probability of each variant, summarized in a vector *θ*, and then generates *n* observations by sampling from this distribution. We showed that when the learners apply Bayesian inference with a particular prior distribution and choose the value of *θ* with highest posterior probability, the frequencies of the variants follow the dynamics of the Wright–Fisher model. We then used the connection to Bayesian inference to introduce a more flexible variant of this model, which we applied to the linguistic phenomena mentioned above.
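The correspondence can be illustrated with a minimal Python sketch. It assumes, purely for illustration, a symmetric Dirichlet prior with parameter `beta >= 1`; the text above specifies only 'a particular prior distribution', so this choice, the function names and the parameter `beta` are all hypothetical. Under this assumption, the learner's maximum a posteriori estimate matches the sampling probabilities of a Wright–Fisher model with symmetric mutation rate `u = K(beta - 1) / (n + K(beta - 1))`, where `K` is the number of variants.

```python
import random


def map_estimate(counts, beta):
    """MAP estimate of theta under an assumed symmetric Dirichlet(beta) prior.

    Valid for beta >= 1; with beta == 1 it is the observed relative frequency.
    """
    k = len(counts)
    n = sum(counts)
    c = beta - 1.0
    return [(x + c) / (n + k * c) for x in counts]


def wright_fisher_probs(counts, u):
    """Sampling probabilities in a Wright-Fisher model in which each copy
    mutates with probability u to a variant chosen uniformly at random."""
    k = len(counts)
    n = sum(counts)
    return [(1.0 - u) * x / n + u / k for x in counts]


def equivalent_mutation_rate(n, k, beta):
    """Mutation rate at which the two functions above agree."""
    c = beta - 1.0
    return k * c / (n + k * c)


def next_generation(counts, beta, rng):
    """One learner: estimate theta from the counts, then produce n new observations."""
    theta = map_estimate(counts, beta)
    n = sum(counts)
    draws = rng.choices(range(len(counts)), weights=theta, k=n)
    return [draws.count(i) for i in range(len(counts))]


if __name__ == "__main__":
    counts = [7, 2, 1]
    u = equivalent_mutation_rate(n=10, k=3, beta=1.5)
    print(map_estimate(counts, 1.5))
    print(wright_fisher_probs(counts, u))
    print(next_generation(counts, 1.5, random.Random(0)))
```

With `beta = 1` the estimate reduces to the observed relative frequencies, which corresponds to copying without mutation.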

We agree with Bentley *et al.* [1] that the Bayesian interpretation of this model is more complex than the original model, in which cultural transmission is described simply in terms of copying variants with some chance of ‘mutation’, by direct analogy to biological transmission. This complexity results from considering the cognitive mechanisms that underlie each cultural transmission event, and treating them as being more sophisticated than error-prone copying. Our interest in these mechanisms is partly a consequence of our bias as cognitive scientists, but also reflects our expectation that the mechanisms of cultural transmission are going to be different from (and potentially far more complex than) the mechanisms of biological transmission. In the case of the Wright–Fisher model, we believe that our mathematical analysis has several implications that make this extra complexity worthwhile.

As a general methodological point, we first note that mathematical results connecting different models are valuable not just because they provide new interpretations of those models, but because they extend the tools that are available for analysing them. In this case, we provide a link to a broader class of ‘iterated learning’ models that have been used to model language evolution [5,6]. In these models, each agent in a sequence hears a set of utterances, forms a hypothesis about the language and then generates the utterances that are heard by the next agent. When the agents use Bayesian inference, the outcome of this process is well understood. In particular, if agents choose hypotheses by sampling from their posterior distribution, over time the probability that an agent selects a particular hypothesis converges to the prior probability of that hypothesis [7]. This process can be shown to be a form of Gibbs sampling, a Markov chain Monte Carlo algorithm that is widely used in Bayesian statistics [7]. These results can potentially provide new insight into the Wright–Fisher model: standard asymptotic analyses of this model use diffusion approximations (see [8]), but establishing the link to iterated learning (and Gibbs sampling) indicates that there is a closely related class of models where the asymptotic behaviour is exactly known. If agents select the hypothesis with highest posterior probability, as is required to establish equivalence to the Wright–Fisher model, then iterated learning becomes equivalent to a different statistical inference algorithm, known as the stochastic expectation-maximization algorithm [7]. Again, asymptotic analyses exist for this algorithm (e.g. [9]), and showing that the Wright–Fisher model is an instance of this algorithm has the potential to allow mathematical results to generalize in both directions.
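The convergence-to-the-prior result for agents who sample from their posterior can be checked exactly in a toy setting. The sketch below is an illustration under stated assumptions rather than the analysis of [7]: the three hypotheses (coin biases), the prior and the binomial likelihood are hypothetical choices, as are all function names. It builds the transition matrix over hypotheses for one generation of iterated learning and iterates it.

```python
from math import comb


def likelihood(d, m, p):
    """Probability of d 'heads' in m utterances when the hypothesis has bias p."""
    return comb(m, d) * p**d * (1.0 - p) ** (m - d)


def transition_matrix(biases, prior, m):
    """T[h][h2]: probability that the next learner adopts hypothesis h2 when the
    current agent holds h, produces m utterances, and the learner samples a
    hypothesis from its posterior."""
    k = len(biases)
    T = [[0.0] * k for _ in range(k)]
    for d in range(m + 1):
        lik = [likelihood(d, m, p) for p in biases]
        evidence = sum(l * q for l, q in zip(lik, prior))
        posterior = [l * q / evidence for l, q in zip(lik, prior)]
        for h in range(k):
            for h2 in range(k):
                T[h][h2] += lik[h] * posterior[h2]
    return T


def step(dist, T):
    """Advance a distribution over hypotheses by one generation."""
    k = len(dist)
    return [sum(dist[h] * T[h][h2] for h in range(k)) for h2 in range(k)]


def iterate(dist, T, generations):
    """Advance a distribution over hypotheses by several generations."""
    for _ in range(generations):
        dist = step(dist, T)
    return dist


if __name__ == "__main__":
    biases = [0.2, 0.5, 0.8]   # hypothetical hypotheses
    prior = [0.6, 0.3, 0.1]    # hypothetical prior
    T = transition_matrix(biases, prior, m=5)
    # Start from certainty in the least probable hypothesis.
    print(iterate([0.0, 0.0, 1.0], T, 200))
```

In this toy setting the prior is a stationary distribution of the chain, and the chain approaches it regardless of the hypothesis held by the first agent.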

Beyond these mathematical implications, providing a link to Bayesian inference establishes a connection between existing work using the Wright–Fisher model in cultural evolution and a growing literature in cognitive science on Bayesian models of cognition. While Bentley *et al.* [1] emphasize the recent disenchantment with rational models of decision-making in behavioural economics, there has been a parallel growth of interest in rational models of cognition in cognitive psychology [10–12]. Bayesian inference provides a way to answer a key question that comes up in describing human learning and memory, indicating how the expectations of an agent combine with the observed data to yield a conclusion. The prior distribution that is used in Bayesian inference expresses those factors other than the data that influence the conclusions that agents reach, including innate dispositions, schemas established through past experience, and prior knowledge about a particular situation. In formal analyses of learning, these factors are referred to as ‘inductive biases’ [13]. Developing models of cultural evolution in which the mechanisms of transmission take into account the inductive biases of agents is essential if we want to understand how the structure of individual minds influences the languages and concepts adopted by societies.

Relating minds and societies also provides us with an opportunity to use cultural transmission as a tool for investigating the inductive biases of human learners. By simulating cultural transmission in the laboratory, we can discover which languages and concepts survive, and thus answer questions about human learning. In fact, we recently used the connection between Bayesian inference and the Wright–Fisher model to address a question that arises in the literature on human language acquisition [14]. Children are known to ‘regularize’ languages (that is, to move towards a more deterministic, rule-like structure by decreasing probabilistic variation), and this is seen as a key part of language change [15,16]. However, it is not clear whether adults share a similar tendency, with some empirical results suggesting that adults ‘probability match’, producing utterances with the same probabilities as seen in their linguistic input [17]. This is a question about the inductive biases of human learners: the prior distribution that people have over probabilities in language. Our Bayesian interpretation of the Wright–Fisher model predicts how such prior distributions will affect the outcome of cultural transmission of linguistic frequency distributions. We used this connection as the basis for an experiment in which human participants simulated cultural transmission in the laboratory, learning which words to use to describe a set of six novel objects. Each object could be labelled using two different words, and the frequencies of these words varied, with each object being presented a total of 10 times. Each participant saw the objects paired with the labels at the appropriate frequencies, and was then shown the objects another 10 times each and asked to produce labels for them. The results, shown in figure 1, indicated that adults do in fact have a bias towards regularization. This bias is weak enough not to be easily detected by analysing the productions of a single generation of learners, but is strong enough to influence the outcome of cultural transmission. Similar results for another linguistic construction were recently reported by Smith & Wonnacott [18].
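How a weak bias can be hard to see in one generation yet shape the long-run outcome can be sketched numerically. The example below is not the analysis reported in [14]. It assumes hypothetical learners who sample a label probability from a Beta posterior under a symmetric Beta(`a`, `a`) prior with `a < 1` (a prior favouring regularization), and then produce `n = 10` labels for one object, matching the ten presentations described above. The value `a = 0.25` and all function names are illustrative assumptions.

```python
from math import comb, exp, lgamma


def log_beta(p, q):
    """Logarithm of the Beta function B(p, q)."""
    return lgamma(p) + lgamma(q) - lgamma(p + q)


def beta_binomial_pmf(x, n, p, q):
    """Probability of x uses of label A out of n when the label probability
    is drawn from Beta(p, q)."""
    return comb(n, x) * exp(log_beta(x + p, n - x + q) - log_beta(p, q))


def transition_row(x, n, a):
    """Distribution over the next learner's productions, given x of n observed
    uses of label A and an assumed Beta(a, a) prior."""
    return [beta_binomial_pmf(y, n, x + a, n - x + a) for y in range(n + 1)]


def evolve(dist, n, a, generations):
    """Propagate a distribution over counts through several generations."""
    rows = [transition_row(x, n, a) for x in range(n + 1)]
    for _ in range(generations):
        dist = [sum(dist[x] * rows[x][y] for x in range(n + 1))
                for y in range(n + 1)]
    return dist


if __name__ == "__main__":
    n, a = 10, 0.25
    start = [0.0] * (n + 1)
    start[5] = 1.0  # both labels used equally often in the initial input
    print([round(p, 3) for p in evolve(start, n, a, 1)])
    print([round(p, 3) for p in evolve(start, n, a, 400)])
```

Under these assumptions, the distribution after one generation is still concentrated near the input frequency, whereas after many generations most of the probability sits at the two regular outcomes (0 or 10 uses of a label).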

Establishing that the Wright–Fisher model corresponds to a special case of cultural transmission by iterated learning with Bayesian agents opens the door to a deeper understanding of the differences between cultural and biological transmission. While biology provides us with a well-developed account of how information changes through transmission, and it is exciting to note cases where this seems appropriate for describing phenomena we observe in human societies, copying-with-errors is clearly incomplete as an account of how people learn from one another. This problem becomes salient when we consider richer, more structured knowledge that people acquire from one another, such as languages and concepts. Modelling learning as Bayesian inference gives us the tools to predict how cultural transmission will affect these forms of knowledge, and how inductive biases are involved. Laboratory simulations of cultural transmission of different forms of knowledge confirm the prediction that such knowledge will change to take forms that are consistent with human inductive biases [19]. An example is shown in figure 2, where a functional relationship between two variables changes to take the form that people find easiest to learn through a few generations of cultural transmission [19]. By placing the Wright–Fisher model in this broader framework, we can show that the mechanisms of cultural transmission are strictly more general than those of biological transmission.

Turning to the question of whether other modelling advances might be obscured, we see no reason why the Bayesian interpretation of the Wright–Fisher model we proposed presents a challenge to previous ‘population-level’ applications of this model. Providing a new interpretation for a model does not invalidate results that interpret it differently, and the structure of the Wright–Fisher model is sufficiently simple that we might expect to see similar dynamics resulting from many different mechanisms of transmission (including biological reproduction, copying-with-errors and Bayesian inference). However, it is important to recognize that these different interpretations turn primarily on the mechanisms of transmission, and not whether the model is applied to populations or individuals, as Bentley *et al.* [1,5] suggest. It is possible to define population-level models in which individual learners use Bayesian inference and obtain exactly the same mathematical results as for our individual-level model. For example, we might have a population of *n* agents, each of whom observes the variants generated by the *n* agents in the previous generation, applies Bayesian inference as specified in our model, and then generates a single variant by sampling from the resulting distribution.
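The population-level example can be sketched as follows. As before, the symmetric Dirichlet prior with parameter `beta >= 1` and the function names are illustrative assumptions and not part of the original model specification. Each of the `n` agents observes the `n` variants produced by the previous generation, forms the same maximum a posteriori estimate, and contributes a single variant.

```python
import random


def map_estimate(counts, beta):
    """MAP estimate under an assumed symmetric Dirichlet(beta) prior, beta >= 1."""
    k = len(counts)
    n = sum(counts)
    c = beta - 1.0
    return [(x + c) / (n + k * c) for x in counts]


def population_generation(variants, k, beta, rng):
    """One generation of a population of len(variants) agents.

    variants: variant indices (0..k-1) produced by the previous generation.
    Every agent observes all of them and then produces exactly one variant.
    """
    new_variants = []
    for _ in variants:  # one agent per member of the previous generation
        counts = [variants.count(i) for i in range(k)]
        theta = map_estimate(counts, beta)
        new_variants.append(rng.choices(range(k), weights=theta)[0])
    return new_variants


if __name__ == "__main__":
    rng = random.Random(1)
    generation = [0, 0, 0, 1, 1, 2, 2, 2]
    for _ in range(5):
        generation = population_generation(generation, k=3, beta=1.5, rng=rng)
        print(generation)
```

Because the agents draw independently from the same estimate, the variant counts in the new generation follow the same multinomial distribution as when a single learner produces `n` observations.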

Ultimately, we expect that progress in modelling cultural evolution will result from recognizing that different transmission mechanisms are going to be relevant in different contexts. The copying-with-errors (or innovations) mechanism emphasized by Bentley *et al.* [1] is reasonable in many of the settings in which it has been applied in their previous work, such as aesthetic choices about baby names or ceramic styles [20,21]. Bayesian inference is appropriate for describing more complex instances of learning or memory, as supported by the use of these models to describe such processes in cognitive science. These mechanisms happen to converge in the Wright–Fisher model, but this simple model is just a starting point for exploring the processes that are involved in cultural evolution. We hope that this starting point is used to begin a deeper exploration of how models of individual minds can be connected to models that describe the behaviour of populations.

## Acknowledgements

This work was supported by grant numbers BCS-0631518 and BCS-070434 from the United States National Science Foundation.

## Footnotes

The accompanying comment can be viewed at http://dx.doi.org/doi:10.1098/rspb.2010.2581.

- Received February 15, 2011.
- Accepted March 4, 2011.

- This Journal is © 2011 The Royal Society