Royal Society Publishing

Does horizontal transmission invalidate cultural phylogenies?

Simon J. Greenhill, Thomas E. Currie, Russell D. Gray

Abstract

Phylogenetic methods have recently been applied to studies of cultural evolution. However, it has been claimed that the large amount of horizontal transmission that sometimes occurs between cultural groups invalidates the use of these methods. Here, we use a natural model of linguistic evolution to simulate borrowing between languages. The results show that tree topologies constructed with Bayesian phylogenetic methods are robust to realistic levels of borrowing. Inferences about divergence dates are slightly less robust and show a tendency to underestimate dates. Our results demonstrate that realistic levels of reticulation between cultures do not invalidate a phylogenetic approach to cultural and linguistic evolution.

1. Introduction

The only figure in Darwin's (1859) Origin of species is a tree. This figure represents Darwin's view of evolution as a process of descent with modification from a common ancestor. Since the publication of that book, there has been an ongoing debate about how evolutionary ideas can be applied to cultural and linguistic changes (Aunger 2000; Laland & Brown 2002; Carneiro 2003). The last few decades in evolutionary biology have seen the realization that ‘tree thinking’ (O'Hara 1988) is not just a way of describing evolution, but a way of testing evolutionary hypotheses. Evolutionary biologists have developed a powerful set of methods, phylogenetics, to answer evolutionary questions (Harvey & Pagel 1991; Huelsenbeck & Rannala 1997; Pagel 1999). The potential of phylogenetic methods for evolutionary analyses of culture has not gone unnoticed, with a recent proliferation of studies exploring everything from population movements (Gray & Jordan 2000; Holden 2002; Gray & Atkinson 2003; Rexovà et al. 2003, 2006; Greenhill & Gray 2005; Holden & Gray 2006; Gray et al. 2009) to the evolution of material culture (Tehrani & Collard 2002; Jordan & Shennan 2003; Darwent & O'Brien 2006; Harmon et al. 2006; Neff 2006; Tëmkin & Eldredge 2007; Coward et al. 2008), social organization (Holden & Mace 2005; Mace & Jordan 2005; Fortunato et al. 2006; Moylan et al. 2006; Jordan et al. in press) and broader questions about language evolution (Dunn et al. 2005; Pagel et al. 2007; Atkinson et al. 2008).

The justification for cultural phylogenies is twofold. First, there are many similarities between biological and cultural evolution (Mace & Pagel 1994; Atkinson & Gray 2005; Mace & Holden 2005). Second, many of the questions asked by anthropologists are similar to those asked by evolutionary biologists, and thus can be answered using the robust statistical and inferential framework provided by phylogenetic methods (Gray et al. 2007). However, like other attempts to Darwinize culture, the phylogenetic approach has been highly controversial (Gould 1987, 1991; Bateman et al. 1990; Moore 1994; Bellwood 1996; Borgerhoff Mulder 2001; Holden & Shennan 2005; Tëmkin & Eldredge 2007). In particular, it is frequently argued that the horizontal transmission of traits is rampant in cultural evolution:

Human cultural evolution proceeds along paths outstandingly different from the ways of genetic change… Trees are correct topologies of biological evolution… In human cultural evolution, on the other hand, transmission and anastomosis are rampant. Five minutes with a wheel, a snowshoe, a bobbin, or a bow and arrow may allow an artisan of one culture to capture a major achievement of another. (Gould 1987, p. 70)

According to the critics, the borrowing of traits between cultures violates the assumptions of the phylogenetic method, and may thus lead to inaccurate results or even completely invalidate the approach. Instead of phylogenetic analyses of cultures, these critics argue for a ‘rhizotic’ view of culture where each society derives from multiple parents (Terrell 1988; Moore 1994; Terrell et al. 2001).

This highly polarized debate about the problems caused by cultural reticulation will not be settled by further hand-waving and armchair speculation. The central issue is not whether phylogenetics is appropriate on a priori grounds, but how robust phylogenetic inferences are to horizontal transmission. The specific questions that need to be answered are (i) what effect do different levels of horizontal transmission have on the accuracy of phylogenetic estimates and (ii) are the problematic levels of horizontal transmission common in real situations? In this study, we will answer these questions by using computer simulations to explore the impact horizontal transmission has on our ability to infer phylogenetic tree topologies and divergence dates accurately.

2. Material and methods

To investigate the impact of horizontal transmission on phylogenetic estimates, we simulated the evolution of languages under a natural model of language evolution, the stochastic Dollo model (SDM; Nicholls & Gray 2006, 2008). Simulations are a standard method for analysing the sensitivity of phylogenetic methods under a set of assumptions (Penny et al. 1992; Huelsenbeck 1995). First, we generated data by simulating the evolution of linguistic traits on two different tree topologies under varying degrees of horizontal transmission. Then, we evaluated how well phylogenetic methods that do not account for horizontal transmission recover the ‘true’ tree topology from these data. Finally, we explored how borrowing affects inferences drawn from the tree structure by estimating the age at the root of the recovered trees and comparing it with the root age of the true trees.

(a) Input trees

To act as our true histories, we constructed two topologically dissimilar trees from the existing language datasets (figure 1). The shape of a tree reflects the overall evolutionary process (Mooers & Heard 1997). It contains information about population history, including the pattern and rates of speciation and the amount of change between populations. The effect of borrowing is likely to depend on this shape. The first tree has a ‘balanced’ topology where most nodes have an equal number of daughter cultures and relatively long internal branches. By contrast, the second topology is less balanced with a chained pattern of descent and much shorter internal branches.

Figure 1

The ‘true’ tree topologies used to synthesize the data, showing branch lengths and calibration points. (a) Tree 1 has a root age of 5111.14 years and (b) tree 2 has a root age of 5013.12 years.

(b) Data simulation

Data were simulated on both topologies using the SDM of lexical evolution, implemented in the software package TraitLab (Nicholls & Gray 2006, 2008). This models the birth and death of traits along each lineage, as well as simulating the borrowing of traits between lineages (figure 2). A dataset is simulated under the SDM by starting with a language (comprising a set of traits) at the root of the tree. Languages change as they move along the lineages of the tree towards the tree tips by having new traits born and existing traits die. Traits are born at a birth rate, λ, and die at a death rate, v. In our simulations, λ and v were kept constant throughout. The value of v was chosen so that, along any lineage, a mean proportion of 0.2 traits will be lost per 1000 years. This proportion approximates the estimated rate at which trait classes from the 200-item basic-vocabulary ‘Swadesh’ word lists are lost (Swadesh 1952; Embleton 1986; Pagel & Meade 2006). At each branching point or node in the input tree, the trait sets entering the split are copied identically into both daughter languages. The traits do not interact with each other, and the same trait cannot be born independently into two or more lineages. Thus, if a trait is found in two separate languages, then it has either been inherited from a common ancestral language or been borrowed from one language into another. This is a realistic assumption as it is rare that two languages will independently invent the same word with the same meaning. At the end of the simulation, a data matrix of ones and zeros is produced, and can be thought of as representing the presence or absence of words in cognate sets for each extant language.
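The birth and death process described above can be illustrated with a minimal Python sketch. It is not the TraitLab implementation: the three-language tree, the birth rate and the fixed-size root language are hypothetical choices made for illustration, and borrowing is omitted. The death rate is set so that a trait survives 1000 years with probability 0.8, which is one way to realize a mean loss of 0.2 per 1000 years.

```python
import math
import random

# Death rate per year such that a trait survives 1000 years with probability 0.8.
DEATH_RATE = -math.log(0.8) / 1000.0
BIRTH_RATE = 0.02  # hypothetical births per year; the text does not give a value

# Hypothetical tree: (name, branch length in years, children).
TREE = ("root", 0.0, [
    ("Proto-AB", 1000.0, [("A", 1000.0, []), ("B", 1000.0, [])]),
    ("C", 2000.0, []),
])


def evolve_branch(traits, duration, birth_rate, death_rate, rng, next_id):
    """Return the traits present at the end of one branch."""
    def survives(span):
        # Exponential lifetimes: survival probability over `span` years.
        return rng.random() < math.exp(-death_rate * span)

    # Inherited traits may die along the branch.
    kept = {t for t in sorted(traits) if survives(duration)}
    # Births form a Poisson process, drawn via exponential waiting times.
    clock = rng.expovariate(birth_rate) if birth_rate > 0 else math.inf
    while clock < duration:
        trait = next_id[0]
        next_id[0] += 1  # every birth gets a fresh id, so no trait is born twice
        if survives(duration - clock):
            kept.add(trait)
        clock += rng.expovariate(birth_rate)
    return kept


def simulate(node, inherited, birth_rate, death_rate, rng, next_id, tips):
    """Walk from the root to the tips, copying traits into both daughters."""
    name, branch_length, children = node
    traits = evolve_branch(inherited, branch_length, birth_rate,
                           death_rate, rng, next_id)
    if not children:
        tips[name] = traits
    for child in children:
        simulate(child, traits, birth_rate, death_rate, rng, next_id, tips)
    return tips


def run(seed=1, root_size=50):
    """Simulate one dataset; the root language size is an assumption."""
    rng = random.Random(seed)
    return simulate(TREE, set(range(root_size)), BIRTH_RATE, DEATH_RATE,
                    rng, [root_size], {})
```

Calling `run()` returns a dictionary mapping each tip language to its set of trait identifiers. Because every birth receives a new identifier, any trait shared by two tips in this sketch must have been inherited from a common ancestor.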

Figure 2

Demonstration of the SDM and the resulting binary output matrix. Data synthesis begins at the root of the input tree and moves towards the tips. The process is assumed to have been running infinitely in the past, and has generated the first language, containing trait 1. Trait 2 is born into this lineage. This is followed by a divergence event where the extant traits (1 and 2) are copied directly into both new lineages (Proto-AB and Proto-C). After this divergence event, trait 3 is born in the Proto-AB lineage, followed by the death of trait 1 in the top lineage (denoted by a cross). Trait 3 is then borrowed (denoted by a dotted arrow) into lineage C. The last divergence event leads to lineages A and B, where trait 4 arises in lineage B, and becomes borrowed into lineage A (dotted arrow). Finally, trait 2 dies in lineage A. At the end of the simulation, the lineages have the traits at the tips (e.g. lineage A has traits 3 and 4, while B has 2, 3 and 4). This can be recoded into a binary matrix as shown, to represent the presence or absence of the traits.
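The recoding step in this example can be sketched in a few lines of Python. The trait sets for lineages A and B are taken from the caption; the set for lineage C (traits 1, 2 and 3) is inferred from the described events, on the assumption that trait 1 dies in the Proto-AB lineage and that no other events affect C. The function name is hypothetical.

```python
# Trait sets at the tips of the figure 2 example.
# A and B are stated in the caption; C is inferred (see the assumption above).
tip_traits = {
    "A": {3, 4},
    "B": {2, 3, 4},
    "C": {1, 2, 3},
}


def to_binary_matrix(tip_traits):
    """Recode tip trait sets as presence (1) / absence (0) rows."""
    trait_ids = sorted(set().union(*tip_traits.values()))
    rows = {
        language: [1 if trait in traits else 0 for trait in trait_ids]
        for language, traits in tip_traits.items()
    }
    return trait_ids, rows


trait_ids, matrix = to_binary_matrix(tip_traits)
for language in sorted(matrix):
    print(language, matrix[language])
```

Each column corresponds to one trait (1 to 4) and each row to one language.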

In the simulations, traits could also be borrowed from one language into another, at each instant, according to a specified rate. When a trait is borrowed from one language into another, an identical copy of the selected trait is duplicated into the recipient language, giving rise to a new cognate class in the recipient if the recipient does not already have this trait. If the borrowed trait is already present in the recipient language, the borrowing event has no effect. First, we allowed borrowing to occur globally, with all languages equally likely to receive a particular trait from any other contemporaneous language. This global borrowing scenario represents the somewhat unrealistic situation where all languages can borrow from each other. The second scenario we simulated was a ‘local borrowing’ constraint, where a language can borrow only from those lineages with which it shares a common ancestor within a specified time period. This makes the assumption that cultures are more prone to borrow from their neighbours and those that they have contact with (Nichols 1997).
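The difference between the global and local scenarios can be sketched as a rule for choosing eligible donor languages. The following Python fragment is an illustration only: the divergence ages are hypothetical, and the function names are not taken from TraitLab.

```python
# Hypothetical times (in years) since each pair of languages shared an ancestor.
MRCA_AGE = {
    frozenset({"A", "B"}): 800,
    frozenset({"A", "C"}): 2500,
    frozenset({"B", "C"}): 2500,
}
LANGUAGES = {"A", "B", "C"}


def eligible_donors(recipient, languages, mrca_age, window=None):
    """List languages the recipient may borrow from.

    window=None models global borrowing; otherwise only languages whose
    common ancestor with the recipient lies within `window` years qualify.
    """
    donors = []
    for other in sorted(languages):
        if other == recipient:
            continue
        if window is None or mrca_age[frozenset({recipient, other})] <= window:
            donors.append(other)
    return donors


def borrow(recipient_traits, trait):
    """Copy a trait into the recipient; return False if it was already there."""
    if trait in recipient_traits:
        return False  # borrowing an existing trait has no effect
    recipient_traits.add(trait)
    return True
```

With these ages, language A can borrow only from B under a 1000-year window, but from both B and C under a 3000-year window or under global borrowing.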

We simulated data on both true topologies, under a global borrowing scenario and 1000- and 3000-year local borrowing scenarios. Within each of these scenarios, we simulated different rates of borrowing across a range expected to cover most feasible levels. Borrowing rates are relative to the death rate because the borrowing process increases rapidly at large rates. Values for the borrowing rate used in these simulations were 0 (i.e. no borrowing), 0.045, 0.224, 0.448, 0.672, 0.896, 1.344, 1.793 and 2.241. This translates to approximately 0, 1, 5, 10, 15, 20, 30, 40 and 50% of traits in a lineage being borrowed into another lineage per 1000 years. Note that, for a given language in the data, the proportion of observed traits that have entered its ancestry via borrowing is not simply related to the borrowing rate but is also dependent on the birth and death rates and the tree topology. To characterize the range of possible variation within each set of parameters, we replicated the analyses 10 times at a ‘realistic’ borrowing level of 15 per cent and at an ‘extreme’ level of 50 per cent. Analyses for other parameter settings were carried out once.
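The correspondence between the relative rates and the percentages can be checked numerically. The sketch below rests on an assumption not stated in the text: that the death rate is -ln(0.8) per 1000 years (so that a trait survives 1000 years with probability 0.8) and that each relative rate is the target proportion divided by this death rate. Under that reading, the listed values are reproduced to three decimal places.

```python
import math

# Assumed death rate per 1000 years: survival probability 0.8 over 1000 years.
death_rate = -math.log(1 - 0.2)  # about 0.2231

# Target proportions of traits borrowed per 1000 years.
targets = [0.0, 0.01, 0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50]

# Relative borrowing rate = target proportion / death rate (assumed relation).
relative_rates = [round(p / death_rate, 3) for p in targets]
print(relative_rates)
```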

(c) Tree reconstruction

To assess the ability of phylogenetic methods to reconstruct the original input trees, we again used TraitLab (Nicholls & Gray 2006, 2008). TraitLab implements a Bayesian Markov chain Monte Carlo approach to reconstruct phylogenies and model parameters under the SDM without borrowing. Here, trees and model parameters are sampled from the space of all possible trees in proportion to their ability to explain the data under the SDM.

The death rate may be estimated as part of the fitting process. However, this requires that some constraints be imposed on the tree to provide time information. The constraints have the effect of narrowing the space of possible trees to only those trees that have the common divergence point of the specified languages within a specified time period. Such a practice is common in Bayesian phylogenetic reconstruction of real linguistic data, where historical or archaeological data can inform us about the possible time frames in which the splitting of languages is likely to have occurred (Gray & Atkinson 2003; Gray et al. 2009). We arbitrarily selected two groupings in each of the true trees to act as constraints: one subtending 3–4 languages and the other subtending 9–11 languages. These nodes were constrained to within 10 per cent of their true ages (figure 1), approximating the variation in date estimates used elsewhere.

Each analysis was run for 5 million generations, which was long enough to ensure the chains had converged. Trees and parameters were sampled every 10 000th generation to reduce autocorrelation between samples. The first 3 million generations were discarded as ‘burn-in’ where the chain had yet to converge on the most likely region of tree space and the trees are unduly affected by the model priors. This resulted in 200 sampled trees per analysis.
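The sample count follows from the run length, sampling interval and burn-in. A short Python check, assuming that the first sample is taken at generation 10 000 rather than at generation 0:

```python
generations = 5_000_000   # total chain length
interval = 10_000         # one sample every 10 000 generations
burn_in = 3_000_000       # generations discarded as burn-in

# Assumed sampling points: 10 000, 20 000, ..., 5 000 000.
sampled = range(interval, generations + 1, interval)
kept = [g for g in sampled if g > burn_in]

print(len(sampled), len(kept))
```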

To compare the estimated topologies with the input trees, we used the quartet distance metric (Day 1986; Steel & Penny 1993), as implemented in the program QuartetDist (Christiansen et al. 2006). This counts the subsets of four taxa (languages) whose topology differs between the two trees, and is normalized by dividing by the total number of quartets for the tree. The normalized quartet distance ranges from 0.0 when the two trees are identical to 1.0 when all quartets are different. This distance was calculated from the true tree to each of the post-burn-in sample trees from all simulated analyses. To assess the potential biasing effect of the constraints, we constructed two star (unresolved) topologies, constrained them in the same way and calculated the quartet distance between them and the true topologies. TraitLab was also used to estimate the age at the root of each post-burn-in sample tree.
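The normalized quartet distance can be illustrated with a brute-force Python sketch for small trees. This is not the algorithm used by QuartetDist, which is designed to be efficient on large trees; the nested-tuple tree representation and the function names are assumptions made for illustration. A quartet left unresolved in one tree and resolved in the other is counted here as different.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf names below a node (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Return the leaf set below every internal node of a rooted tree."""
    if isinstance(tree, str):
        return set()
    found = {leaves(tree)}
    for child in tree:
        found |= clades(child)
    return found


def quartet_topology(quartet, tree_clades):
    """Return the 2|2 split a tree induces on four taxa, or None if unresolved."""
    members = frozenset(quartet)
    for clade in tree_clades:
        inside = clade & members
        if len(inside) == 2:
            return frozenset([inside, members - inside])
    return None


def normalized_quartet_distance(tree_a, tree_b):
    """Fraction of four-taxon subsets whose topology differs between two trees."""
    taxa = leaves(tree_a)
    if taxa != leaves(tree_b):
        raise ValueError("trees must share the same taxa")
    clades_a, clades_b = clades(tree_a), clades(tree_b)
    quartets = list(combinations(sorted(taxa), 4))
    differing = sum(
        quartet_topology(q, clades_a) != quartet_topology(q, clades_b)
        for q in quartets
    )
    return differing / len(quartets)
```

For example, `normalized_quartet_distance((("A", "B"), "C", ("D", "E")), (("A", "C"), "B", ("D", "E")))` compares two five-taxon trees that disagree only about the placement of B and C.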

3. Results

The random trees have a normalized quartet distance of 0.703 (topology 1) and 0.662 (topology 2). If the normalized quartet distance of the estimated trees is between this value and 0 (a perfect match to the true tree), then the estimated topologies are closer to the true tree than predicted by chance. In what follows, we will take the percentage difference between the quartet distance of the estimated trees and the random trees as a measure of perturbation.
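One possible reading of this perturbation measure, stated here as an assumption because the text does not give a formula, is the estimated tree's quartet distance expressed as a percentage of the reference (random) tree's quartet distance. Under that reading, 0% is a perfect match to the true tree and 100% is no better than the reference. The function name and the example distance below are hypothetical.

```python
def perturbation(estimated_distance, reference_distance):
    """Quartet distance of an estimated tree as a percentage of the reference.

    Assumed interpretation: 0 means identical to the true tree, 100 means as
    far from the true tree as the reference topology.
    """
    return 100.0 * (estimated_distance / reference_distance)


# Reference distance for topology 1 from the text; 0.05 is a made-up example.
print(perturbation(0.05, 0.703))
```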

The normalized quartet distance (figure 3) shows that the trees recovered are extremely close to the true trees. Under the local borrowing scenarios, the analyses find trees very similar to the true trees, and the results are not markedly perturbed by the borrowing rates (average perturbation 0.71 and 7.58% for topologies 1 and 2, respectively). Topology 2 shows divergence from the input tree first, in the 3000-year local borrowing scenario (average perturbation=14.58%), while topology 1 is more robust to these effects (average perturbation=1.03%). The estimated topologies diverge more from the true topologies under the global borrowing scenarios, and this divergence increases relatively linearly with the amount of borrowing. However, even at their worst (perturbation in topology 1=53.22% and topology 2=56.61%), the quartet distances do not approach those of the random topologies (dotted lines in figure 3), demonstrating that the estimated trees still retain some of the true signal. The replicated analyses of the 15 and 50 per cent borrowing levels demonstrate mild variation between replicates (average variation in perturbation range in topology 1=10.37% and topology 2=13.74%). Overall, these results suggest that something very close to the true tree topology can be recovered by Bayesian phylogenetic methods, even under high levels of borrowing.

Figure 3

Mean quartet distances between the true tree topologies and the estimated topologies for both topology types across all borrowing levels ((a) tree 1 and (b) tree 2). The dotted line marks the quartet distance of the unresolved topology and the cross marks the mean quartet distance of the no-borrowing scenario ((i) local 1000, (ii) local 3000 and (iii) global).

To investigate whether the inferences would be robust under a quite different model of cognate evolution, we replicated the 1000-year local borrowing scenario for topology 2 under a one-parameter model with gamma-distributed rate heterogeneity (Pagel & Meade 2004). The average perturbation across the entire range of borrowing levels was 2.15 per cent, suggesting that our results are robust to different models of reconstruction.

The differences between the age at the root of the true trees and those of the recovered trees (figure 4) show a general increase with borrowing level, where estimated root times become younger as borrowing increases. This trend towards younger trees is unsurprising as borrowing increases the similarity between taxa, and therefore decreases branch lengths across the tree. This can be quantified by the percentage difference between the true age and the estimated root ages. The 1000-year local borrowing scenario is reasonably robust to this effect (percentage distance from true root age in topology 1=3.22% and topology 2=7.53%), with it becoming more noticeable in the 3000-year (5.22% topology 1, 32.54% topology 2) and global borrowing scenarios (41.80% topology 1, 39.23% topology 2). The global borrowing scenario shows a marked underestimation of the root time beginning at the 5 per cent borrowing level (1.31% topology 1, 6.43% topology 2). Topology 2 shows the same general patterns but with all estimates consistently underestimating the age of the tree. The greater difficulty in resolving topology 2's true age presumably reflects the difference in internal branch lengths between the trees. Topology 2 has shorter internal branches than topology 1. This makes the higher order structure in topology 2 harder to resolve, as well as increasing the probability that borrowing will collapse these nodes.

Figure 4

Mean reconstructed root time for each simulation under all borrowing scenarios ((a) tree 1 and (b) tree 2). The dotted line marks the true root age and the cross marks the root age under the no-borrowing scenario ((i) local 1000, (ii) local 3000 and (iii) global).

4. Discussion

In this paper, we investigated the impact of horizontal transmission on Bayesian phylogenetic tree construction methods. Our results show that phylogenetic inference is remarkably robust to even quite high levels of borrowing. The crucial question is, of course, what scenarios of horizontal transmission are most realistic. Languages and cultures are able to borrow only from those they have contact with. For this reason, we would suggest that the global borrowing scenario simulated here is the least plausible. Instead, our local borrowing scenarios approximate the more likely situation where horizontal transmission occurs between geographical neighbours. To place the scenarios in more concrete terms, the 1000-year local borrowing scenario would be equivalent to allowing borrowing between most of the eastern Polynesian languages, such as Hawaiian and the Tuamotu Archipelago languages, or within the Slavic languages of Indo-European. By contrast, the 3000-year local borrowing scenario would allow borrowing between all of the Oceanic languages, or between the Baltic and Slavic subgroups of Indo-European.

First, what levels of borrowing are plausible? Most phylogenetic analyses of linguistic data have used modified versions of the kind of basic vocabulary lists originally developed for lexicostatistics (Swadesh 1952). Items on the Swadesh lists are generally less susceptible to borrowing and evolve more slowly than less frequently used vocabulary. As an extreme example, more than 60 per cent of the total lexicon of English is known to be borrowed from French and Latin. By contrast, only 6 per cent of English basic vocabulary is borrowed from Romance languages and only 16 per cent is borrowed in total (Embleton 1986). Most published language/culture phylogenies have followed a procedure of removing obvious borrowings before analysis. Therefore, we suggest that a plausible range of undetected borrowing may be between 0 and 20 per cent. We found that over the range of 0 to 30 per cent borrowing, there were very few differences between the true and estimated tree topologies. We have taken the value of 15 per cent borrowing to be representative of the higher end of this spectrum. At 15 per cent under the 1000-year local borrowing scenario, the estimated trees are extremely close to the true trees (average perturbation in topology 1=0.27% and topology 2=0.73%). In the 3000-year local borrowing scenario, the estimated trees are still very close to the true trees (topology 1=0.65%, topology 2=6.76%), with only the second topology showing a slight increase in overall difference. The quartet differences vary within the replicated analyses at the 15 per cent level and some replicates are less affected by borrowing. At the 15 per cent level, this perturbation ranges from 0.03 to 0.59% (topology 1) and from 0.25 to 2.36% (topology 2) at 1000 years, and from 0.18 to 1.44% (topology 1) and from 2.29 to 20.42% (topology 2) at 3000 years. Overall, the estimation of tree topology is remarkably robust, especially in the realistic borrowing scenarios.

Second, in contrast to the topology results, the root time estimates are less robust to the effects of borrowing. These show a general shift towards younger ages as the amount of borrowing increases. However, at the 15 per cent level, the root time estimates are reasonably accurate under the 1000-year scenario (average difference to true root time in topology 1=2.18% and topology 2=6.13%). At the 3000-year local borrowing scenario, analyses of topology 1 again find the correct root ages (1.74% difference), while those of topology 2 show a shift to younger ages (28.31% difference). This has implications for the accurate phylogenetic dating of historical events. In general, ages will be underestimated as borrowing increases. This may or may not be a problem depending on the hypothesis being tested. For example, there has been considerable controversy over the use of phylogenetic methods to infer the age of the Indo-European language family (Gray & Atkinson 2003; Atkinson & Gray 2006; McMahon & McMahon 2006). Gray & Atkinson support an older farming-based dispersal from Anatolia ca 8500 years BP rather than the ‘Kurgan’ hypothesis that dates this family to ca 6000 years BP. Our results suggest that if unidentified borrowing has affected these divergence time estimates, then the real age may be older than that suggested by Gray & Atkinson, making the Kurgan hypothesis even less probable. It is worth noting that the more informative the constraints that can be added, the more robust the date estimates will be. In the simulations reported in this paper, we used only two date constraints, while Gray & Atkinson used fourteen.

Third, the shape of the true history is important. Our results show that the balanced tree (topology 1) is more robust to the effects of borrowing than the unbalanced tree (topology 2). This suggests that specific patterns of divergence can have important consequences on accurate reconstruction and inference. We suggest that topology 1 may be more resistant to the effects of borrowing because the internal branches are longer, while the shorter internal branches in topology 2 require a less conflicting signal to perturb them. Further research into the effects of borrowing on different types of histories and speech community splitting events (e.g. Ross 1997) should elucidate this pattern further.

Fourth, the type of borrowing will affect the trees in different ways. This deserves some clarification, as some borrowing types may be more problematic than others. For the purposes of phylogenetic estimation, we can identify two major categories of borrowing: non-systematic and systematic. In non-systematic borrowing situations, cultures borrow a particular trait from other cultures they contact. This type of borrowing has also been called cultural borrowing (Bloomfield 1933). For example, English has borrowed the word ‘taboo’ from Tongan, but this borrowing has had little effect on the English language as a whole. As Goodenough (1997, p. 178) noted, the contact between America and Japan has led to borrowing of both culture and language in each direction, but has not diluted their ‘distinctive phylogenetic identities’. In this situation, borrowing does not introduce any systematic bias, but increases the noise level in the data. If the majority of transmission is vertical, then the strongest signal in the dataset will still represent the true phylogeny. If the methods are sensitive enough, then the chance of inferring the true tree is high even under non-trivial levels of horizontal transmission. Such borrowing will still bias time–depth estimates towards younger ages by making the cultural inventories more similar.

By contrast, in systematic borrowing, traits flow predominantly from one culture to another, introducing systematic biases in the data. As an example, the Formosan language, Thao, has borrowed large quantities of lexicon from the neighbouring language Bunun (Blust 1996). The borrowed words primarily relate to women (including binanau'az, ‘woman/wife’) and other traditional female roles such as cooking and child rearing. This was probably a result of Thao men marrying Bunun women and acquiring their specialized vocabulary for this specific semantic domain (Blust 1996). In other situations, systematic borrowing can be even more prevalent and complex. The Oceanic language Yapese is a paradigmatic case of complex historical influence with at least five different sources of vocabulary: (i) early Palau, (ii) an unidentified non-Oceanic language, sources both (iii) inside and (iv) outside the Nuclear Micronesian language subgroup, and (v) relatively recent imports from Woleaian and Ulithian (Ross 1996). This systematic borrowing will perturb the topology by drawing the interacting languages together, making them appear to be more similar. This will have the further effect of making any time–depth estimates shallower. However, these types of borrowing tend to occur within a small subset of the taxa being examined and will not necessarily affect other parts of the tree or the broader scale inferences. Often such borrowing primarily affects a certain subset of the data, such as the ‘women's work’ subset in Thao. In such cases, it is more likely to be identified (Dutton 1995) and can therefore be removed.

This systematic/non-systematic distinction explains the conflict between our results and those of McMahon & McMahon (2005). McMahon and McMahon found that a level of 10 per cent borrowing caused disruption of the true topology in the majority of their analyses. However, a closer examination of their method reveals a number of key differences from our simulations. First, their simulation allowed borrowing only between two of the terminal languages and only in the very final generations. This represents the type of borrowing we have called systematic. Second, the methods they used to recover the true tree are much less powerful: all the simulated data are converted directly to a distance measure before the tree search begins. This makes it impossible for the original sequence to be recovered and removes all the fine-grained information that likelihood-based Bayesian phylogenetic methods can use (Steel et al. 1988). Finally, McMahon and McMahon quantify neither the uncertainty in their phylogenetic inferences nor whether this significantly affects inferences that could be made from the tree. In effect, the simulations by McMahon and McMahon estimate the effect of language hybridization on vintage lexicostatistics. It is unsurprising that this approach does not work well (Hoijer 1956; Blust 2000). By contrast, we have simulated the non-systematic type of borrowing, and used a Bayesian phylogenetic method to estimate trees.

To conclude, our results suggest that Bayesian phylogenetic methods are able to reliably estimate tree topology, even when confronted with substantial non-systematic borrowing. Root time estimates are less robust, showing a consistent trend towards younger ages as borrowing increases. However, this effect is not large at realistic borrowing levels. When there is very substantial horizontal transmission, there are strategies that can be used to untangle these complex histories (Gray et al. 2007). A method such as NeighborNet (Bryant & Moulton 2004; Bryant et al. 2005) that does not assume a tree-like history can be used to identify and visualize conflicting signal. These characters could then either be removed or explicitly taken into account with methods that can identify pattern heterogeneity, such as mixture models (Pagel & Meade 2004). Alternatively, the conflicting character histories could be decomposed into their own lineages and handled with a method such as reconciliation analysis (Page & Charleston 1998; Charleston 2003). In summary, while reticulation may be common in cultural evolution, it does not necessarily invalidate a phylogenetic approach.

Acknowledgments

We would like to thank David Bryant, Geoff Nicholls, Michael O'Brien, Robert Ross, David Welch and an anonymous reviewer for their discussion and suggestions.

Footnotes

    • Received December 23, 2008.
    • Accepted February 27, 2009.

References
