Language is a hallmark of our species and understanding linguistic diversity is an area of major interest. Genetic factors influencing the cultural transmission of language provide a powerful and elegant explanation for aspects of present-day linguistic diversity and a window into the emergence and evolution of language. In particular, it has recently been proposed that linguistic tone (the use of voice pitch to convey lexical and grammatical meaning) is biased by two genes involved in brain growth and development, ASPM and Microcephalin. This hypothesis predicts that tone is a stable characteristic of language because of its ‘genetic anchoring’. The present paper tests this prediction using a Bayesian phylogenetic framework applied to a large set of linguistic features and language families, using multiple software implementations, data codings, stability estimations, linguistic classifications and outgroup choices. The results of these different methods and datasets agree to a large extent, suggesting that this approach produces reliable estimates of the stability of linguistic data. Moreover, linguistic tone is found to be stable across methods and datasets, providing suggestive support for the hypothesis of genetic influences on its distribution.
The approximately 7000 languages currently spoken around the world vary enormously not only in vocabulary but also in phonology, morphology, syntax, semantics and pragmatics [2,3]. This structural diversity can be coded using a set of typological features, including, for example, the number of consonants, the use of voice pitch to convey linguistic information (tone) or the canonical order of subject and verb, all of which take specific values in each language. The relationship between structural variation and the so-called universals of language, and the related questions concerning the nature of the constraints governing this diversity, are hotly debated, but it is clear that both cultural evolutionary processes akin to those acting on biological systems, and factors pertaining to human perception, articulation, cognition and sociality play a major role [7,8].
It is generally accepted that our capacity for speech and language rests on species-specific genetic factors, but it is currently unclear how these might be language-specific. At the other end of the spectrum, it is also overwhelmingly clear that individual variation in language and speech, both normal and pathological, has strong genetic components, showing moderate to large heritabilities and confirmed by the recent characterization of various genes [10,11]. However, the possible influence of population-level genetic diversity on linguistic structural variation had not been systematically considered until the recent proposal that the distribution of linguistic tone is influenced by the population frequency of the derived haplogroups of ASPM and Microcephalin, two genes involved in brain growth and development. This influence is hypothesized to be mediated by a genetic bias in the acquisition and/or processing of linguistic tone, a bias that is weak at the individual level but amplified by the cultural transmission of language in populations across generations. Such a proposal is supported by several computational and mathematical models showing that small biases can indeed be made manifest by cultural processes [14,15], by experiments where chains of adult humans learning artificial languages produce strong systematicity [16,17], and by zebra finches recovering the species-specific song through iterated learning across generations.
In general, cultural processes operate on shorter timescales than genetic ones, with cultural change out-pacing genetic change [19,20], but this does not preclude the existence of extremely stable aspects of language [20,21]. This cultural stability can be due to several factors, such as strong constraints generated by ecological constants, structured cultural systems with high-connectivity ‘core’ components and the frequency of use of parts of language [20,22]. The genetic-biasing hypothesis thus proposes that the ‘genetic anchoring’ of cultural traits is yet another factor influencing their stability.
Linguistic tone is affected by regular processes of language change and contact: it can be gained or lost, it can complexify or simplify and it can be borrowed across language boundaries [5,23–25]. From a purely linguistic point of view, tone is just another aspect of language, and there is no a priori linguistic reason to expect that it would be very stable. However, if linguistic tone is indeed under genetic biasing, then it is expected that its dynamics would tend to correlate with those of the biasing genes. This, in turn, would result in tone being more resistant to ‘regular’ language change and more stable than other linguistic features. Unfortunately, the currently available data do not allow a more precise identification of the locus of this stability, which could result from various aspects of tonogenesis and/or tone loss. Another consequence of this hypothesis is that tone (and non-tone) languages should cluster geographically and be affected by contact and language shift, as the bias would favour both language-internal and contact-induced changes in the direction of the bias. Although this aspect is not explicitly tested in the present paper, these areal effects of the biasing genes should still be manifest as a ‘genetic anchoring’ of tone. Future computational work must investigate the capacity of such phylogenetic methods to detect various types of stability owing to, for example, transmission across language shifts or slow change through vertical transmission.
Here this prediction concerning the stability of linguistic tone is tested using a phylogenetic approach inspired by biology. Such phylogenetic methods have been successfully applied to language data and are beginning to be widely accepted as an appropriate methodology for certain types of linguistic questions. However, some unclear issues do persist, especially the adequacy of trees to model aspects of language history given the known effects of borrowing and language shift (highly similar to the issues raised by horizontal gene transfer), the appropriateness of models of character change and the rooting of the language trees. Given these and several other issues specific to language [22,28], I have adopted a general strategy of using several methods, datasets, codings and data-analysis strategies, as follows.
First, the main interest is in comparing the stability of tone with the stability of as many linguistic features as possible. The World Atlas of Language Structures (WALS) database is currently the most comprehensive typological resource available, covering 141 features and 2650 languages, but owing to high levels of missing data (electronic supplementary material, the datasets), I have selected a more limited but still comprehensive subset of this database. To control for the effects of the coding of linguistic features, I have generated two datasets: the original polymorphic coding, where each feature has a specific number of values, and a linguistically informed binary recoding which, for some features, resulted in two related but distinct ‘aspects’. For example, tone as a polymorphic feature has three possible values (No tones, Simple tone system and Complex tone system), which can be collapsed meaningfully into two binary ‘aspects’: tone1 (No tones versus any type of tone system) and tone2 (Complex tone system versus anything else; electronic supplementary material, linguistic features and table S1).
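The binary recoding of tone described above can be illustrated with a minimal Python sketch. The function name `recode_tone`, the handling of missing data and the sample languages are assumptions made for illustration only; they are not taken from the supplementary material.

```python
# The three polymorphic tone values as named in the text.
TONE_VALUES = ("No tones", "Simple tone system", "Complex tone system")


def recode_tone(value):
    """Return the two binary aspects (tone1, tone2) for one polymorphic value.

    tone1: 1 if the language has any tone system, 0 if it has none.
    tone2: 1 if the language has a complex tone system, 0 otherwise.
    Missing data (None) stays missing in both aspects (an assumption).
    """
    if value is None:
        return {"tone1": None, "tone2": None}
    if value not in TONE_VALUES:
        raise ValueError(f"unknown tone value: {value!r}")
    return {
        "tone1": int(value != "No tones"),
        "tone2": int(value == "Complex tone system"),
    }


# Hypothetical languages, for illustration only.
sample = {
    "lang_A": "No tones",
    "lang_B": "Simple tone system",
    "lang_C": "Complex tone system",
    "lang_D": None,
}
recoded = {name: recode_tone(value) for name, value in sample.items()}
```

A simple tone system therefore counts as tonal for tone1 but not for tone2, which is why the two aspects can differ in stability.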
Second, the hypothesis concerns global tendencies, averaging across the whole world and across different language families. Generally, historical linguists consider attempts at grouping together accepted language families into ‘macro-families’ [30–32] as very problematic [33–35]. Therefore, I have treated language families as independent phylogenies on which inferences of the rates of change have resulted in a set of posterior distributions of rates, one distribution per language family, for each linguistic feature. However, given the differences among historical linguists concerning the actual details of various language families, I have decided to use the two most comprehensive databases of such classifications currently available: the WALS and the Ethnologue (please note that these classifications are not independent; electronic supplementary material, linguistic classifications). In total, 41 language families have been used (electronic supplementary material, the datasets): Afro-Asiatic, Algic, Altaic, Arawakan, Australian, Austro-Asiatic, Austronesian, Aymaran, Cariban, Chibchan, Chukotko-Kamchatkan, Dravidian, Eskimo-Aleut, Hokan, Indo-European, Iroquoian, Khoisan, Macro-Ge, Mataco-Guaicuru, Mayan, Na-Dene, Nakh-Daghestanian, Niger-Congo, Nilo-Saharan, North Caucasian, Oto-Manguean, Penutian, Salishan, Sepik, Sino-Tibetan, Tacanan, Tai-Kadai, Trans-New Guinea, Tucanoan, Tupian, Uralic, Uto-Aztecan, Wakashan, West Papuan, Yanomam, and Yukaghir.
Third, to address possible artefacts introduced by the specific character model and rate estimation, I have used two Bayesian phylogenetic software packages: the widespread MrBayes 3 and the custom-written BayesLang. In both cases, the historical linguistic classification (either from WALS or Ethnologue) is taken as given and used to compute the rates of change of the linguistic features considered, but BayesLang explicitly considers it as rooted and deals with the unresolved nodes (polytomies) as such, while MrBayes needs an outgroup in order to root the tree and attempts to fully resolve it as well. For MrBayes, the linguistic classification was transformed into constraints on the admissible topologies, and the binary data were treated as restriction data and the polymorphic data as standard data. BayesLang was run for 5 000 000 generations (1 000 000 burn-in, one cold and six heated chains) and MrBayes was run for 5 000 000 generations (1000 sampling frequency, 1000 generations burn-in), and all runs converged (as assessed by log-likelihood plots and the potential scale reduction factor). The methods used to estimate the rates differ markedly between programs: BayesLang estimates the minimum number of changes from the inferred ancestral to the observed states, akin to a parsimony approach and producing an underestimate of the actual number of changes, while MrBayes estimates the rates of change using a gamma model (electronic supplementary material, stability estimation).
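The potential scale reduction factor mentioned above as a convergence diagnostic can be sketched as follows. This is the textbook Gelman-Rubin formulation for a single scalar parameter sampled by several equal-length chains; it is an illustrative assumption and not a reproduction of how MrBayes or BayesLang compute it. The function name and the toy chains are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def potential_scale_reduction(chains):
    """Gelman-Rubin potential scale reduction factor for equal-length chains.

    Values close to 1 suggest that the chains sample the same distribution;
    values well above 1 suggest a lack of convergence.
    """
    n = len(chains[0])  # samples per chain
    m = len(chains)     # number of chains
    chain_means = [mean(c) for c in chains]
    grand_mean = mean(chain_means)
    # Between-chain and within-chain variance estimates.
    between = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    within = mean([variance(c) for c in chains])
    pooled = (n - 1) / n * within + between / n
    return sqrt(pooled / within)


# Hypothetical toy chains: the first pair agrees, the second pair does not.
agreeing = potential_scale_reduction([[1, 2, 3, 4], [1, 2, 3, 4]])
disagreeing = potential_scale_reduction([[1, 2, 3, 4], [11, 12, 13, 14]])
```

In practice the diagnostic is computed per parameter on post-burn-in samples and read together with the log-likelihood traces.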
Finally, MrBayes requires an outgroup to root the phylogeny and there are many issues surrounding the choice of outgroups in general and in linguistics in particular, owing mainly to the contentious issue of relationships above the family level, compounded by non-vertical processes in language. To systematically investigate the potential influence the choice of outgroup might have on the rates estimated by MrBayes, I have selected 23 typologically and geographically very diverse language isolates to be used as outgroups (electronic supplementary material, outgroups). For the Ethnologue classification, I have used each of these isolates as the outgroup, therefore replicating each run 23 times for each relevant combination of parameters (see below). Owing to computational constraints and the very high correlation between the two classifications (see below), for the WALS classification I have used only two of these language isolates, namely the geographically and typologically distinct Ainu and Basque.
In total, I conducted 54 runs using multiple data codings, linguistic classifications, software implementations and outgroup choices (electronic supplementary material, the datasets), allowing the quantification and minimization of potential artefactual results. To ensure the comparability across such a diverse range of results, I converted the rate estimations for each linguistic feature into ranks, ranging from the most stable to the most unstable, resulting in posterior distributions of feature stability ranks. The analyses presented below take these ranks as their input.
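The conversion of rate estimates into stability ranks can be sketched as below. The sketch assumes that ranking is done within each posterior sample and that ties share an average rank; neither detail is stated in the text, and the feature names and rates are hypothetical.

```python
def rank_rates(rates):
    """Map {feature: rate} to {feature: rank}; rank 1 is the lowest rate.

    A lower rate of change is read as higher stability. Tied rates share
    the average of the ranks they span (an assumption).
    """
    ordered = sorted(rates, key=rates.get)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and rates[ordered[j + 1]] == rates[ordered[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = shared
        i = j + 1
    return ranks


def posterior_ranks(samples):
    """Rank features within each posterior sample: {feature: [rank, ...]}."""
    out = {}
    for sample in samples:
        for feature, rank in rank_rates(sample).items():
            out.setdefault(feature, []).append(rank)
    return out


# Hypothetical posterior samples of rates of change for three features.
samples = [
    {"tone": 0.10, "word_order": 0.30, "gender": 0.20},
    {"tone": 0.15, "word_order": 0.15, "gender": 0.40},
]
ranks = posterior_ranks(samples)
```

Working with ranks discards the scale of the rates, which is what makes a parsimony-like change count and a model-based rate comparable across programs.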
(a) Summary properties of the feature-rank posterior distributions
Family-level distributions (not shown) are unimodal for most features and most language families, although several cases of bi- or multi-modality suggest problems with the data, the inference parameters and/or the adequacy of a tree model in these cases. The macro-area- and world-level combined distributions generally tend to feature multiple peaks, strongly suggesting different language family- and macro-area-specific processes. Nevertheless, there seem to exist systematic differences between features in their stability patterns.
Owing to these strong deviations from normality, the posterior rank distributions are summarized by both means and medians. However, independent of other parameters, the means and medians agree very strongly (0.94 ≤ r ≤ 0.98, p < 10⁻⁶), suggesting that the results are not artefacts of the data compression strategy.
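The comparison of mean and median summaries can be illustrated with a short sketch: each feature's posterior rank distribution is reduced to a mean and a median, and the two summaries are then correlated across features. The posterior samples below are invented for illustration, and the helper `pearson` is a plain textbook implementation rather than the code used in the study.

```python
from math import sqrt
from statistics import mean, median


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical posterior rank samples for four features.
posterior = {
    "feature_A": [1, 1, 2, 1, 3],
    "feature_B": [2, 3, 1, 2, 2],
    "feature_C": [3, 2, 3, 4, 4],
    "feature_D": [4, 4, 4, 3, 1],
}
means = [mean(v) for v in posterior.values()]
medians = [median(v) for v in posterior.values()]
r = pearson(means, medians)
```

A correlation near 1 indicates that skewed or multimodal posteriors do not change the ordering of features much, whichever summary is used.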
(b) Effects of linguistic classification and outgroup choice
The two historical linguistic classifications (WALS and Ethnologue) produce very similar results across datasets (0.96 ≤ r ≤ 0.99, p < 10⁻¹⁰), suggesting that the differences in stability between typological features are not artefacts of these particular historical classifications.
The effects of the outgroup choice on the rates estimated by MrBayes seem to be minimal given that the feature rankings produced using Basque and Ainu as outgroups correlate very strongly (0.82 ≤ r ≤ 0.86, p < 10⁻¹⁰), independently of data coding and linguistic classification. This is reinforced by using all 23 language isolates (including Basque and Ainu) as outgroups for the Ethnologue classification and both codings, resulting in highly similar ranks: the first principal component across outgroups, PC1, explains 79 per cent of the variance, and the pairwise Pearson correlations range from r = 0.49 to r = 0.92 (p < 10⁻⁶), with a mean of r = 0.78. Given these high similarities, the posterior rank distributions produced by the different outgroups were combined and an ‘outgroup-average’ posterior distribution was extracted for further analysis.
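The share of variance carried by the first principal component can be sketched in plain Python. The sketch assumes a principal components analysis on the correlation matrix of the rankings, obtained here by power iteration; whether the original analysis used the correlation or the covariance matrix is not stated, and the three rankings below are hypothetical.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        m = sum(col) / n
        dev = [v - m for v in col]
        norm = sqrt(sum(d * d for d in dev))
        scaled.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in scaled] for ci in scaled]


def pc1_share(columns, iterations=500):
    """Share of the total variance explained by the first principal component.

    Power iteration finds the largest eigenvalue of the correlation matrix,
    whose trace (the total variance) equals the number of columns.
    """
    corr = correlation_matrix(columns)
    k = len(corr)
    vec = [1.0 / sqrt(k)] * k
    eigenvalue = 0.0
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)]
        eigenvalue = sqrt(sum(v * v for v in nxt))
        vec = [v / eigenvalue for v in nxt]
    return eigenvalue / k


# Hypothetical feature ranks obtained with three different outgroups.
outgroup_a = [1, 2, 3, 4, 5]
outgroup_b = [2, 1, 3, 5, 4]
outgroup_c = [1, 3, 2, 4, 5]
share = pc1_share([outgroup_a, outgroup_b, outgroup_c])
```

When all rankings agree perfectly the share is 1; the more the outgroups disagree, the more variance moves to later components.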
(c) Binary coding
The following four datasets have been analysed: MrBayes with the WALS classification combining Ainu and Basque as outgroups, MrBayes with the Ethnologue classification combining all 23 language isolates as outgroups and BayesLang using both classifications. The feature stability ranks produced by these datasets agree strongly (mean r = 0.78, range 0.59 ≤ r ≤ 0.98, p < 10⁻⁸; electronic supplementary material, table S2), confirmed by a principal components analysis, where PC1 explains 81.4 per cent of the variance and represents the commonality between datasets (electronic supplementary material, table S3). Moreover, the average feature rank across these datasets correlates very strongly with PC1 (r = 0.99, p < 2.2 × 10⁻¹⁶) and classifies tone2 (complex tone systems; electronic supplementary material, linguistic features and table S1) as the 8th most stable feature (out of 86), with tone1 (presence of any tone system) as the 23rd (out of 86; electronic supplementary material, table S4).
(d) Polymorphic coding
There are four datasets mirroring the binary case and, as above, the stabilities agree strongly across datasets (mean r = 0.71, range 0.51 ≤ r ≤ 0.99, p < 10⁻⁵; electronic supplementary material, table S5), with the first component PC1 (the commonality) explaining 76.1 per cent of the variance (electronic supplementary material, table S6). Again, the average rank correlates very strongly with PC1 (r = 0.99, p < 2.2 × 10⁻¹⁶) and classifies tone as the 8th most stable out of 68 polymorphic features (electronic supplementary material, table S7).
(e) The relationship between polymorphic and binary ranks
For any single polymorphic feature there can be more than one corresponding binary feature, capturing different linguistically relevant aspects (electronic supplementary material, linguistic features and table S1) and potentially having different stabilities (electronic supplementary material, tables S4 and S7). Because of this, the agreement between data codings varies dramatically across datasets and methods (0.18 ≤ r ≤ 0.99; electronic supplementary material, table S8), but tends to be rather strong (mean r = 0.61, median p = 6.5 × 10⁻⁹). PC1 (representing the agreement between rankings) explains 67.4 per cent of the variance, and PC2, explaining 16.1 per cent, contrasts the binary and polymorphic codings (electronic supplementary material, table S9). The average rank across both codings correlates very strongly with PC1 (r = 0.99, p < 10⁻¹⁵), while the average binary and average polymorphic ranks correlate moderately with one another (r = 0.58, p = 7.1 × 10⁻⁸).
The complex relationship between polymorphic and corresponding binary features is shown in figure 1 (see also electronic supplementary material, table S10). Tone is very stable as a polymorphic feature (figure 1b; one-sample t-test: t(56) = 9.7, p = 1.35 × 10⁻¹³) and tone2 is very stable both as a binary feature (figure 1b; t(70) = 12.04, p < 2.2 × 10⁻¹⁶) and overall (figure 1a; t(70) = 12.27, p < 2.2 × 10⁻¹⁶), with tone1 relatively stable in both comparisons (figure 1a; t(70) = 4.35, p = 6.7 × 10⁻⁹; figure 1b; t(70) = 4.35, p = 4.5 × 10⁻⁵). Compared only with phonological features (electronic supplementary material, table S10), tone is stable as a polymorphic feature (5th out of 13; t(12) = 2.4, p = 0.034), tone2 is very stable both as a binary feature (3rd out of 19; t(18) = 5.1, p = 7.65 × 10⁻⁵) and overall (3rd out of 19; t(18) = 4.73, p = 0.00017), while tone1 is of average stability as a binary feature (8th out of 19; t(18) = 0.82, p = 0.42) and marginally stable overall (7th out of 19; t(18) = 2.1, p = 0.052).
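The one-sample t-tests reported above can be illustrated with a short sketch. It assumes one plausible reading of the design, namely that the ranks of the comparison features are tested against the rank of the focal feature; the exact set-up is not spelled out in the text, and all numbers below are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, mu):
    """Return (t, df) for the null hypothesis that the mean of `values` is `mu`."""
    n = len(values)
    t = (mean(values) - mu) / (stdev(values) / sqrt(n))
    return t, n - 1


# Hypothetical average ranks of the comparison features and of the focal feature.
other_ranks = [12.0, 30.5, 25.0, 41.0, 18.5, 36.0, 29.0, 22.0]
focal_rank = 8.0
t, df = one_sample_t(other_ranks, focal_rank)
```

A large positive t means that the comparison features rank, on average, well below the focal feature in stability; the p-value would then come from a t distribution with `df` degrees of freedom.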
(f) The stability of types of feature
The WALS classifies the linguistic features into various categories, and the relative stability of these categories is of appreciable interest to historical linguists and typologists [38,39]. Using these methods, there do not seem to be any significant differences between the stabilities of types of polymorphic features (one-way ANOVA: F(6,64) = 1.55, p = 0.18), but types of binary features do differ (F(6,64) = 6.05, p = 4.7 × 10⁻⁵; figure 1b), with Word Order on average more stable than Nominal Categories and Simple Clauses, and Phonology more stable than Nominal Categories (after Tukey's HSD correction for multiple comparisons).
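The one-way ANOVA used to compare feature types can be sketched as follows. This is the standard F statistic computed from between-group and within-group sums of squares; the groups stand in for feature types and their values are hypothetical, and the Tukey HSD step is not shown.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (v - gm) ** 2 for g, gm in zip(groups, group_means) for v in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical stability ranks for three feature types.
f_stat, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

With seven feature types and 71 features the degrees of freedom would be 6 and 64, matching the form of the statistics reported above.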
3. Discussion and conclusions
The work reported here represents, to my knowledge, the first investigation of the stability of structural properties of language using a phylogenetic perspective across a large set of language families and geographical areas, complementing the more focused study of Greenhill et al. This approach takes into account language family-level uncertainty in the inferred parameters through the use of Bayesian phylogenetic methods, which generate whole posterior distributions for these parameters instead of point estimates [27,36]. Moreover, the specific design introduced in this paper attempts to control for other potential sources of artefacts, such as the inference and rate-estimation algorithms, the linguistic classification needed to guide the inference, the coding of the data and the choice of outgroup required to root the phylogeny. Therefore, 54 separate datasets have been analysed, each comprising a number of language families treated as independent phylogenies, resulting in the analysis of 113 246 phylogenies, each generating a set of posterior distributions of linguistic features' stability ranks.
The results show that, overall, these different methods, data codings, linguistic classifications and outgroup choices largely agree on the inferred ranking of linguistic features from the most to the least stable. This agreement suggests that, in this case at least, despite the various issues concerning the current application of phylogenetic methods to typological data, they produce reliable results.
Linguistic tone seems stable relative to a large set of features covering many aspects of language, both as a polymorphic feature and in its two binary aspects (tone1: being a tone language or not; tone2: having a complex tone system), supporting one of the predictions of the genetic-biasing hypothesis. However, these results must be taken as suggestive, given the issues with the primary data (low coverage, coding decisions and chance similarity owing to the restricted range of typological features), the linguistic classifications considered, the usage of language isolates as outgroups and the liability of typological features to areal effects. In this vein, Greenhill et al. have found that typological features might be less appropriate than basic vocabulary to the application of phylogenetic methods and that these two aspects seem to evolve at comparable rates.
A more specific test of the genetic-biasing hypothesis would be to estimate the correlated evolution [41,42] between tone and the population frequency of the two biasing alleles, but this requires much better linguistic and genetic data and stronger assumptions concerning the adequacy of trees as models of linguistic and genetic relationships between populations. Another direction, currently underway, aims at operationalizing inter-individual variation in the acquisition and/or processing of linguistic tone and estimating the heritability of this variance and its association with the two biasing genes. Recently, encouraging results suggest that common polymorphisms in these two genes are associated with normal variation in brain morphology [45,46], while Christiansen et al. report an association between polymorphisms of ASPM and various language measures.
The method introduced in this paper suggests some intriguing patterns: FrRoundV (presence of Front Rounded Vowels) is extremely stable (2nd as polymorphic, electronic supplementary material, table S7 and 4th as binary, electronic supplementary material, table S4) and has a skewed geographical distribution. Likewise, MTPron (personal pronouns patterning as m in the 1st person and a coronal obstruent in the 2nd) and NMPron (n in the 1st person and m in the 2nd) are relatively stable (14th and 9th as polymorphic features, respectively), apparently supporting Nichols' claim that these are indicators of very old demographic and linguistic processes. The reasons for the stability of such features warrant future investigation, as some might represent new cases of genetic biasing.
This paper has specifically focused on the global stability of linguistic features, but the variation observed within and between language families and geographical areas requires further research as it might offer clues to language family- and area-specific processes. Especially interesting in this respect are the cases of multimodal posterior distributions of rates and the possible existence of concordances within geographical areas, which could point to large-scale contact phenomena and/or ancient language expansions.
Thanks to D. R. Ladd, F. Jordan, G. Hyslop, S. Levinson, M. Cysouw, M. Dunn, R. Gray, A. Dima and two anonymous reviewers for discussions and feedback.
- Received July 25, 2010.
- Accepted August 10, 2010.
- This journal is © 2010 The Royal Society