The structure of cross-cultural musical diversity

Tom Rzeszutek, Patrick E. Savage, Steven Brown


Human cultural traits, such as languages, musics, rituals and material objects, vary widely across cultures. However, the majority of comparative analyses of human cultural diversity focus on between-culture variation without consideration for within-culture variation. In contrast, biological approaches to genetic diversity, such as the analysis of molecular variance (AMOVA) framework, partition genetic diversity into both within- and between-population components. We attempt here for the first time to quantify both components of cultural diversity by applying the AMOVA model to music. By employing this approach with 421 traditional songs from 16 Austronesian-speaking populations, we show that the vast majority of musical variability is due to differences within populations rather than differences between. This demonstrates a striking parallel to the structure of genetic diversity in humans. A neighbour-net analysis of pairwise population musical divergence shows a large amount of reticulation, indicating the pervasive occurrence of borrowing and/or convergent evolution of musical features across populations.

1. Introduction

Human cultural traits exhibit an astounding myriad of forms, perhaps best exemplified by the approximately 6900 known languages currently spoken across the world [1]. Any approach to characterizing this cross-cultural diversity depends on the creation of a reliable classification of forms for a given domain of culture. There are many important examples of cultural classification, spanning from the seminal work of Murdock on the classification of over 100 categories of cultural behaviour across 1100 world populations [2] to contemporary examples in linguistics, such as the World Atlas of Language Structures [3] and the Austronesian Basic Vocabulary Database [4]. The primary goal of these kinds of classification systems is the identification of salient differences between populations, as these differences can aid in reconstructing the history of human population movements and cultural interactions [5–7]. A major criticism of these approaches, though, is that they place an exclusive emphasis on the diversity between cultures, downplaying or ignoring the internal diversity present within each culture. Overall, there is a dichotomy between comparative approaches, for which the goal is to characterize differences between cultures, and ethnographic approaches, for which the goal is to rigorously catalogue the richness of forms that exist within single cultures. Here, we propose a compromise solution that allows for the simultaneous consideration of between-culture and within-culture facets of cultural diversity.

The hierarchical structure of human cultural diversity is reminiscent of the structure of human genetic diversity in that this diversity can be compartmentalized into within- and between-population components. Population geneticists, starting with Lewontin [8], have repeatedly observed that the vast majority of the genetic diversity in human populations is found within populations rather than between them [9]. Some cultural scholars have argued that human cultures exhibit a much lower level of internal diversity than that seen in the genetic domain owing to processes such as conformity or frequency-dependent selection [10] that homogenize behaviours within populations and thereby push particular cultural variants to fixation [11]. While this is a plausible argument, no one—to the best of our knowledge—has performed a rigorous quantification of the hierarchical structure of cultural diversity. Perhaps the closest study is that of Bell et al. [12], which used internal behavioural variation to calculate cultural variation among populations using a population genetic model. However, this work did not explicitly quantify the degree of internal variation.

One requirement in applying population genetic models to cultural forms is the necessity that there be quantifiable features that vary among individuals or entities both within and between populations. For example, Bell et al. [12] used questions from the World Values Survey, administered to a sample of individuals from each focal culture. This is comparable to looking at variation among individuals at a particular genetic locus. Alternatively, if one wanted to investigate variation in some aspect of material culture, such as ceramics, then one would need a number of exemplars from each culture, appropriate features to describe these exemplars and a suitable quantitative measure of differences among entities. Clearly, there is a difference between studying variation among individuals in terms of behaviour and variation among entities of material culture. What is most important for the study of cultural diversity is that the unit of analysis and the means of measuring difference between cultural variants have domain-specific validity, and this must be worked out on a case-by-case basis for each domain of culture.

Music seems to satisfy these important requirements and thereby affords a novel opportunity to study the structure of cultural diversity. Not only is music a human universal [13] but also its form varies quite prominently both between [14] and within cultures [15]. Musical features are also quite amenable to comparative analysis [14]. Most importantly, for our purposes, the ‘song’ provides a reliable unit for the cultural analysis of music. Biologists interested in birdsong variation across time and space have indeed focused on the song as a unit of analysis ([16,17] and references therein). Ethnographic analyses of human cultures have also shown that the song represents the fundamental unit of both structure and function [13]. In addition, the song was adopted as the unit of analysis in the most ambitious comparative attempt to classify the world's musics, namely Lomax's Cantometrics project of the 1960s [14], in which more than 4000 songs from over 200 cultures were analysed and compared.

In order to make such a global project feasible, Lomax employed a small sample of only 10 songs per culture, and these were averaged into a ‘modal profile’ that represented the ‘typical’ song style for each culture [14]. While Lomax believed that his modal profiles were representative of the cultures he was sampling, ethnomusicologists studying musics from those same cultures questioned Lomax's findings because his approach strongly underestimated the degree of internal musical diversity in those cultures [15,18]. To date, there has been no quantitative method applied to music that retains the cross-cultural scope of Lomax's global framework while at the same time taking internal variation into account.

Exactly such a method is used in the study of genetic diversity in population genetics, and this method provides a promising approach for thinking about the hierarchical structure of cultural diversity as well. The analysis of molecular variance (AMOVA) is a method closely related to the analysis of variance that allows the hierarchical partitioning of genetic variance into components [19]. These components generally include variability within populations, variability between populations and variability between regional groups. The population structure being tested is defined a priori by the researcher, and can include divisions based on geographical region or language [19]. In its original application, AMOVA was designed to investigate molecular diversity based on haplotype restriction polymorphism data, but the generalizability of the method was recognized early on [19] and it has since been applied to many different kinds of genetic loci [20]. The flexibility of this method rests on the fact that variability is calculated as a measure of distance between haplotypes. The distance measure itself is defined by the user and can incorporate information about sequence evolution such as mutation rate [19]. Consequently, given an appropriate unit of analysis and distance measurement, this method can be extended to quantify the hierarchical structure of cultural diversity.

We attempt here for the first time to quantify both the within- and between-population components of cultural diversity by applying AMOVA to the analysis of musical diversity using the song as the unit of analysis. An important distinction here is that we are looking at populations of songs rather than populations of individuals. To this end, we focus on a rigorous sampling of tribal musics from Austronesian-speaking populations in Taiwan and the Philippines, itself part of a larger project devoted to prehistoric migrations in the region. To quantify musical variability, we calculate the distance between songs using a musical classification system we developed that is inspired by Cantometrics. The AMOVA framework is then applied to these data in order to apportion musical variability into within- and between-population components. We also measure pairwise population musical divergence with ΦST and use it in a ‘neighbour-net’ analysis [21] to explore the degree of reticulation in the data owing to borrowing and/or convergence. Distances based on ΦST are also compared with the corresponding modal profiles to test the accuracy of Lomax's modal profile approach for distinguishing differences between populations. Our novel application of AMOVA to cultural forms provides a general means of performing population-level cultural analyses while simultaneously addressing the internal diversity of cultural forms.

2. Material and methods

(a) Sample

The musical sample consists of 421 adult traditional group (choral) songs from 16 Austronesian-speaking aboriginal populations from Taiwan and the northern Philippines, comprising the Amis (30 songs), Atayal (10), Bunun (30), Paiwan (30), Puyuma (30), Rukai (30), Saisiyat (30), Tao (30), Tsou (22), Plains (Siraya) (24), Kavalan (18), Thao (30), Ibaloi (30), Ifugao (30), Kankanai (17) and Ayta (30). No song appeared in more than one culture's repertoire, and no preselection of songs occurred except that they be adult, traditional and group songs. Songs were obtained from commercial ethnomusicology recordings as well as from the Taiwan National Music Archive in Taipei [22] and the Centre for Ethnomusicology at the University of the Philippines in Quezon City. Thirty songs were randomly sampled from each population. For populations with fewer than 30 available songs, all recordings meeting our inclusion criteria were used.

(b) Classifying songs

P.E.S. coded all the songs by ear using the ‘CantoCore’ song-classification scheme developed in our laboratory [23]. This comprehensive scheme, modelled after Lomax and Grauer's original Cantometric scheme [24], codes 26 characters related to song structure, including rhythm, pitch, syllable, texture and form (see electronic supplementary material, figure S1).

(c) Quantifying musical distance

Either phylogenetic distances based on sequence evolution or phenetic distances based on sequence similarity can be used in genetic analyses [19]. Because we currently lack information about song evolution, we attempted to develop a simple phenetic measure of distance between songs, based on our codings, that is both musically and statistically valid. Leroi & Swire [25], as well as Busby [26], identified a number of methodological solutions to issues related to converting Cantometric song codings into distances, and these issues apply equally well to CantoCore. These include the presence of both ordinal and nominal characters, simultaneous coding of multiple states for a number of characters (multi-coding), the redundancy of some codings when certain states are absent and equal weighting of all characters. We built on their work to programme an algorithm that takes these issues into account while at the same time being flexible enough to handle a variety of coding schemes. The algorithm was programmed in R v. 2.12.2 [27] by T.R. and is available upon request. Details of the algorithm are found in the electronic supplementary material, section S2.
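As an illustration only, the following Python sketch shows one way a phenetic distance of this kind could be computed. It is not the algorithm described in section S2. The character names, the state counts, the use of a Jaccard distance for nominal characters and the use of a normalized difference of mean states for ordinal characters are all assumptions made for the example. Multi-coding is represented by giving each character a set of states, and a character that is not applicable to a song is coded as None and skipped, so that redundant codings do not inflate the distance.

```python
# Hypothetical character table (names and state counts are illustrative only):
# name -> (kind, number of ordinal states or None for nominal characters)
CHARACTERS = {
    "tempo": ("ordinal", 3),      # ordinal states 0..2
    "texture": ("nominal", None),
    "melisma": ("ordinal", 3),    # ordinal states 0..2
}


def character_distance(kind, n_states, a, b):
    """Distance in [0, 1] between two non-empty sets of states for one character."""
    if kind == "ordinal":
        # Multi-coded ordinal states are summarized by their mean (an assumption).
        mean_a = sum(a) / len(a)
        mean_b = sum(b) / len(b)
        return abs(mean_a - mean_b) / (n_states - 1)
    # Nominal: Jaccard distance between the two sets of coded states.
    return 1.0 - len(a & b) / len(a | b)


def song_distance(song_a, song_b):
    """Equal-weight mean of per-character distances over comparable characters."""
    total, compared = 0.0, 0
    for name, (kind, n_states) in CHARACTERS.items():
        a, b = song_a.get(name), song_b.get(name)
        if not a or not b:
            continue  # character not applicable (or not coded) in one of the songs
        total += character_distance(kind, n_states, a, b)
        compared += 1
    return total / compared if compared else 0.0


song_1 = {"tempo": {0}, "texture": {"unison"}, "melisma": {1}}
song_2 = {"tempo": {2}, "texture": {"unison", "heterophony"}, "melisma": None}
d = song_distance(song_1, song_2)  # tempo contributes 1.0, texture 0.5, melisma is skipped
```

Applying song_distance to every pair of songs would yield the kind of song-level distance matrix used in the analyses that follow.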

(d) Visualizing song relationships

In order to visualize songs in two dimensions, we performed non-metric multi-dimensional scaling on the song-level distances obtained from our algorithm using isoMDS in R, with 50 iterations and metric scaling as an initial configuration.

(e) Analysis of molecular variance

Distances were prepared for the AMOVA by a Euclidean transform of the data using Lingoes's method [28], as implemented in the ade4 package for R [29]. The distances were then squared, as recommended by Excoffier et al. [19]. AMOVA was performed in Arlequin v. 3.11 using the prepared distance matrix and standard settings [30]. Musical variability was apportioned ‘between’ and ‘within’ ethno-linguistically defined populations of songs [1]. The parameter ΦST is the proportion of total variability owing to differences between populations [19], and was calculated pairwise as a measure of musical divergence between populations. To test the significance of the between-population component of musical variance, we permuted songs randomly between populations using 1000 permutations.
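The variance partition can be sketched for the simplest case of a single level of population structure. The Python code below is a minimal illustration and not the software used in the analysis: it assumes a symmetric matrix of squared distances that has already been made Euclidean, uses the standard sums of squared deviations and the unequal-sample-size coefficient n0, and omits the Lingoes transform. The function names are hypothetical.

```python
import random


def amova_phi_st(sq_dist, labels):
    """Phi_ST from a symmetric matrix of squared distances and one label per item."""
    n = len(labels)
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    p = len(groups)
    # Total sum of squared deviations: sum over unordered pairs divided by N.
    ssd_total = sum(sq_dist[i][j] for i in range(n) for j in range(i + 1, n)) / n
    # Within-population sum of squared deviations, one term per population.
    ssd_within = sum(
        sum(sq_dist[i][j] for a, i in enumerate(m) for j in m[a + 1:]) / len(m)
        for m in groups.values()
    )
    ssd_among = ssd_total - ssd_within
    msd_among = ssd_among / (p - 1)
    msd_within = ssd_within / (n - p)
    # Weighted average sample size, needed when populations differ in size.
    n0 = (n - sum(len(m) ** 2 for m in groups.values()) / n) / (p - 1)
    var_among = (msd_among - msd_within) / n0
    var_within = msd_within
    return var_among / (var_among + var_within)


def permutation_p_value(sq_dist, labels, n_perm=1000, seed=0):
    """Shuffle items between populations and count Phi_ST values >= the observed one."""
    rng = random.Random(seed)
    observed = amova_phi_st(sq_dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if amova_phi_st(sq_dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: four items on a line, two tight populations far apart.
points = [0.0, 1.0, 10.0, 11.0]
sq = [[(a - b) ** 2 for b in points] for a in points]
phi_separated = amova_phi_st(sq, ["A", "A", "B", "B"])  # close to 1
phi_mixed = amova_phi_st(sq, ["A", "B", "A", "B"])      # negative
```

In the toy data, the among-population variance component is an estimate and becomes negative when populations are no more distinct than random groupings, which gives a negative ΦST.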

(f) Neighbour-net analysis

Pairwise ΦST was used in a ‘neighbour-net’ analysis [21] to determine the level of reticulation in the data owing to borrowing and convergence. The analysis was performed in SplitsTree4 using standard settings [31]. All negative ΦST values were set to zero before performing the analysis [32]. All pairwise ΦST values were also normalized so that the average distance was 1, as in Gray et al. [33]. Average delta scores and q-residuals were calculated as a measure of overall reticulation in the network.
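The two preparation steps and the delta score can be sketched as follows. This Python code is an illustration under stated assumptions, not the network software's implementation: the function names are hypothetical, at least one pairwise value is assumed to be positive, and the delta score follows the usual quartet-based definition, in which the three pairings of four taxa are summed and the score is the gap between the two largest sums divided by the gap between the largest and the smallest. The q-residual is not reproduced here.

```python
from itertools import combinations


def prepare_for_network(phi):
    """Set negative values to zero, then rescale so the mean pairwise distance is 1."""
    n = len(phi)
    clamped = [
        [max(phi[i][j], 0.0) if i != j else 0.0 for j in range(n)] for i in range(n)
    ]
    pairs = [clamped[i][j] for i in range(n) for j in range(i + 1, n)]
    mean = sum(pairs) / len(pairs)  # assumes at least one positive distance
    return [[value / mean for value in row] for row in clamped]


def mean_delta_score(d):
    """Average quartet delta score: 0 for a perfectly tree-like matrix, up to 1."""
    scores = []
    for x, y, u, v in combinations(range(len(d)), 4):
        sums = sorted(
            [d[x][y] + d[u][v], d[x][u] + d[y][v], d[x][v] + d[y][u]],
            reverse=True,
        )
        spread = sums[0] - sums[2]
        scores.append((sums[0] - sums[1]) / spread if spread > 0 else 0.0)
    return sum(scores) / len(scores)


# Distances read off a tree ((A,B),(C,D)) with unit branch lengths: delta is 0.
tree_like = [[0, 2, 3, 3], [2, 0, 3, 3], [3, 3, 0, 2], [3, 3, 2, 0]]
# Path distances around a four-cycle: maximally box-like, delta is 1.
box_like = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]
```

Because every taxon takes part in the same number of quartets, averaging over quartets gives the same overall value as averaging the per-taxon means.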

(g) Modal profile analysis

In order to test the efficacy of Lomax's modal profile approach at distinguishing differences between populations, we created a modal song coding for each population, consisting of the most common coding in its musical repertoire for each of the 26 CantoCore characters. This method best approximated the way Lomax created his ‘modal profiles’, but some of our resulting profiles contained incompatible combinations of codings. Rather than representing any one song in a population's repertoire in particular, some of these profiles were just a mixture of common musical features across a large sample of songs. These modal profiles are available in the electronic supplementary material, figure S3. Distances between modal profiles were calculated using the same algorithm applied to the original song data, giving us a population-level distance devoid of any information about internal diversity. These modal distances were then compared with the population pairwise ΦST measures using Spearman's rho (rs) and a Mantel test with 20 000 permutations.
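A minimal Python sketch of the two steps in this analysis is given below. It is illustrative only: the function names and example codings are hypothetical, ties for the most common coding are broken arbitrarily (by first occurrence), and the Mantel test is a plain one-sided permutation test on Spearman's rho rather than the exact procedure used here. Taking the mode of each character independently is what allows a profile to combine codings that never co-occur in any single song.

```python
import random
from collections import Counter


def modal_profile(songs):
    """Most common coding of each character across a population's songs."""
    return {c: Counter(s[c] for s in songs).most_common(1)[0][0] for c in songs[0]}


def upper(m, order=None):
    """Upper-triangle values of m, optionally after permuting rows and columns."""
    n = len(m)
    if order is None:
        order = list(range(n))
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    idx = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(idx):
        j = i
        while j + 1 < len(idx) and values[idx[j + 1]] == values[idx[i]]:
            j += 1
        for k in range(i, j + 1):
            out[idx[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel_spearman(m1, m2, n_perm=20000, seed=0):
    """Spearman's rho between two distance matrices with a permutation p-value."""
    rng = random.Random(seed)
    x = ranks(upper(m1))
    observed = pearson(x, ranks(upper(m2)))
    order = list(range(len(m2)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # relabel the populations of the second matrix
        if pearson(x, ranks(upper(m2, order))) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


songs = [
    {"texture": "unison", "tempo": 1},
    {"texture": "unison", "tempo": 2},
    {"texture": "polyphony", "tempo": 2},
]
profile = modal_profile(songs)  # {"texture": "unison", "tempo": 2}
```

Rows and columns are permuted together in the Mantel test because the entries of a distance matrix are not independent observations.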

3. Results

(a) Multi-dimensional scaling

Figure 1 shows a multi-dimensional scaling plot for the 421 songs used in our sample, colour-coded for the 16 tribes. The high level of stress (34.3) in this two-dimensional ordination indicates the complex multi-dimensional nature of the musical data. A scree plot did not reveal a clear elbow, and showed instead that our data would require more than eight dimensions to achieve an acceptable level of stress under 10. Despite this, the multi-dimensional scaling plot clearly demonstrates the high level of internal heterogeneity in each population's musical repertoire and the high degree of overlap between populations.

Figure 1.

Multi-dimensional scaling plot of distances between 421 songs from 16 Austronesian-speaking populations. There is a large amount of overlap between populations and spread within populations. Each point represents a song and is colour-coded according to population of origin.

(b) Song-level analysis of molecular variance

The AMOVA confirms the multi-dimensional scaling result (table 1): the vast majority of the variance in our sample (approx. 98%) is accounted for by differences within populations, and a much smaller portion (approx. 2%) by differences between populations. Despite accounting for a much smaller proportion of the variance, musical diversity between populations was statistically significant (ΦST = 0.021, p < 0.001).

Table 1.

Musical AMOVA results.

(c) Neighbour-net analysis

The neighbour-net analysis (figure 2) demonstrated that our musical data did not appear tree-like and instead contained a fair amount of reticulation. The average delta score for this network was 0.46, and the average q-residual was 0.27.

Figure 2.

A neighbour-net plot of population-level musical divergence between 16 Austronesian-speaking populations based on pairwise ΦST from an AMOVA analysis of 421 traditional group songs. This plot shows a high degree of reticulation in the dataset, indicating the presence of borrowing and/or convergence (average delta score: 0.46; q-residual score: 0.27).

(d) Modal profile analysis

The pairwise population-level distances based on the modal profiles (ignoring internal diversity) were highly correlated with pairwise ΦST distances (rs = 0.730, p < 0.001), which take into account the internal variation in musical repertoires. This indicates that, although it cannot capture information about internal diversity within cultures, the modal profile approach may still adequately approximate the overall patterns of variation between populations (see electronic supplementary material, figure S4).

4. Discussion

We have applied the AMOVA framework to a cultural dataset, allowing us for the first time to quantify the hierarchical structure of cultural diversity. Our application of this approach to a sample of aboriginal Austronesian songs demonstrated that the vast majority of musical variation in this sample (approx. 98%) was found within populations, while a far smaller proportion of this variation (approx. 2%) occurred between populations. This validates and quantifies the critiques of ethnomusicologists that Cantometrics' cross-cultural approach underestimated the diversity of musical repertoires within each culture [15,18]. Next, a neighbour-net analysis of population pairwise ΦST distances showed that our musical data were not very tree-like, providing some preliminary insight into the evolution of musical repertoires and the presence of forces that diversify musics within cultures.

(a) How much diversity is sufficient?

The high level of internal musical diversity found in this study parallels general findings on the structure of human genetic diversity, with some estimates of this diversity being as high as 93 to 95 per cent globally, and as high as 99 per cent within some regions [9]. However, as in the genetic domain, this raises the important question of how much diversity is sufficient for describing differences between populations. This has been extensively addressed in population genetics. Lewontin's [8] analysis of human genetic variation led him to argue that the small proportion of variation found between populations in his study (14.6%) meant that differences between populations were not informative. Some scholars [34,35], most prominently Edwards [36], have noted that this conclusion is statistically inaccurate, as it ignores information contained in the correlation of allele frequencies across many loci. Modern clustering approaches use the correlated nature of genetic data to distinguish between major human groups that coincide with their geographical distribution, despite the small amount of variation (3–5%) accounting for these differences [9].

This situation is qualitatively the same in the study of musical diversity, because the correlation between different musical features in songs reveals much more about the unique musical repertoires of populations than the frequency of the features themselves. Therefore, our observation that between-population musical variance is a very small proportion of the total variance in no way precludes using this component for taxonomic and comparative analyses of world musics, as Lomax [14] did, or for the analysis of population relationships.

This kind of comparative methodology should not be applied recklessly but in consultation with expert ethnomusicologists, who can attest to the validity of the sample. The between-population component should be sufficient to distinguish populations musically, and this is validated by our modal profile analysis. The analysis demonstrated that a methodology devoid of information about internal diversity may represent overall patterns of difference between populations quite well, despite lacking the resolution to detect lower-level relationships.

(b) Cultural evolution of music

The transmission of cultural traits is distinct from that of biological traits in that there are many more possible modes of transmission. Unlike the human genetic domain, where variants are passed vertically across generations, features of culture can also pass horizontally between members of the same cohort, as well as obliquely from unrelated elder members of a focal individual's group [37]. The presence of alternative modes of transmission has been a central issue in the application of phylogenetic models to cultural traits [38]. Our preliminary attempts to apply such models to our song sample support Leroi & Swire's [25] claim that musical evolution is much less ‘tree-like’ than genetic evolution, owing to the occurrence of independent invention (convergence) as well as borrowing (horizontal transmission) of individual musical features (and even entire songs) between populations.

This contention is supported by the rather high average delta and q-residual scores obtained from the neighbour-net analysis. A recent analysis of typological and lexical data for a number of Austronesian languages is a good point of comparison for these figures [33]. Gray et al. [33] obtained average delta scores of 0.33 and 0.44 for networks based on lexical and typological data, respectively. From the higher delta score obtained in their typological analysis, they concluded that reticulation was much more common in typology than in the lexicon. By comparison, our musical data produced a value of 0.46, comparable with the score for language typology. This is consistent with the fact that our method is based on typological analysis of musical features.

This brings up the more general issue of the dynamics of musical evolutionary change. There are cultural forces that both diversify and homogenize musical repertoires, and some of them are conceptually analogous to forces that influence the dynamics of genetic change [39]. As with genes, cultural forms such as songs can undergo random changes over time, a kind of musical ‘drift’ [40]. Small population sizes may enhance the effects of genetic drift, although it is not yet clear how population sizes affect musical diversity and change over time. Another major force that can diversify repertoires is admixture through cultural contact (a kind of musical ‘flow’). Recent contact situations, such as that between the Paiwan and Rukai of Taiwan in our sample [41], can lead to high levels of acculturation, despite the maintenance of distinct languages. This particular contact situation is well reflected musically, with Paiwan and Rukai producing the only negative pairwise ΦST value in our analysis. This is unsurprising, as music actually provides an excellent model for ‘hybridization’ in the cultural domain, because it is composed of a series of modular components (mainly pitch and rhythm) that can undergo ‘syncretisms’ or blendings of features. A good example of this is found in African-American music, which contains a novel fusion of European tonal features and African rhythmic features [42]. Other cultural forces that can affect the frequency of cultural variants within and between populations include convergence, borrowing, innovation, conformity, extinction and replacement (e.g. through imposition, as in situations of conquest or economic globalization).

One means by which musical repertoires diversify internally is through a fissioning into an increasing number of genres or functional song types, a universal feature of musical repertoires. A classic example of genre-based variation in song structure is found in Arom's work on the music of the Pygmies of the Central African Republic [43], which qualitatively describes systematic differences in the musical features of songs performed in different social contexts, comprising roughly two dozen distinct musical genres (e.g. music for the hunting of elephants, music for the birth of twins). The same holds for our Austronesian musical sample, with genres such as wedding songs and headhunting songs appearing in the repertoires of multiple populations. Unfortunately, the limited number of songs in the current study prevented us from doing any sort of meaningful genre-level analysis. It is plausible that some genres of song are less malleable or prone to borrowing, which could affect our results. Given a larger, more comprehensive dataset, the AMOVA approach could be used to explore how variability in genres is structured within and between populations.

Our work on the cultural evolution of music has important limitations, especially as related to our use of archival material. The reliance of our work on archival recordings highlights the difficulty in sampling the musical variation of indigenous populations in the modern age. One concern for the current work is that the kinds of songs represented in the archives that we used did not cover all of the genres of a population's musical repertoire owing to ascertainment bias. This, however, does not negate our major finding, as the inclusion of unrecorded music of other genres in our analyses would be most likely to have increased, not reduced, the internal diversity of the musical repertoires.

Archival recordings are essential in a world where globalization and the associated expansion of Western culture threaten to extinguish much of the rich cultural diversity seen in human populations across the globe [44]. This decline is reflected in the sheer proportion of living languages classified as vulnerable, endangered or critical, which is at least 27 per cent, according to a conservative recent analysis [45]. The dominant influence of Western music has led to non-traditional (Western) musical features being incorporated into indigenous musical repertoires through a kind of imposed hybridization. Archival recordings reduce the potential of encountering this form of unwanted admixture but are problematic in other ways.

In addition to the possible sampling bias discussed above, some archival recordings may be poorly documented, misclassified, non-traditional or of poor recording quality. We were fortunate enough to work with a very well-documented archive and to have received advice from an ethnomusicologist with expertise in the traditional musics of the Taiwan aborigines. This kind of work may be substantially more difficult in regions with less organized archives and where ethnomusicological expertise on these traditional musics is lacking. Despite the inherent difficulty in doing this kind of work, the task of characterizing and comparing worldwide musical diversity, as other scholars have performed with languages [4], is an extremely important endeavour, not least considering the current rapid rate of cultural extinction [45].

(c) How do these results relate to linguistic variation?

The neighbour-net diagram of population relationships for these tribes differs from the pattern expected based on analyses of the Austronesian languages [6,46]. In particular, we do not see a strong musical division between populations speaking Formosan languages (spoken exclusively in mainland Taiwan) and Western Malayo-Polynesian languages, spoken by populations on Orchid Island and in the northern Philippines. Instead, we see the Luzon tribes being interspersed with several of the southern Taiwanese tribes, in particular the Paiwan and Rukai. Moreover, these latter two tribes show far more relatedness to one another musically than is predicted based on linguistic analyses [6]. Therefore, the musical relationships among these tribes might be quite different from those based on language, regardless of the low proportion of between-population diversity found in the musical data. While the dynamics of musical evolutionary change are still poorly understood, it is possible that music is revealing different facets of population history than language.

(d) How generalizable are these results to other aspects of culture?

Many useful parallels have been drawn between cultural and biological evolution [47], but the forces shaping cultural diversity can differ markedly from those that drive the structure of genetic diversity [48]. For example, some have argued that cultural variants will necessarily always display less intra-population variation than will genetic variants [11]. Language is one of the best-cited examples of a cultural trait that is mostly variable between speech communities (rather than within), owing to strong constraints that ensure that members of a speech community can communicate with one another [10].

The relative strength of processes that reduce internal diversity and those that increase it is likely to differ across cultural domains. It is plausible that music, for example, may be subject to lesser constraints than a system like language, and that innovation in this domain may be more highly valued in some cultures. The current work only covers musical variation in a small number of populations within the same language family. Populations in other regions of the world may have much more homogeneous musical repertoires. However, our results demonstrate that a high degree of internal heterogeneity in a population's musical repertoire is a possibility, at least in some cases.

5. Conclusion

While the present-day structure of human genetic diversity has been rigorously quantified, we lack the same kind of quantitative information for most aspects of culture. The AMOVA framework provides cross-cultural researchers with a means of quantifying variability for a number of cultural forms, and of exploring the forces responsible for balancing diversity and conformity. The current work is by no means intended as a comprehensive sampling of worldwide musical diversity, and indeed the partitioning of musical variance may differ substantially in other regions of the world. We do, however, present a crucial tool that can be applied to many other aspects of culture—a tool that may be useful for the study of human migrations and associated histories of cultural contact.


This work was supported by a grant to S.B. from the Social Sciences and Humanities Research Council of Canada, and by an Amherst College Roland Wood Fellowship to P.E.S. We thank Yingfen Wang for providing expert ethnomusicological assessments of the aboriginal Taiwanese musics. We thank Tom Currie and two anonymous reviewers for helpful comments on earlier versions of this manuscript. We also thank Jean Trejaut and Victor Grauer for advice and support while conducting this research, as well as Michel Belyk for advice on programming.

  • Received August 19, 2011.
  • Accepted October 20, 2011.

