Few mammalian species produce vocalizations that are as richly structured as bird songs, and this greatly restricts the capacity for information transfer. Syntactically complex mammalian vocalizations have previously been studied only in primates, cetaceans and bats. We provide evidence of complex syntactic vocalizations in a small social mammal: the rock hyrax (Procavia capensis: Hyracoidea). We adopted three algorithms, commonly used in genetic sequence analysis and information theory, to examine the order of syllables in hyrax calls. Syntactic dialects exist, and the syntax of hyrax calls differs significantly between regions in Israel. Differences in call syntax are positively correlated with geographical distance over short distances. No correlation is found over long distances, which may reflect limited dispersal. These findings indicate that rich syntactic structure is more common in the vocalizations of mammalian taxa than previously thought, and suggest the possibility of vocal production learning in the hyrax.

1. Introduction

Most studies that have analysed the vocalizations of birds [1] and mammals have used acoustic measures, examining differences and similarities in fundamental frequencies, formants and other time and frequency characteristics of the vocalization waveform. Substantial work on acoustics in cetaceans [2] has demonstrated social learning and vocal clans. Studies of vocal communication in bats [3], which have also been studied extensively, and in some other terrestrial mammals, such as marmots (Marmota flaviventris) [4] and mice (Mus musculus) [5], have shown varying levels of complexity and information transfer in their calls. Various primates [6–8] use different calls in different contexts, and even simpler calls, such as roaring in red deer (Cervus elaphus), can accurately advertise male size and fitness [9]. In addition to the information carried by a vocalization (e.g. the quality of the caller), calls can vary spatially and temporally. Considerable geographical variation in bird song (i.e. dialects) has been widely demonstrated [10], but has only rarely been shown in mammals, such as prairie dogs (Cynomys gunnisoni) [11], chipmunks (Eutamias spp.) [12], marmosets (Cebuella pygmaea) [13] and bats (Phyllostomus discolor) [14].

However, acoustic analyses do not take into account the syntax, or order of elements, in a vocalization. If information is encoded in the order of different notes, motifs or other syntactic elements, then acoustic analysis cannot identify or extract such information. An alternative to acoustic analysis is to identify discrete elements, or ‘syllables’, of a vocalization, such as notes or characteristic sounds, and to examine their relative order quantitatively. Syntactic analysis provides additional information because acoustic features are likely to be constrained by anatomy, and therefore may be genetically inherited [9,15]. Syntax, on the other hand, could be genetic or cultural, and hence in some cases may be more amenable to cultural transmission via vocal communication.

Methods of syntactic analysis have rarely been applied to mammals, although such methods are common in the analysis of bird song [16]. Some bats show a syntactic vocal repertoire of a complexity similar to that of birds [17]. Cetaceans also exhibit complex vocal communication [2], including syntactic structure [18]. Some primates have been shown to combine call notes in a simple syntactic structure [19–21]. It has been suggested [22] that syntax exists only in mammalian taxa, such as cetaceans and primates, that are subjectively considered to have ‘well-developed cognitive abilities’, or in those living in a predominantly acoustic environment (in the dark or under water), where other communication modalities such as vision are impractical [23]. However, it is hard to find objective measures of cognitive ability [24], and many group-living animals might also be expected to exhibit complex vocal communication [25] in order to maintain social hierarchy, display fitness for mating and convey contextual information to kin, such as food availability and predator threats.

The rock hyrax, Procavia capensis, is a small (approx. 3 kg) terrestrial social mammal, widespread across Africa and the Middle East, and commonly found in rocky outcrops across Israel [26]. Male hyraxes produce long, complex songs, lasting up to several minutes [27,28] (see electronic supplementary material for an example), which carry accurate information on the characteristics and identity of the caller [29]. A hyrax song typically consists of a series of ‘bouts’, each bout being a sequence of ‘syllables’, followed by a short pause. The repertoire of available syllables is not large, and they can be grouped into five categories (based on [27,30]): ‘wail’, ‘chuck’, ‘snort’, ‘squeak’ and ‘tweet’ (figure 1). Each bout usually consists of up to 30 such syllables. The purpose of male hyrax song is currently unclear, but it appears to be a form of self-advertisement [31], because higher-ranked males (both group and peripheral males) sing more frequently [28]. In this sense, it is analogous to bird song. Although higher-ranked males carry out the majority of the singing, hyrax social structure is complex and other males also appear to play a significant role in the social activity of the group [32] (see electronic supplementary material, figure S1 for a typical social network in a hyrax colony).

Figure 1.

Spectrographic representation of five of the typical types of hyrax syllables: (a) wail, (b) chuck, (c) snort, (d) squeak and (e) tweet.

Since hyrax songs can be represented as a string of discrete syllables, they are amenable to analysis by techniques developed in other fields for the processing of digital information. In particular, bioinformatics uses algorithms for the analysis of DNA sequences, which can be adapted by aligning and comparing the sequence of syllables in a hyrax song in a similar way to the sequence of nucleotides in DNA. Information theory for digital signal processing has generated a number of metrics for measuring the information content in putatively random streams, using entropy-based measures. These can similarly be applied to the sequence of syllables in a song bout, and have been used to examine the information content in whale songs [33] and in frog calls [34], to relate bird song complexity to environmental factors [35], and to measure individual variability in bird song [36]. Based on the above approaches, we adopted algorithms commonly used for DNA and information theory analyses as novel tools for the analysis of syntax in hyrax songs.

We chose the Needleman–Wunsch (NW) algorithm [37], which uses dynamic programming to find the minimum number of insertions, deletions and substitutions required to convert one string of symbols into another. The NW algorithm has the advantage that it directly compares two strings, and unlike syllable frequency metrics, does not rely on large population sample sizes.

Mutual information (MI) [38] quantifies the amount of common information between two streams, and not just the similarity between them. A higher MI is produced when two bouts are similar, but also when the bouts are more complex. This has the advantage of not biasing the similarity measurement in favour of bouts that simply repeat a single syllable. In addition, MI is unrelated to NW difference and therefore provides a second independent measure of song similarity/difference.

Finally, we used a third independent test for the existence of song dialects in different regions of the country. Rogers's [39] scaled Euclidian genetic distance (DR) is calculated by comparing the allele frequencies at multiple loci, but we adapt it by using the frequencies of each type of syllable at each position in the bout. This allows us to compare song syntax at the population level, rather than comparing individual songs.

We examined differences in the order of syllables in hyrax vocalizations between different sites around Israel. If hyraxes either learn or inherit song elements from nearby individuals, we hypothesize that the geographical distance between sites and the quantitative difference between songs at those sites will be positively correlated. We test for this correlation using the NW and MI metrics. The corresponding null hypothesis predicts that neither metric will show a correlation between geographical distance and song difference over short ranges. However, hyraxes are clearly not as mobile as birds, and although little is known about hyrax dispersal distances, dispersal of a few hundred metres is commonly observed in Tanzania [40]. Long-range dispersal has been demonstrated in only one related genus [41], and only among females (which do not sing). We recorded a maximum dispersal distance of approximately 5 km (A. Ilany & E. Geffen 2007, 2011, personal observation). Consequently, we hypothesize that distant populations will be culturally and genetically isolated, and will form dialects through cultural and/or genetic drift. To test for the presence of dialects, we calculated DR among sites and tested the null hypothesis that DR variation among sites does not differ from that expected at random.

2. Methods

We sampled hyrax songs in nine regions around Israel, each containing between two and nine sites (electronic supplementary material, figure S2; table 1). Sites within each region were ecologically similar (table 1), and were close enough to each other for hyrax migration to be feasible (about 5 km; A. Ilany & E. Geffen 2007, 2011, personal observation). Because higher-ranked males carry out the majority of the singing activity [28], songs recorded at each site came from one or at most two individuals, except at Ein Gedi, where all males are individually marked [27]. To exclude the possibility of recording the same individual at two locations, we made recordings at nearby sites on the same day. When more than one animal sang during a recording, we used our directional microphone to ensure that the strongest signal came from a single individual, and our analyses used only that individual's song.

Table 1.

Regions in the study, with the number of sites (Nsite), mean number of bouts per site (Nbout) and the mean distance between sites within each region (Dsite, km). ρs is the Spearman rank correlation between NW/MI and geographical distance. Habitat is indicated by: S, suburban or urban; M, Mediterranean scrub; G, gorge; O, oasis; D, desert.

Songs were recorded onto a Sony TC-DM5 cassette recorder using either an Audio-Technica ATR-6550 or a Sennheiser ME-67 shotgun microphone. Singing was elicited by the playback of a recording of hyrax pup distress calls, as used in previous work [31]. The same pup recording was used at each location. In general, it was not possible to identify which individual was singing, except at sites where hyraxes were tagged as part of other studies. Recordings were digitized using the audio input of a personal computer running Microsoft Windows. All further analysis was performed in Matlab v. 7.3 (The MathWorks, Inc., Natick, MA, USA).

Songs were divided into syllables by visual inspection of the spectrogram, and bouts were defined as a sequence of syllables bounded by a period of silence of at least 1.3 s; this cut-off was determined by examining the distribution of inter-syllable gap lengths (electronic supplementary material, figure S3). We analysed 201 songs, which included a total of 2931 bouts. We classified the syllables into the five different types described above (following [27,30]), using a combination of automatic and manual methods (see electronic supplementary material).

Because a hyrax typically begins a song with very short bouts and then adds complexity as the song progresses [42], we excluded very short bouts with fewer than six syllables. A trade-off was necessary between selecting bouts with more information (longer bouts) and including a large number of samples (shorter bouts). We set the minimum bout length at six syllables because bouts of this length were present at every location sampled, thereby ensuring that every location was represented in the analysis.
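As an illustration of the bout segmentation and filtering steps described above, the following Python sketch (the study itself used Matlab) groups a time-ordered list of syllables into bouts using the 1.3 s silence threshold and then discards bouts with fewer than six syllables. The input format, a hypothetical list of (label, onset, offset) tuples with single-letter labels as in figure 2, is an assumption; the syllable classification step itself is not reproduced.

```python
from typing import List, Tuple

# Hypothetical input format: (label, onset_s, offset_s), with labels as in
# figure 2 (W wail, C chuck, S snort, Q squeak, T tweet).
Syllable = Tuple[str, float, float]

BOUT_GAP_S = 1.3   # silence that separates bouts (value from the text)
MIN_BOUT_LEN = 6   # shortest bout retained for analysis (value from the text)


def split_into_bouts(syllables: List[Syllable], gap: float = BOUT_GAP_S) -> List[str]:
    """Group syllables into bouts wherever the silence between them is >= gap."""
    bouts: List[str] = []
    current: List[str] = []
    prev_offset = None
    for label, onset, offset in sorted(syllables, key=lambda s: s[1]):
        if prev_offset is not None and onset - prev_offset >= gap:
            bouts.append("".join(current))  # close the previous bout
            current = []
        current.append(label)
        prev_offset = offset
    if current:
        bouts.append("".join(current))
    return bouts


def keep_long_bouts(bouts: List[str], min_len: int = MIN_BOUT_LEN) -> List[str]:
    """Drop bouts with fewer than min_len syllables."""
    return [b for b in bouts if len(b) >= min_len]


song = [("W", 0.0, 0.5), ("C", 0.6, 0.7), ("C", 0.8, 0.9), ("S", 1.0, 1.1),
        ("Q", 1.2, 1.3), ("Q", 1.4, 1.5), ("T", 1.6, 1.7),
        ("W", 3.7, 4.0), ("Q", 4.1, 4.2)]
bouts = split_into_bouts(song)
print(bouts)                   # ['WCCSQQT', 'WQ']
print(keep_long_bouts(bouts))  # ['WCCSQQT']
```

The 2 s silence between the seventh and eighth syllables starts a new bout, and the resulting two-syllable bout is then excluded by the six-syllable minimum.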

We implemented the calculation of the NW distance in Matlab v. 7.3. The NW algorithm first aligns two sequences to minimize the cost of changing one sequence into the other using insertions, deletions and substitutions (figure 2a,b). The general form of the NW algorithm requires a cost matrix indicating the relative penalty for each of these operations, but as we have no indication of how hyraxes perceive the difference between songs, we gave insertion, deletion and substitution equal cost penalties. As a result, our NW metric simply counts the number of differences between the two strings. We calculated the NW metric for each pairwise comparison of bouts in our dataset.
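With equal penalties for insertion, deletion and substitution, the NW cost reduces to the unit-cost edit (Levenshtein) distance. A minimal Python sketch of that dynamic programme, assuming bouts are encoded as strings of the letters W, C, S, Q and T, is given below; the function names are hypothetical, and the sketch returns only the cost, not the alignment itself (figure 2b).

```python
from itertools import combinations
from typing import Dict, List, Tuple


def nw_distance(a: str, b: str, indel: int = 1, sub: int = 1) -> int:
    """Minimum-cost global alignment (Needleman-Wunsch) of two syllable strings.

    With indel == sub == 1 this is the unit-cost edit (Levenshtein) distance.
    """
    n, m = len(a), len(b)
    # dp[i][j]: cheapest way to turn a[:i] into b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * indel          # delete every syllable of a[:i]
    for j in range(1, m + 1):
        dp[0][j] = j * indel          # insert every syllable of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dp[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else sub)
            dp[i][j] = min(diag, dp[i - 1][j] + indel, dp[i][j - 1] + indel)
    return dp[n][m]


def pairwise_nw(bouts: List[str]) -> Dict[Tuple[int, int], int]:
    """NW distance for every unordered pair of bouts, keyed by index pair."""
    return {(i, j): nw_distance(bouts[i], bouts[j])
            for i, j in combinations(range(len(bouts)), 2)}
```

For example, `nw_distance("WQQQQQ", "WCQQQQ")` returns 1 (a single substitution). Because every pair of bouts is compared, the number of comparisons grows quadratically with the number of bouts.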

Figure 2.

Examples of NW comparisons. The NW difference is calculated by counting the minimum number of pointwise differences between the two strings. (a) Two unaligned strings with an NW difference of 7. The NW algorithm aligns the strings as in (b) to minimize the NW difference. (c) Two hyrax bouts which are highly different, NW difference = 9, and (d) two bouts which are very similar, NW difference = 1. Letters indicate the different syllable types: W, wail; C, chuck; S, snort; Q, squeak; T, tweet.

We also implemented the calculation of MI in Matlab, according to Cover et al. [38]. The MI $I(A,B)$ between two streams $A$ and $B$ is defined as $I(A,B) = H(A) + H(B) - J(A,B)$, where $H$ is the Shannon entropy of a stream and $J$ is the joint entropy of the two streams. Shannon entropy is defined as $H(A) = -\sum_{x} p(x)\log_2 p(x)$, and joint entropy as $J(A,B) = -\sum_{x}\sum_{y} p(x,y)\log_2 p(x,y)$, where $p(x)$ is the probability of syllable $x$ occurring in a stream and $p(x,y)$ is the probability of two syllables $x$ and $y$ occurring at the same point in the two streams. We also calculated the MI for each pairwise comparison of bouts in our dataset.
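The entropy definitions above translate directly into code. The following Python sketch computes I(A,B) in bits for two bouts. How the study paired syllables in bouts of unequal length is not stated here, so the sketch assumes a simple position-by-position pairing over the length of the shorter bout, with all three entropies estimated from those same pairs so that the terms are mutually consistent.

```python
from collections import Counter
from math import log2


def entropy(counts: Counter) -> float:
    """Shannon entropy in bits of an empirical distribution given as counts."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values() if c > 0)


def mutual_information(a: str, b: str) -> float:
    """I(A,B) = H(A) + H(B) - J(A,B), in bits.

    Assumption: syllables are paired position by position over the first
    min(len(a), len(b)) positions, and all three entropies are estimated
    from those same pairs.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    pairs = list(zip(a[:n], b[:n]))
    h_a = entropy(Counter(x for x, _ in pairs))   # H(A)
    h_b = entropy(Counter(y for _, y in pairs))   # H(B)
    j_ab = entropy(Counter(pairs))                # J(A,B)
    return h_a + h_b - j_ab
```

Two copies of a bout that repeats a single syllable ("QQQQQQ") give an MI of 0 bits, whereas two copies of "WQQQQQ" give about 0.65 bits and two copies of the more varied "WCSQTW" about 2.25 bits. This illustrates the point made in the Introduction that MI rises with bout complexity as well as with similarity.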

To calculate DR, we calculated the frequency of each of the five types of syllables at each position (locus) in the bout, comparing bouts from one site with bouts from another. Unlike in genetic studies, not all loci were equally represented, since the bouts are not of equal length. Therefore, we scaled the frequencies by the number of occurrences of that locus. DR(a,b) was calculated as Embedded Image, where pij and qij are the frequencies of syllable type i at locus j in the two populations at sites a and b, respectively, nj is the number of syllables at locus j and M is the number of shared loci in the two populations. DR is therefore a matrix in which each cell is a measure of the syllabic isolation between a pair of sites. Each site belongs to one of the nine regions described above, and so each pairwise comparison of sites was either ‘within’ a region (region(a) = region(b)) or ‘between’ regions (region(a) ≠ region(b)). We calculated ϖ, the variation in DR explained by differences between regions, as Embedded Image
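The per-locus frequency calculation behind DR can be sketched in Python as follows. The exact way in which the study scales each locus by nj is not recoverable from this text, so the sketch uses the standard, unweighted form of Rogers's distance, $\frac{1}{M}\sum_{j=1}^{M}\sqrt{\tfrac{1}{2}\sum_{i}(p_{ij}-q_{ij})^2}$, with nj entering only as the denominator of the per-locus frequencies. Treat it as an approximation of the published measure rather than a reproduction of it; the function names are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List

SYLLABLE_TYPES = "WCSQT"  # wail, chuck, snort, squeak, tweet


def locus_frequencies(bouts: List[str]) -> List[Dict[str, float]]:
    """Per-position (locus) frequency of each syllable type across a site's bouts."""
    longest = max((len(b) for b in bouts), default=0)
    freqs = []
    for j in range(longest):
        counts = Counter(b[j] for b in bouts if len(b) > j)
        n_j = sum(counts.values())  # number of syllables observed at locus j
        freqs.append({s: counts[s] / n_j for s in SYLLABLE_TYPES})
    return freqs


def rogers_distance(bouts_a: List[str], bouts_b: List[str]) -> float:
    """Unweighted Rogers's distance averaged over the M loci shared by both sites."""
    fa, fb = locus_frequencies(bouts_a), locus_frequencies(bouts_b)
    m = min(len(fa), len(fb))  # M: loci present at both sites
    if m == 0:
        raise ValueError("the two sites share no loci")
    total = 0.0
    for j in range(m):
        sq = sum((fa[j][s] - fb[j][s]) ** 2 for s in SYLLABLE_TYPES)
        total += sqrt(0.5 * sq)
    return total / m
```

Identical sets of bouts give 0, and sites whose bouts never share a syllable type at any locus (for example ["WWWWWW"] and ["QQQQQQ"]) give the maximum value of 1.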

This approach is similar to non-parametric multivariate analysis of variance (MANOVA) [43]. We tested for significance by applying a permutation test [44] with 10⁵ random permutations of the sites, randomizing the assignment of pairwise comparisons to ‘within’ or ‘between’ regions.
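The permutation logic can be sketched as follows. Because the expression for ϖ is not reproduced in this text, the Python sketch substitutes a hypothetical stand-in statistic, the mean between-region DR minus the mean within-region DR, and shuffles region labels across sites to build its null distribution. The function names and the choice of statistic are assumptions for illustration only.

```python
import random
from itertools import combinations
from typing import List, Sequence, Tuple


def between_minus_within(dist: List[List[float]], regions: Sequence[str]) -> float:
    """Hypothetical stand-in statistic: mean between-region minus mean within-region distance."""
    within, between = [], []
    for a, b in combinations(range(len(regions)), 2):
        (within if regions[a] == regions[b] else between).append(dist[a][b])
    return sum(between) / len(between) - sum(within) / len(within)


def region_permutation_test(dist: List[List[float]], regions: Sequence[str],
                            n_perm: int = 100_000, seed: int = 0) -> Tuple[float, float]:
    """Observed statistic and one-sided p-value from shuffling region labels across sites."""
    rng = random.Random(seed)
    observed = between_minus_within(dist, regions)
    labels = list(regions)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # keeps the number of sites per region fixed
        if between_minus_within(dist, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Shuffling labels preserves the number of sites in each region, so only the grouping of sites changes between permutations. The permutation count is a parameter; smaller values are convenient for quick checks.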

For those regions with five or more sites (Yarden Harari, Yuvalim, Shekhanya and Korazim-Karkom), we used the FATHOM toolkit for Matlab [45] to perform a Mantel test for correlation between song difference and geographical distance. Song difference was tested using both NW difference and MI. The number of permutations used to calculate the p-value in the Mantel test was 10⁵.
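The study performed its Mantel tests with the FATHOM toolkit; the following is an independent Python sketch of the same idea, not FATHOM's API. It computes the Spearman correlation between the upper triangles of a geographical-distance matrix and a song-difference matrix (mean NW or MI per pair of sites), then permutes the site order of one matrix to obtain a p-value. The two-sided test, the average-rank treatment of ties and all function names are assumptions.

```python
import random
from itertools import combinations
from math import sqrt
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def _average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined if either input is constant


def _spearman_upper(d1: Matrix, d2: Matrix, perm: Optional[List[int]] = None) -> float:
    """Spearman correlation of upper-triangle entries; perm relabels the sites of d2."""
    n = len(d1)
    p = perm if perm is not None else list(range(n))
    pairs = list(combinations(range(n), 2))
    x = [d1[i][j] for i, j in pairs]
    y = [d2[p[i]][p[j]] for i, j in pairs]
    return _pearson(_average_ranks(x), _average_ranks(y))


def mantel_spearman(d1: Matrix, d2: Matrix, n_perm: int = 100_000,
                    seed: int = 0) -> Tuple[float, float]:
    """Observed Spearman rho and a two-sided Mantel permutation p-value."""
    rng = random.Random(seed)
    observed = _spearman_upper(d1, d2)
    perm = list(range(len(d1)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if abs(_spearman_upper(d1, d2, perm)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

With sites placed along a line and song difference increasing monotonically with distance, the sketch returns ρ = 1; a matrix of MI values, which is expected to fall with distance, would give a negative ρ instead.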

3. Results

Of the total 2931 bouts, 549 bouts (19%) contained six or more syllables and were used for this analysis. The number of bouts per site ranged from 1 to 57 (with a mean of 15). Regions varied considerably in the number of sites, number of bouts per site and the distance between sites in a region (table 1).

The NW difference, which measures the number of pointwise differences between two strings, ranged from 0.7 to 12.1 (in units of substitutions, insertions and deletions), and the average NW (±s.e.) was 3.556 ± 0.162 for sites within the same region and 4.621 ± 0.067 for sites between regions. However, permutation tests showed that sites within the same region were not significantly more similar to each other than sites between regions (p = 0.136).

The MI, which measures a combination of the similarity and complexity of the two strings, ranged from 0.10 to 0.58 bits, and average MI (±s.e.) was 0.317 ± 0.126 for sites within the same region and 0.440 ± 0.003 for sites between regions. Permutation tests showed that sites within the same region were significantly more similar to each other than sites between regions (p = 0.015).

Rogers's scaled Euclidian genetic distance DR, which is analogous to the genetic difference between two populations, varied between 0 and 0.11, and average DR (±s.e.) was 0.019 ± 0.001 for sites within the same region, and 0.021 ± 0.0005 for sites between regions. Permutation tests showed that sites within the same region were significantly more similar to each other than sites between regions (p = 0.046).

Figure 2c,d shows examples of actual call sequences with high and low NW. The mean NW was much lower when comparing bouts within a site (2.92 ± 0.37) than between sites (4.56 ± 0.09), and permutation tests showed that this difference was significant (p < 0.001). Precise repetition of bouts was not common; out of 549 bouts, there were 386 distinct bout sequences, 342 of which were recorded only once. Some bout sequences were more common, and one (a ‘wail’ followed by five ‘squeaks’) was recorded 72 times.
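The repetition counts above amount to tallying identical bout strings. In the letter code of figure 2, the most frequent sequence (a wail followed by five squeaks) would be written WQQQQQ. A short Python sketch, run here on a hypothetical toy list rather than the study's data, is:

```python
from collections import Counter
from typing import List, Tuple


def repetition_summary(bouts: List[str]) -> Tuple[int, int, Tuple[str, int]]:
    """Number of distinct bout sequences, how many occur only once, and the most frequent one.

    Raises IndexError on an empty list, since there is no most frequent bout.
    """
    counts = Counter(bouts)
    distinct = len(counts)
    singletons = sum(1 for c in counts.values() if c == 1)
    most_common = counts.most_common(1)[0]
    return distinct, singletons, most_common


toy = ["WQQQQQ"] * 3 + ["WCCSQQ", "WCSSQQ", "WCSSQQ"]
print(repetition_summary(toy))  # (3, 1, ('WQQQQQ', 3))
```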

For those regions that comprised five or more sites (Yarden Harari, Yuvalim, Shekhanya and Korazim-Karkom), we calculated the Spearman rank correlation (with a Mantel permutation test for significance) between geographical distance and each of NW difference and MI (figures 3 and 4, and table 1). Each point in these figures compares a pair of sites and shows the mean NW or MI over all pairwise comparisons of bouts between the two sites. As predicted, we found a positive correlation between geographical distance and NW difference (figure 3), and a negative correlation between geographical distance and MI (figure 4). In one case (NW in Shekhanya), the correlation was not significant, and in one case (MI in Yuvalim) it only approached significance, but in all cases the sign of the correlation was consistent across regions. No significant correlation was found between geographical distance and either NW difference or MI when comparing across all regions (figure 5).

Figure 3.

Relationship between mean NW difference and geographical distance for pairs of sites in those regions that comprised at least five sites: (a) Yuvalim, (b) Shekhanya, (c) Korazim-Karkom and (d) Yarden Harari. Standard errors are not shown on the graph for clarity, but ranged from 0.05 to 3.18 (median 0.28). The straight line represents the least-squares trend.

Figure 4.

Relationship between mutual information (MI) and geographical distance for pairs of sites in those regions that comprised at least five sites: (a) Yuvalim, (b) Shekhanya, (c) Korazim-Karkom and (d) Yarden Harari.

Figure 5.

Relationship of (a) mean NW difference and (b) MI with geographical distance for pairs of sites across the study as a whole.

4. Discussion

The significant differences in DR between regions indicate the presence of distinct syntactic dialects across Israel: hyraxes in different regions of the country sing a repertoire of songs that differs substantially from the syntactic repertoire in other regions. At short ranges (less than 5 km), we see a correlation between NW difference/MI and geographical distance: among nearby sites, NW difference tends to increase and MI to decrease with increasing distance. Although not all Mantel test p-values are below 0.05, the consistency of the trend across regions, and particularly across the two unrelated measures (NW difference and MI), strongly suggests that the order of song elements diffuses over a range of a few kilometres.

However, we do not observe a consistent trend of increasing NW difference or decreasing MI at larger geographical distances. This suggests that although hyrax song syntax is correlated between nearby individuals and groups, isolated syntactic dialects are in themselves arbitrary—as likely to be similar between distant regions as between nearby ones. The lack of correlation between NW difference or MI and geographical distance on a regional scale may indicate that little transfer of information exists (whether by social, or genetic, or environmental mechanisms) at long ranges. Other than geographical distance, there are no obvious physical, abiotic or biotic barriers to dispersal of hyraxes that can explain syntactic variation within and between regions [46]. This is consistent with our understanding of the limited dispersal of hyraxes, and stands in contrast to correlations observed in some bird species, where long-range dispersal is commonplace [47].

Wiens [16] found results similar to ours in a study of song-pattern variation in the sage sparrow using a syntactic analysis. Nearby sites showed a gradient of similarity, which was not observed over longer ranges, although distant populations showed significantly different repertoires. Farabaugh et al. [48] also found distinct syntactic differences between the songs of populations of Australian magpies. Similar studies of call syntax among mammals are very rare, and geographical dialects have been demonstrated mostly with acoustic rather than syntactic features. Campbell [49] and May-Collado & Wartzok [50] used inflection points in the spectral contour of dolphin whistles to compare geographically distinct populations, and Bohn [17] used a Markov model to quantify the syntax of bat syllables, which is probably the closest technique to ours. Some studies have analysed simple syntax in primate vocalizations by comparing the transition frequencies between notes [20,21]. Our novel use of algorithms taken from bioinformatics and information theory provides simple tools for a detailed analysis of vocalization syntax, and provides additional information on the temporal structure of songs (and potentially on the information content encoded in that syntax) that cannot be obtained using existing acoustic measures. Previous work on animal syntax has used Markov models [5,17] and machine learning techniques [18] to capture the nature of element ordering within songs. Our preliminary investigations indicated that a first-order Markov model was insufficient to represent the richness of syntax in hyrax song. Machine learning algorithms suffer from the disadvantage of being a ‘black box’ (i.e. their output does not expose any intuitive understanding of the relationship between the entities being classified). We chose to use methods such as NW and MI, which are easy to implement and interpret.

Syntactic dialects in hyrax populations can evolve and be maintained by social learning, copying and improvisational alteration of the order of song elements (i.e. vocal production learning, VPL [23]). However, geographical variation is not definitive evidence for VPL [51], because syntactic dialects can also be genetically inherited (i.e. cultural versus genetic transmission). While it is easy to envisage that genetic factors could influence syllable frequencies or repertoire size, it is not clear what genetic mechanism could affect syllable order and syntax. Suboscine songbirds do not learn song syntax, but inherit their repertoire genetically [52]. However, individuals of these species do not show substantial variation in song syntax, but adhere to a species-specific song structure [53], in contrast to the hyrax, where substantial syntactic variation occurs between individuals and within regions. In rock hyrax society, top-ranking males, which are often immigrants from nearby sites, sing more frequently [28]. We suggest that it is more likely that dispersing males carry song features from their natal group, which are then repeated and learnt by hyraxes at the destination sites. Imprecise copying or improvisation is a likely mechanism for maintaining similarity gradients such as those we observed over dispersal distances of this scale (approx. 5 km). Additional support for VPL comes from the lack of correlation between male vocal profiles (based on acoustic analyses) and genetic relatedness within one site [29]. Further investigation is required to determine whether hyraxes are indeed capable of copying and generating novel vocalizations [54].

At present, we do not know whether, or what, information is transmitted via syntactic structure. We know from previous studies that information on caller identity and characteristics is encoded in the frequency of some vocal elements (e.g. the chuck element [27]). Our results suggest that complex vocalization syntax in mammals is present outside cetaceans, bats and primates, and may be common in other mammalian taxa. The simple algorithms we adopted from bioinformatics and information theory, which we have shown to be powerful tools, may be used for analysing such syntax variation in other mammalian systems.

  • Received February 13, 2012.
  • Accepted March 27, 2012.

