The learned songs of songbirds often cluster into population-wide types. Here, we test the hypothesis that male and female receivers respond differently to songs depending on how typical of those types they are. We used computational methods to cluster a large sample of swamp sparrow (Melospiza georgiana) songs into types and to estimate the degree to which individual song exemplars are typical of these types. We then played exemplars to male and female receivers. Territorial males responded more aggressively and captive females performed more sexual displays in response to songs that are highly typical than to songs that are less typical. Previous studies have demonstrated that songbirds distinguish song types that are typical for their species, or for their population, from those that are not. Our results show that swamp sparrows also discriminate typical from less typical exemplars within learned song-type categories. In addition, our results suggest that more typical versions of song types function better, at least in male–female communication. This finding is consistent with the hypothesis that syllable type typicality serves as a proxy for the assessment of song learning accuracy.
1. Introduction
Songbirds learn their song repertoires from conspecifics, often with a high degree of precision . In many species, a consequence of vocal imitation is that songs within a population cluster into a large number of different categories called ‘song types’ [2,3]. Song types have been the basis of much research, but most hypotheses involving their role in communication assume that they are perceived in a categorical, all-or-none manner: a song is either perceived as belonging to a song-type category or not. In this study, we investigate whether birds respond differently to songs that are more or less typical members of their song-type category.
It is well established that humans can perceive the degree to which utterances are typical or ‘good’ exemplars of speech categories [4–7]. It is still debated exactly how this feat is accomplished: one possible mechanism is by learning ‘prototypes’, single summary representations of categories stored in long-term memory, while an alternative is by memorization of a large number of individual exemplars [8,9]. Less attention has been paid to similar abilities in other species. In one exception, starlings learned to discriminate typical from less typical exemplars of human speech categories, supporting theories that such behaviour arises from general learning processes that are shared across species . It remains unclear, however, what role category typicality plays in natural communication systems. There are many examples where there appears to be a single, unlearned optimal version of a signal, usually associated with species specificity (e.g. [10,11]). These cases are quite different from human speech, however, where each individual learns many different, somewhat arbitrary categories, each with its own optimal exemplar.
Here we test whether male and female swamp sparrows respond differently to songs that are either more or less typical of song-type categories within their population. Song types, like speech categories, are learned categories [2,3], and therefore provide a good system to explore whether learning about category typicality plays a role in natural communication systems. Previous studies of song recognition have tended to focus on discrimination of heterospecific or foreign conspecific song from local song (reviewed in , e.g. ), a level at which genetic predispositions may influence song preferences . By focusing at a much finer scale, and, critically, by investigating discrimination within song-type categories, we investigate how much swamp sparrows learn about their cultural environment. Swamp sparrows are a good model for this work because the relatively simple structure of their songs, a single syllable repeated as a trill (figure 1), facilitates quantitative comparisons of similarity and because both females and males are known to discern fine features of song in the functional contexts of mate choice and aggressive signalling, respectively [14,15]. Individual male swamp sparrows sing a repertoire of three learned syllable types on average [16,17]. Because a particular syllable type may be shared among many males in a population , young individuals may hear multiple adults singing the same type. This exposure to multiple versions of the same syllable type may shape the songs that young males learn to produce, and may also allow both male and female receivers to discriminate more from less typical versions of the type.
2. Material and methods
(a) Stimulus selection
We recorded the song repertoires of 206 adult male swamp sparrows from Conneaut Marsh, Crawford County, Pennsylvania, USA, between 5 May and 1 July 2008 (Sony parabola, Shure SM57 microphone, Sony PCM D50 digital recorder, 44.1 kHz sampling rate, 16-bit dynamic range). We estimate that this sample represents 20–30% of the territorial males in the local population.
We selected one exemplar of each syllable type in each male's repertoire, generating a sample of 656 syllables. We compared each of these syllables with every other syllable in our sample using the dynamic time-warping (DTW) algorithm in the software package Luscinia (http://luscinia.sourceforge.net, see  for further details). This algorithm searches for an optimal alignment between two time series on the basis of the Euclidean distance between acoustic features; in our analysis, these features were spectrograph measures of syllables: time, fundamental frequency, fundamental frequency change and ‘vibrato amplitude’ (additional parameter settings in Luscinia: compression ratio: 0.25, minimum note length: 10, s.d. ratio: 0.5, cost for alignment error: 0.2, syllable comparison by individual element, with weight by amplitude and log transform of frequencies both selected.) This DTW analysis has been shown previously to generate comparisons between swamp sparrow song note types (i.e. the smaller units that comprise a syllable type) that match the subjective assessments of human observers as well as the categorical perception of swamp sparrows themselves . The output of the DTW analysis is a dissimilarity score between each pair of syllables in our sample.
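As an illustration of the general technique only, the following is a minimal sketch of dynamic time warping with a Euclidean local cost. It is not the Luscinia implementation: it omits the compression, amplitude weighting and alignment-error cost listed above, and the function name and toy feature values are hypothetical.

```python
import math

def dtw_distance(a, b):
    """Dynamic time-warping cost between two sequences of feature vectors.

    a, b: lists of equal-length tuples, one tuple per time frame
    (e.g. time, fundamental frequency, frequency change, vibrato amplitude).
    The local cost of aligning two frames is their Euclidean distance.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            # a frame may be matched, or either sequence may be stretched
            cost[i][j] = local + min(cost[i - 1][j - 1],
                                     cost[i - 1][j],
                                     cost[i][j - 1])
    return cost[n][m]

# Toy two-feature contours (hypothetical values, not measured syllables)
syllable_a = [(0.0, 4.0), (0.1, 5.0), (0.2, 6.0)]
syllable_b = [(0.0, 4.0), (0.1, 5.0), (0.1, 5.0), (0.2, 6.0)]
print(dtw_distance(syllable_a, syllable_b))
```

Because warping absorbs the repeated frame, the two toy contours align at zero cost, which is the property that makes the method tolerant of small timing differences between renditions of the same syllable.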
We assigned syllables to population-wide types using unweighted pair group method with arithmetic mean (UPGMA) hierarchical clustering of dissimilarity scores from the DTW analysis, and the Global Silhouette Index  to decide where to cut the resulting tree into different syllable types. We compared this analysis to a visual assessment of spectrograms (by R.F.L.): both methods generated a similar number of overall types (61 for the computational analysis versus 65 for visual classification) and were highly concordant with an adjusted Rand Index  of 0.81.
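The clustering step can be sketched as follows. This is a minimal, unoptimized illustration rather than the analysis code used in the study: it assumes a symmetric dissimilarity matrix held as nested lists, and it takes the Global Silhouette Index to be the mean of the per-cluster mean silhouette widths, with singleton clusters scored as zero. Both the definition used here and all function names are assumptions.

```python
def upgma_partitions(d):
    """Return the partition at every level of a UPGMA (average-linkage) tree."""
    clusters = [[i] for i in range(len(d))]
    partitions = [[c[:] for c in clusters]]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                cx, cy = clusters[x], clusters[y]
                # UPGMA distance: mean of all between-cluster dissimilarities
                avg = sum(d[i][j] for i in cx for j in cy) / (len(cx) * len(cy))
                if best is None or avg < best[0]:
                    best = (avg, x, y)
        _, x, y = best
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
        partitions.append([c[:] for c in clusters])
    return partitions

def global_silhouette(d, clusters):
    """Mean over clusters of the mean silhouette width within each cluster."""
    per_cluster = []
    for c in clusters:
        widths = []
        for i in c:
            if len(c) == 1:
                widths.append(0.0)  # convention for singleton clusters
                continue
            a = sum(d[i][j] for j in c if j != i) / (len(c) - 1)
            b = min(sum(d[i][j] for j in o) / len(o)
                    for o in clusters if o is not c)
            widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
        per_cluster.append(sum(widths) / len(widths))
    return sum(per_cluster) / len(per_cluster)

def best_cut(d):
    """Choose the tree level (2 to n-1 clusters) with the highest index."""
    candidates = [p for p in upgma_partitions(d) if 2 <= len(p) < len(d)]
    return max(candidates, key=lambda p: global_silhouette(d, p))

# Toy data: six points on a line forming two well-separated groups
xs = [0.0, 1.0, 2.5, 10.0, 11.0, 12.5]
toy = [[abs(p - q) for q in xs] for p in xs]
print(sorted(sorted(c) for c in best_cut(toy)))
```

Scanning every level of the tree is affordable for a toy matrix; for a sample of several hundred syllables an established library implementation of average linkage would be the more practical choice.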
We next quantified the degree to which each syllable was typical of its syllable type cluster. We applied Anderson's method  for measuring multivariate dispersion: we first carried out a principal co-ordinates analysis of the matrix of syllable dissimilarities, and then calculated the multivariate centroid of each syllable type cluster. We then calculated the Euclidean distance, dc, between each syllable and its syllable type centroid. A low value of dc means that a syllable is similar to the centroid of its syllable type and is itself therefore highly typical of its type.
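A distance-to-centroid score of this kind can be illustrated without an explicit ordination. The sketch below relies on a standard identity: when dissimilarities can be embedded in Euclidean space, the squared distance from a point to its group centroid equals the mean squared dissimilarity from that point to all group members, minus half the mean squared dissimilarity over all ordered pairs within the group. It is offered as an equivalent shortcut under that assumption, not as the procedure the study ran; the function name and toy values are hypothetical.

```python
import math

def distance_to_centroid(d, members):
    """Distance from each cluster member to the cluster centroid.

    d: symmetric dissimilarity matrix (nested lists)
    members: indices of the syllables assigned to one syllable type
    Assumes the dissimilarities are Euclidean-embeddable; small negative
    values caused by rounding or non-Euclidean structure are clamped to 0.
    """
    n = len(members)
    # half the mean squared dissimilarity over all ordered within-group pairs
    within = sum(d[j][k] ** 2 for j in members for k in members) / (2 * n * n)
    scores = {}
    for i in members:
        mean_sq = sum(d[i][j] ** 2 for j in members) / n
        scores[i] = math.sqrt(max(mean_sq - within, 0.0))
    return scores

# Toy check: three points on a line at 0, 2 and 4, so the centroid is at 2
pos = [0.0, 2.0, 4.0]
toy_d = [[abs(p - q) for q in pos] for p in pos]
print(distance_to_centroid(toy_d, [0, 1, 2]))
```

In the toy case the middle point scores 0 and the two outer points score 2, matching their actual distances from the centroid.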
We selected nine of the most commonly occurring syllable types in our sample (figure 1b). Each of the selected syllable types was sung by between 19 and 57 males in our recorded sample (median = 47). For each of the nine types, we selected two pairs of exemplars; in each pair, one exemplar had a low-dc score (mean ± s.d. = 0.0293 ± 0.0079) and was therefore typical, and one had a high-dc score (0.0602 ± 0.0091) and was therefore less typical for the syllable type. None of our high-dc exemplars had the most extreme score for their type; the average dc score for the furthest outlier of all types sung by at least 15 males was 0.0766; the average dc score for the most typical member of each of those clusters was 0.0220. We used two categories of stimuli, rather than natural, continuous variation in dc, to simplify the experimental design.
Previous work has found that swamp sparrows attend to song performance [14,15,23], and there are potentially complex interactions between learning accuracy and performance . In swamp sparrow songs, an operational definition of performance, ‘vocal deviation’, has been developed that includes syllable repetition rate and frequency bandwidth : low vocal deviation scores represent high-performance songs. To ensure that our test subjects were not discriminating between pairs of test songs based on vocal performance, we measured the vocal deviation of each exemplar (following the formula in ) and matched pairs of exemplars with performance scores within two units of each other (these are dimensionless units; 2 units represents a small difference in performance compared with previous studies, which found different responses to songs differing on the order of 14 units [14,15]).
Stimuli were constructed by selecting one syllable from an exemplar song and repeating it to form a trill using the program Audacity (v. 1.3.9), maintaining the original inter-syllable gap until the song length reached 2 s, and with a reduction in amplitude for the first and last (−9 dB relative to central syllables) and second and penultimate (−3 dB) syllables such that all stimuli shared a similar amplitude envelope across the duration of the song. An error was made in constructing one pair of exemplars that was detected after the experiments were concluded; the results we report exclude these data, leaving 17 stimulus pairs. Including the omitted data does not change the significance of the reported statistics.
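The arithmetic behind such a stimulus can be sketched as below. The stimuli themselves were built in Audacity, so this is only a hypothetical illustration: it assumes that syllables are added for as long as the whole trill fits within 2 s, and it converts the stated decibel reductions to linear amplitude factors using 10^(dB/20). The function name and the example durations are made up.

```python
def trill_layout(syllable_s, gap_s, target_s=2.0):
    """Return (onset times in s, linear gain per syllable) for a trill.

    syllable_s: duration of the repeated syllable
    gap_s: silent gap between consecutive syllables
    """
    period = syllable_s + gap_s
    # Assumption: n syllables span n * period - gap_s, which must not
    # exceed target_s, so n <= (target_s + gap_s) / period.
    n = int((target_s + gap_s) // period)
    onsets = [k * period for k in range(n)]
    gains_db = [0.0] * n
    if n >= 4:
        gains_db[0] = gains_db[-1] = -9.0   # first and last syllables
        gains_db[1] = gains_db[-2] = -3.0   # second and penultimate
    # decibels to linear amplitude factor
    gains = [10 ** (g / 20) for g in gains_db]
    return onsets, gains

# Hypothetical syllable of 120 ms separated by 40 ms gaps
onsets, gains = trill_layout(0.12, 0.04)
print(len(onsets), [round(g, 3) for g in gains[:3]])
```

A reduction of 9 dB corresponds to roughly 35% of the central amplitude and 3 dB to roughly 71%, which gives the gradual onset and offset described above.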
(c) Male response experiment
Male tests were performed between 26 May and 10 June 2011, using 68 adult male swamp sparrows defending territories in the same Conneaut Marsh population from which we had recorded songs. Playbacks were conducted between 06.00 and 12.00 using an Apple iPod and an Altec Lansing imt620 speaker mounted 1.5 m above the ground, with songs played at 80 ± 2 dB sound pressure level (SPL). Each male was played one stimulus, for 4 min, at a rate of six songs per minute. Subjects were observed for the duration of the playback plus 4 min following the playback. The entire period was divided into 5 s intervals and the closest approach of the bird was recorded for each interval and binned into categories of 0–2, 2–4, 4–8, and more than 8 m . Distance estimation was aided by placing flagging tape at distances of 2, 4 and 8 m from the speaker prior to the experiment. The subject's distance score was calculated by averaging across the 8 min observation period. We also recorded whether the subject flew over the speaker, sang or gave a wing-wave display, a visual signal that reliably predicts attack in swamp sparrows . To generate a measure of male response, we applied a principal components analysis to our scaled, Box-Cox transformed results, using the prcomp function in R .
(d) Female preference experiment
Subjects were 21 (n = 9 in 2010, n = 12 in 2011) adult female swamp sparrows, captured from the same Conneaut Marsh population from which songs were recorded. Subjects were housed in individual cages, each inside its own sound isolation chamber (Industrial Acoustics model AC-1), on a long day-length light cycle (15 L : 9 D).
We used a standard copulation solicitation assay to measure female preferences for song [14,23]. Seven days prior to testing we gave each female a subcutaneous implant of 17-β-estradiol in silastic tubing (1.96 mm outside diameter, 12 mm total length, 8 mm of hormone). During tests we played song stimuli at 78 ± 2 dB SPL through a Realistic 4 × 6 speaker (2010 trials) or an Altec Lansing Orbit-MP3 speaker (2011 trials) mounted to the top centre of each test chamber.
We presented low-dc and high-dc versions of syllable types to the females in separate trials, with two song exemplars (both either low-dc or high-dc) presented during a trial to minimize habituation. Thus, each trial consisted of nine presentations of the first exemplar followed by nine presentations of the second, at a rate of six songs per minute for a total length of 3 min. We tested the females with low-dc and high-dc exemplars on the same day, with approximately 3 h between trials and with the order of presentation (i.e. low-dc or high-dc first) randomized. We tested each female again 2 days later with the same stimulus sets, also separated by 3 h, but with the order of presentation reversed. The subjects were visually and acoustically isolated during testing; we viewed and recorded their responses remotely on a computer monitor. The number of copulation solicitation displays (CSDs) that were elicited by playback songs was the sole response measure [14,23].
3. Results
Male swamp sparrows responded more vigorously to low-dc (i.e. more typical) songs than to high-dc songs. They approached closer, made more wing-wave displays, sang more and flew over the speaker more frequently for trials in which low-dc stimuli were played back than for trials in which high-dc stimuli were played (figure 3a). The first principal component (PC1) of these response behaviours was weighted negatively with approach distance (−0.563), and positively with number of songs (0.388), wing-waves (0.640) and flights over the speaker (0.350); it explained 39.5% of the total variance and was taken as a general measure of response strength. Averaging within syllable types, PC1 was higher after playback of the low-dc songs than for the high-dc songs for eight of the nine syllable types (figure 3a; Wilcoxon signed rank test, n = 9, V = 44, p < 0.01). We used the MCMCglmm package in R  to fit a linear model to PC1 (with syllable type as a random factor) and found that dc category again significantly predicted aggressive response (95% confidence interval (CI) = 0.106–1.054, p < 0.02) but that song performance did not (CI = −0.130 to 0.089, p > 0.5).
Female swamp sparrows performed more CSDs in response to low-dc songs than to high-dc songs for eight of the nine syllable types (figure 3b; Wilcoxon signed rank test, n = 9, V = 42, p < 0.025). We used the MCMCglmm package  to fit a linear model to the number of CSDs as a binomial response (including year of capture, day, time of testing and stimulus order as fixed factors and subject and syllable type as random factors). This model found a significant effect of dc category (CI = −1.12 to −0.068, p < 0.03) and performance (CI = −0.27 to −0.003, p < 0.05), and indicated that females displayed more to low-dc songs than to high-dc songs, and more to high-performance songs than to low-performance songs.
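The p-value thresholds reported for the two Wilcoxon signed rank tests can be checked by enumeration. The sketch below assumes nine untied, non-zero paired differences and a two-sided exact test; under the null hypothesis each rank is equally likely to carry a positive or a negative sign, so all 2^9 = 512 sign patterns are equally probable. The function name is hypothetical.

```python
from itertools import combinations

def exact_signed_rank_p(v, n):
    """Two-sided exact p-value for a signed-rank statistic V.

    v: sum of the ranks of the positive differences
    n: number of paired differences (assumed untied and non-zero)
    """
    ranks = range(1, n + 1)
    total = n * (n + 1) // 2
    # counts[s] = number of sign patterns whose positive ranks sum to s
    counts = [0] * (total + 1)
    for k in range(n + 1):
        for subset in combinations(ranks, k):
            counts[sum(subset)] += 1
    upper = sum(counts[v:])        # patterns with statistic >= v
    lower = sum(counts[:v + 1])    # patterns with statistic <= v
    return min(1.0, 2 * min(upper, lower) / 2 ** n)

print(exact_signed_rank_p(44, 9))  # male test:   0.0078125
print(exact_signed_rank_p(42, 9))  # female test: 0.01953125
```

Both values fall below the thresholds stated in the text (p < 0.01 for males and p < 0.025 for females).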
An earlier study failed to find evidence that female swamp sparrows preferred accurately learned songs over less accurately learned songs , which may be seen as contradicting our results if poorly learned songs are less typical of song-type categories, as is probably the case. They assessed hand-reared males' learning accuracy based on the similarity between their songs and the specific tutor exemplars with which they were trained. The females used in that study were captured as adults from the males' natal population, and may or may not have been familiar with the specific tutor exemplars used in the experiment. Experimental males could have learned the tutor songs accurately, but from the perspective of the experimental females, their songs might still have been perceived as atypical relative to population norms if the tutor songs themselves were also atypical. We re-analysed the data from , measuring instead how similar the male subjects' songs were to the syllable type centroids in our wild-recorded sample, and found that females preferred songs which clustered with common syllable types in the population over outliers that lay outside category boundaries (electronic supplementary material, S1). These data provide additional support for our conclusion that females prefer songs typical of song-type categories.
The Cohen's d effect size in our male playback experiment was 0.47 (d = 0.57 when estimated from the MCMCglmm model). In our female playback experiment, there was an odds ratio of 1.19 (d = 0.59 in the MCMCglmm model). These results suggest that typicality has a moderate effect on swamp sparrows' responses to song. In our reanalysis of the earlier study of female responses , the effect size of typicality was an odds ratio of 4.23 (d = 1.30 in the MCMCglmm). This stronger effect size may reflect the fact that outliers in this earlier study were more extreme than the high-dc songs of our experiments (electronic supplementary material, S1). In line with this, a study investigating female swamp sparrow preferences for local songs over unfamiliar song types from a distant population (again using similar methods, and females from the Conneaut population) found an effect size (odds ratio) of 4.00 .
Swamp sparrows may also assess a second aspect of song: song performance. Two previous studies investigated male and female responses to high- and low-performance songs in the Conneaut Marsh population, using similar methods to those we employed [14,15], which we can therefore compare with our results. They used a common set of ‘high-’ and ‘low-’ performance stimuli that were near the limits of the syllable type range. The Cohen's d effect size of male responses to song performance was 0.64, and the odds ratio of female responses was 1.42. Overall, these studies suggest that both male and female swamp sparrows discriminate between different versions of the same syllable type with moderate effect sizes, and that discrimination against high-dc songs is approximately similar in magnitude to discrimination against low-performance songs.
4. Discussion
Both male and female swamp sparrows responded more strongly to more typical than to less typical versions of the same syllable type. Males responded more aggressively in the context of playback of song to territory holders in the field, while females performed more sexual displays in a laboratory test of mating response. Typicality within the syllable type category therefore appears to be a relevant cue to both classes of potential receivers of this dual function signal.
Our results suggest that, like human speech categories, swamp sparrow syllable types have internal structure with exemplars near the centre of the category eliciting a stronger response than those near the category boundary. How this is achieved is less clear. Swamp sparrows might actually infer and learn prototypes through psychological equivalents of the statistical processes we used to estimate dc scores. But other psychological processes could produce the same result. For example, receivers might memorize many individual exemplars of a syllable type, and then assess a song based on its integrated similarity to all of the memorized exemplars .
Even if receivers simply memorized one, randomly selected exemplar of each syllable type, they might, on average, be expected to discriminate low-dc from high-dc stimuli. This is because the probability density of syllables in acoustic space is highest around the statistical centre of syllable type distributions (figure 2), so a randomly selected exemplar is more likely to be more similar to a low-dc than a high-dc stimulus. We can examine how reliable this mechanism might be using our quantitative analysis of syllable similarity (see the electronic supplementary material, S2 for details). This analysis suggests that 17.2% of randomly selected exemplars would in fact be more similar to a high-dc stimulus than to its low-dc counterpart. Thus, while, on average, receivers might discriminate between low-dc and high-dc exemplars using this process (17.2% is lower than 50%, which would indicate an inability to discriminate), this discrimination would not be very reliable. It seems unlikely that we would have detected significant discrimination between low-dc and high-dc stimuli in our experiments if receivers were simply memorizing one exemplar per type, especially considering that our methods to detect preferences already involve considerable statistical error.
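The single-exemplar argument can be made concrete with a short calculation. The details of the analysis in the electronic supplementary material are not reproduced here, so the sketch below is only one plausible reading: for a given stimulus pair, it counts the fraction of the other exemplars of the same syllable type that are more similar to the high-dc stimulus than to the low-dc one. The function name, indices and toy positions are hypothetical.

```python
def fraction_closer_to_high(d, exemplars, low, high):
    """Fraction of exemplars that are closer to the high-dc stimulus.

    d: symmetric dissimilarity matrix (nested lists)
    exemplars: indices of all recorded exemplars of one syllable type
    low, high: indices of the low-dc and high-dc stimuli of a pair
    A receiver that memorized one such exemplar would be expected to
    favour the high-dc stimulus if it compared songs by similarity alone.
    """
    others = [e for e in exemplars if e not in (low, high)]
    closer = sum(1 for e in others if d[e][high] < d[e][low])
    return closer / len(others)

# Toy syllable type: most exemplars lie near 0, a few lie near 1
pos = [0.0, 0.1, -0.1, 0.2, 1.0, 0.9]
toy_d = [[abs(p - q) for q in pos] for p in pos]
print(fraction_closer_to_high(toy_d, list(range(6)), low=0, high=4))
```

In this toy case one of the four remaining exemplars is closer to the high-dc stimulus, so a quarter of possible single memorized exemplars would point the wrong way. Averaging this fraction over all stimulus pairs would give a population-level figure comparable in kind to the 17.2% reported above.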
Distinguishing between prototype and exemplar hypotheses has not proved straightforward in studies of human speech (e.g. [8,9]). In the case of swamp sparrows, experimental manipulation of the statistical distribution of songs heard during development, and operant conditioning tests of discrimination abilities for typical and less typical songs would certainly help clarify the processes underlying our results.
While preferences for typical versions of syllable types could reflect a non-selected preference for familiarity, our results provoke the question of what role such preferences might play in communication. One possibility is that typicality provides an honest signal of signaller quality [31,32] in the contexts of female mate choice  and territorial competition. The developmental stress hypothesis  suggests that song learning accuracy can be a reliable indicator of male phenotypic quality because early stress affects brain development, which in turn reduces song learning accuracy in swamp sparrows , as well as influencing other aspects of phenotype, such as adult body size . Learning accuracy may also indirectly reflect a male's genotypic quality by revealing how well he resisted or coped with stress during development .
For either female or male receivers to use song learning accuracy as an assessment signal, they must first be able to reliably assess song learning accuracy. To do that, they must have some point of reference, but they are very unlikely to have direct knowledge of the identity of a male's tutor and thus the specific models he attempted to learn. Attempting to infer whom a male learned from, based on song similarity, is fraught with potential errors and can make assessment very unreliable . If, however, birds develop a concept of syllable prototypes (or an exemplar-based equivalent) by learning from multiple versions of the same type, these learned prototypes will be similar between different signallers and receivers, even if they do not share the same set of tutors, allowing reliable assessment . Assessing prototypicality can, in theory, serve as a proxy for assessing learning accuracy directly.
To female receivers, typicality may indicate a male's parental abilities, which would benefit her directly by helping her raise offspring successfully, or may indicate aspects of his genotype that would benefit her indirectly by providing good genes to her offspring . Less is known about assessment of song features in the context of male–male aggression, but male songbirds, including swamp sparrows, have been shown to respond to differences in vocal performance, giving a more aggressive response to higher performance songs [15,37]. Increased stress experienced early in life leads to reduced body size in song sparrows , which itself leads to reduced fighting ability in other songbird species [38,39]. Therefore, males might be predicted to pay attention to song features that reflect developmental stress, such as typicality. That male swamp sparrows respond more aggressively to typical songs is in line with this idea.
Our finding that receivers discriminate against less typical versions of syllable types complements earlier work demonstrating discrimination against songs from foreign populations (reviewed in ). Female swamp sparrows from the Conneaut population, for example, discriminate against songs from a New York population about 500 km distant . One reason why receivers might discriminate against foreign songs could be because such songs do not conform closely to song types from the local population [36,40]. By this scenario, preference for local songs is an extension of the preference for highly typical versions of local types: because they have a low degree of typicality compared with local types, foreign songs are interpreted by receivers as being poorly learned. An alternative hypothesis, however, is that discrimination against foreign songs allows females to select males with locally adapted genes , and this provides a second adaptive explanation for preferences for more typical versions of syllable types. In this scenario, the high-dc stimuli in our study might have been perceived as foreign songs.
A closer examination of our stimuli suggests this second explanation is unlikely. The high-dc stimuli in our study were not extremes of syllable type categories: 23.9% of the syllable types in our sample either had a dc score greater than the average dc score of our high-dc stimuli, or were sung by five or fewer males and might reasonably be expected to be treated as atypical by receivers (see the electronic supplementary material, S2). This means that female receivers discriminate against a large proportion of songs within their population. Based on the patchy distribution of swamp sparrows in northwestern Pennsylvania where our study site is located, it seems unlikely that most of these songs were sung by immigrants. And if a large proportion of these songs were in fact sung by immigrants, then the migration rates between populations would surely be high enough to remove any local adaptation or cultural differentiation. If instead most are poorly learned songs copied from local males, then females discriminating against these atypical songs would not be choosing between foreign and local males in any case.
Studies of the responses of swamp sparrows to songs varying in performance and typicality demonstrate that receivers attend to very subtle features in song structure. In the case of song performance, it appears that in swamp sparrows and other species, these preferences generate selection for songs that approach a performance limit set by morphological constraints . For song learning accuracy, there may be selection not only for the high level of learning precision observed in swamp sparrows  and other species, but also for the categorization and categorical perception of syllables and notes . Research into categorization in speech has clearly shown that humans distinguish between more or less typical phonemes [4–8]. If similar psychological processes underlie preferences in swamp sparrows, this species might provide a model for understanding categorization in learned communication systems.
This research followed US laws and was approved by the Institutional Animal Care and Use Committees of Duke University and the University of Pittsburgh.
Funding was provided by Duke University and an Arthur and Barbara Papp award. This is a contribution of the Pymatuning Laboratory of Ecology.
- Received January 30, 2014.
- Accepted April 4, 2014.
- © 2014 The Author(s) Published by the Royal Society. All rights reserved.