Social vocalizations can release oxytocin in humans

Leslie J. Seltzer, Toni E. Ziegler, Seth D. Pollak


Vocalizations are important components of social behaviour in many vertebrate species, including our own. Less well understood are the hormonal mechanisms involved in response to vocal cues, and how these systems may influence the course of behavioural evolution. The neurohormone oxytocin (OT) partly governs a number of biological and social processes critical to fitness, such as attachment between mothers and their young, and suppression of the stress response after contact with trusted conspecifics. Rodent studies suggest that OT's release is contingent upon direct tactile contact with such individuals, but we hypothesized that vocalizations might be capable of producing the same effect. To test our hypothesis, we chose human mother–daughter dyads and applied a social stressor to the children, following which we randomly assigned participants into complete contact, speech-only or no-contact conditions. Children receiving a full complement of comfort including physical, vocal and non-verbal contact showed the highest levels of OT and the swiftest return to baseline of a biological marker of stress (salivary cortisol), but a strikingly similar hormonal profile emerged in children comforted solely by their mother's voice. Our results suggest that vocalizations may be as important as touch to the neuroendocrine regulation of social bonding in our species.

1. Introduction

The strength and quality of relationships between individuals are critical to fitness in many animals. While the behaviours that may facilitate the formation and maintenance of these relationships are readily observable, such as grooming in primates or social allofeeding in birds, their biochemical underpinnings are less evident. Since natural selection operates upon inter-individual variation in behavioural phenotype, an understanding of the proximate mechanisms responsible for eliciting or perpetuating social behaviour is critical to the study of evolution. One of the ways in which this can be examined is through an analysis of the hormones involved in behavioural regulation.

(a) Oxytocin, stress and social support

The neuropeptide oxytocin (OT) plays complex roles in the central nervous system in establishing maternal/infant and other types of attachments in a species- and sex-specific manner (Carter 1998), particularly the development of trust, pair bonding and recognition of familiar individuals in rodents such as prairie voles (Insel 1997; Lim & Young 2006; Bales et al. 2007; Grippo et al. 2009). This makes OT a candidate for the study of the neurological bases of human social behaviour.

There is considerable debate as to how OT operates to promote relationships and/or regulate stress in the face of differential social contact in endogenous systems. One such model holds that physical touch is critical to social bonding as mediated by OT, as is the familiarity of the individual providing contact, especially in the case of touch between mothers and their offspring (Uvnäs-Moberg 1996, 1997). It appears that this type of tactile contact in the context of complex social interactions can impact fitness as well; in vervets, for example, individuals recently groomed by a conspecific respond more quickly to an alarm call from that individual than they do to others (Seyfarth & Cheney 1984), and touch between chimpanzees appears to reduce tension and bolster relationships between individuals who have recently engaged in agonistic interactions (de Waal 2000; Arnold & Whiten 2001). Individuals with strong social networks as evidenced by extensive grooming, particularly with their own kin, also appear to have more offspring than those with weaker ties to others (Silk 2007). While a relationship between OT and these non-human primate behaviours has not yet been established, such behaviours are associated with increases in OT in other mammals such as rats (Uvnäs-Moberg 1998), human adults (Grewen et al. 2005) and children (Wismer Fries et al. 2005).

This construct, however, may belie the complexity of how the OT system works. For example, OT also seems to be related to amelioration of social stress, either directly or through intermediate hormonal factors; indeed, touch in rodent species tends to take place after stressful interactions, possibly because OT is released in response to activation of the hypothalamic–pituitary–adrenal axis (Jezová et al. 1996; Campbell 2008). This in turn may facilitate comfort-seeking and affiliative bonding, particularly in females who may opt to ‘tend and befriend’ during stressful times rather than engage in fight or flight responses in order to protect their offspring and themselves (Taylor et al. 2000). It remains to be seen whether or not OT is released by stress itself in addition to tactile contact—or potentially by other types of social interactions.

(b) Vocalizations and touch—neuroendocrine similarities?

Although it has been conjectured that human vocalizations in the form of female speech can release OT (Brizendine 2006), this is yet to be demonstrated. Like touch, however, vocalizations are used by a number of species to communicate aggression, proceptivity, anxiety and a number of other emotional states. Little is known of the role of OT in the production of vocalizations, although new studies suggest that it is likely to be important. In the soniferous fishes, a counterpart to OT (isotocin) regulates the expression of vocal communication of the sexes differently, via strongly steroid-mediated developmental differences in the brain (Goodson et al. 2003). The neuroanatomy of oxytocinergic neurons has also been explored in the moustached bat (Pteronotus parnellii), showing numerous terminations in and around auditory brain regions (Kanwal & Rao 2002). In ‘singing’ mice (Scotinomys xerampelinus), OT receptors are distributed densely in regions of the brain that govern social memory, such as the hippocampus and amygdala (Campbell et al. 2009). Finally, infant OT knockout mice both show profound social deficits (Kramer et al. 2003) and produce fewer social vocalizations than their peers (Takayanagi et al. 2005), while also demonstrating both increased frequency of ultrasonic stress vocalizations and higher levels of circulating cortisol (Liu et al. 2008).

With this in mind, we hypothesized that endogenous OT might change in response to vocal stimuli after a stressful event, even in the absence of any other type of social contact. Partly because humans are the only vertebrate capable of producing a continuous and precisely timed amount of comforting vocalizations upon request, we selected mother–daughter pairs as our test subjects and investigated whether exposure to socially supportive speech (defined as the combination of prosodic and linguistic vocal cues) could produce the same physiological effects as direct physical contact as assayed via peripheral measures of OT.

The validity of peripheral measurement of OT as it pertains to such behavioural variables is a contentious issue requiring further empirical work to resolve. In particular, some studies suggest that the central and peripheral OT systems operate independently of one another to the extent that serum, salivary or urinary OT measurement cannot reflect central processes in the way that invasive methods such as cerebrospinal draws can (Amico et al. 1990; Engelmann et al. 1999; Landgraf & Neumann 2004; Neumann 2007). Other studies, however, have begun to reveal that a relationship may exist between peripheral measures of OT and centrally mediated behaviour, given changes in peripheral OT with exposure to different behavioural paradigms in a number of species, including humans (Carmichael et al. 1987; Modahl et al. 1998; Cushing & Carter 2000; Cushing et al. 2001; Wismer Fries et al. 2005; Seltzer & Ziegler 2007). Such methods allow for non-invasive, real-time analysis of the ways in which different types of contact may influence neuroendocrine function, potentially making them more desirable options for use with human subjects.

2. Material and methods

(a) Subjects

Our subjects consisted of 61 female children, aged 7–12 years (M = 9.4 years, s.d. = 1.61 years) and their mothers. Only pre-menarchal children were included to minimize oestrogen-mediated changes and potential serum contamination of urine associated with menstrual cycling. All children had reached adrenarche, resulting in relative control of inter-individual variation in cortisol levels. We selected post-adrenarche, prepubescent females as test subjects because: (i) we wished to test whether or not OT was released after a stressful event in females, with and without tactile and/or verbal contact from a trusted parent; (ii) OT-mediated phenomena are best studied in females, anchoring our work in the existing literature; (iii) we wished to examine whether or not the role of OT in decreasing stress, if any, was present before oestrogen cycling commenced with puberty; (iv) we posited that female children would be more accepting of warm physical touch and verbal contact with their mothers than male children owing to cultural norms; and finally (v) we posited that younger children might not be able to understand instructions well enough to complete our stress paradigm successfully. All children with a documented history of abuse, neglect or prior institutionalization were eliminated for purposes of this study, but are the subjects of future work.

(b) Experimental protocols

To control for potential circadian fluctuations in hormone concentrations, each experimental session began at 16.00 h. After obtaining consent and assent, children underwent the Trier Social Stress Test for Children, a procedure that involves completing a series of timed public speaking and math performance tasks aloud in front of an audience (Kirschbaum et al. 1993) and that has been specially modified for use with children (Gunnar et al. 2009). After experiencing this stressor, children were randomly assigned to one of three experimental conditions. In the direct contact condition (n = 19), children were reunited with their mothers, who comforted them with all sensory stimuli including physical contact. A second group of children (n = 20) received a telephone call from their mothers, who were in a different location. By virtue of physical distance, contact was limited to speech. Moreover, children in this condition were not allowed to make visual contact with their mothers at any point during the experiment and were provided with a phone in order to control for the possible influence of non-auditory social cues. A third group of children (n = 22) participated in a control condition in which they watched a neutral film for 75 min and had no contact of any type with their mothers until the completion of the experiment. Children in the first two conditions had 15 min of contact with their mothers (either total contact or verbal), after which they watched the same film as the children in the control group for 60 min. To index children's stress responses, salivary cortisol was collected at arrival, baseline (30 min after the novelty of arriving at the laboratory but prior to the stressor), immediately after the stressor but before experimental contact, and then at 15, 30, 45 and 60 min post-maternal contact, if any. OT was assayed from urine samples collected at the following time points matched with salivary cortisol: arrival, baseline, post-stressor 30 min and post-stressor 60 min. This study was approved by the Human Subjects Committee/Institutional Review Board at the University of Wisconsin-Madison.
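As a reading aid, the three-arm design described above can be written down as a small data structure. This is only an illustrative sketch: the class and field names are our own assumptions and not part of the study materials; the group sizes and durations are those stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """One experimental arm (names are illustrative, not from the study)."""
    name: str
    n_children: int
    contact_min: int  # minutes of maternal contact after the stressor
    film_min: int     # minutes spent watching the neutral film


# Group sizes and durations as stated in the protocol description.
CONDITIONS = [
    Condition("direct contact", n_children=19, contact_min=15, film_min=60),
    Condition("speech only", n_children=20, contact_min=15, film_min=60),
    Condition("no contact", n_children=22, contact_min=0, film_min=75),
]


def total_children(conditions):
    """Total number of children across all arms."""
    return sum(c.n_children for c in conditions)


def post_stressor_minutes(condition):
    """Length of the post-stressor period for one arm."""
    return condition.contact_min + condition.film_min


if __name__ == "__main__":
    print(total_children(CONDITIONS))
    print([post_stressor_minutes(c) for c in CONDITIONS])
```

Written this way, it is easy to confirm that the arms sum to the 61 children enrolled and that every arm spans the same 75 min after the stressor.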

(c) Hormonal assays

(i) Saliva

Saliva samples were frozen on dry ice after collection and assayed using a microtitre plate coated with monoclonal cortisol antibodies. Twenty-five microlitres of each sample was tested in duplicate. Bound cortisol peroxidase was measured by the reaction of the peroxidase enzyme on the substrate tetramethylbenzidine. Optical density was read at 450 nm, and data were reduced with Assay Zap software using a four-parameter sigmoid minus curve fit. Standard concentrations ranged from 3.00 to 0.012 µg/dl, with intra- and inter-assay coefficients of variation of 3.35 and 3.75, respectively.
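To illustrate how a four-parameter standard curve turns an optical density reading into a concentration, the sketch below assumes the common four-parameter logistic form. Whether this matches the exact 'sigmoid minus' model in the data reduction software is an assumption, and the parameter values in the example are invented for illustration rather than taken from the assay.

```python
def four_pl(conc, a, b, c, d):
    """Predicted optical density at a given concentration.

    Assumed four-parameter logistic form:
    a = response at zero concentration, d = response at infinite
    concentration, c = inflection point, b = slope factor.
    """
    return d + (a - d) / (1.0 + (conc / c) ** b)


def inverse_four_pl(od, a, b, c, d):
    """Concentration that would produce the optical density `od`."""
    if od == d:
        raise ValueError("optical density equals the lower asymptote")
    ratio = (a - d) / (od - d) - 1.0
    if ratio <= 0:
        raise ValueError("optical density lies outside the fitted curve")
    return c * ratio ** (1.0 / b)


if __name__ == "__main__":
    # Hypothetical curve parameters, for illustration only.
    params = dict(a=2.0, b=1.2, c=0.2, d=0.05)
    od = four_pl(0.5, **params)
    print(inverse_four_pl(od, **params))
```

In a competitive immunoassay the signal falls as analyte concentration rises, which is why the zero-dose response `a` is set above `d` in this example.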

(ii) Urine

Urine samples were snap-frozen on dry ice and subsequently stored at −80°C. After controlled thawing, urinary samples were subjected to solid-phase extraction using 1 ml SepPak C18 cartridges (Waters, cat no. WAT023590). Each column was pretreated with 1 ml of methanol and then 1 ml of water before application of 1 ml of urine. A 10 per cent acetonitrile (ACN) plus 1 per cent trifluoroacetic acid (TFA) wash (1 ml) was then applied, after which the eluate was collected in a clean tube by applying a final 1 ml of 80/20 per cent ACN solution with 1 per cent TFA to the column. Samples were then dried down in a water bath with air stream and reconstituted in the assay-appropriate buffer supplied in the 96-well enzyme-linked immunosorbent assay kit used (Assay Designs, cat no. 901-153; cat no. 901-017 for AVP). Intra- and inter-assay coefficients of variation were determined using a human urine pool, and oxytocin standards were used to determine recoveries from the extraction method (intra-assay/inter-assay coefficient of variation = 24.2/10.5, recoveries 92.1 ± 5.23%, n = 8). Each plate was read on a Molecular Devices Spectramax 340PC 384 at 405 nm. Data were analysed by weighted least-squares regression analysis and reduced by log-logit transformation to yield peptide concentrations. Creatinine was also measured in each urine sample to correct for variation in water content (peptide concentration divided by creatinine concentration) to arrive at pg OT per milligram creatinine (Ziegler et al. 1995). Analysis of variance (ANOVA) was used to compare the experimental groups with one another at each time point, with post hoc analysis performed via Tukey's HSD test.
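The creatinine correction and the logit step of a log-logit standard curve can be sketched as follows. The function names and the example values are hypothetical; the only point is that expressing OT as pg per mg creatinine means normalising the peptide concentration to the creatinine concentration of the same sample.

```python
import math


def ot_per_mg_creatinine(ot_pg_per_ml, creatinine_mg_per_ml):
    """Normalise urinary OT to creatinine, giving pg OT per mg creatinine."""
    if creatinine_mg_per_ml <= 0:
        raise ValueError("creatinine concentration must be positive")
    return ot_pg_per_ml / creatinine_mg_per_ml


def logit_fraction_bound(bound, max_bound):
    """Logit of the bound fraction B/B0, the y-axis of a log-logit plot.

    The standard curve is then approximately linear when this value is
    regressed on the logarithm of the standard concentration.
    """
    p = bound / max_bound
    if not 0.0 < p < 1.0:
        raise ValueError("bound fraction must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    # Made-up sample: 12 pg/ml OT in urine containing 0.8 mg/ml creatinine.
    print(ot_per_mg_creatinine(12.0, 0.8))
    print(logit_fraction_bound(1.0, 2.0))
```

Dividing by creatinine means that a dilute sample and a concentrated sample from the same child yield comparable values.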

3. Results

(a) Cortisol

Children in all three conditions exhibited an increase in salivary cortisol from baseline to peak (t68 = −3.4, p < 0.01), indicating that the social stressor was effective. However, treatment conditions differed following the stressor (F2,52 = 3.99, p < 0.02). Both direct, interpersonal contact and vocal contact alone were effective at reducing children's cortisol levels after 1 h, although the condition involving touch resulted in a more rapid decrease and lower levels of peak cortisol. Across experimental time, reduction in cortisol in children engaging in speech only was intermediate between the other two groups, but by the end of the study children hearing their mother's voice and those interacting with their mothers directly were statistically indistinguishable (comparable to baseline levels across participants). Children receiving no social contact exhibited higher levels of cortisol than the other two groups, even an hour after the stressor was complete (F2,52 = 4.475, p < 0.02; figure 1).
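For readers less familiar with the notation, F2,52 denotes an F statistic with 2 between-group and 52 within-group degrees of freedom. The sketch below shows how a one-way ANOVA F statistic is computed for three groups. The numbers are invented for illustration and are not the study's data, and the Tukey HSD post hoc step is not shown.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Invented values standing in for three contact conditions.
    toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
    print(one_way_anova(toy_groups))
```

A large F indicates that the group means differ by more than would be expected from the spread within groups.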

Figure 1.

Salivary cortisol levels across experimental time, by maternal contact type. The green arrow represents the onset of the stressor, and the pink arrow the onset of maternal contact (if any). Both speech and direct contact facilitate a more rapid return to baseline values than simply resting alone. Diamonds and black line, no contact; triangles and red line, direct contact; squares and blue line, speech-only.

(b) Oxytocin

As predicted by extant studies, urinary OT was released in children following normative comforting by their mothers involving direct physical contact. We also observed, however, that girls released conspicuously similar levels of OT in response to speech with their mothers, even in the absence of all other types of somatosensory contact. Both physical and speech-only contact affected children's OT levels within 15 min post-stressor and this effect was maintained as long as 1 h post-stressor (F2,42 = 4.54, p = 0.02 and F2,42 = 3.73, p = 0.02, respectively). By contrast, OT levels did not change overall for children who rested alone and received no form of maternal comforting following the stressor (figure 2).

Figure 2.

Maternal speech releases oxytocin in girls, in much the same way as direct interpersonal interaction including comforting touch. Diamonds and black line, no contact; triangles and red line, direct contact; squares and blue line, speech-only.

4. Conclusion

This work paves the way for understanding the proximate mechanisms responsible for the formation or maintenance of social ties in humans, and also for arriving at a model for how OT operates with respect to stress and social contact. Simply put, our work emphasizes a model in which social contact of at least two types (tactile and vocal) can release OT in female children after a socially stressful event. The implications of these findings are twofold: those that are relevant to the evolution of behaviour and those that are relevant to human society and early development.

First, this work reveals that vocal cues in humans are similar to tactile contact as seen in other mammals with respect to the release of OT. It is, however, important to view these results in the proper evolutionary context. OT is a uniquely mammalian hormone, probably having evolved along with the smooth muscle contractions associated with parturition and lactation approximately 200 million years ago (Chauvet et al. 1985; Acher et al. 1995). Since the anatomical apparatus necessary for production of vocal cues is at least 400 million years old (Bass et al. 2008), it may be the case that maternal–offspring touch as a facilitator of OT release in mammals represents an exaptation from these earlier social signalling systems rather than the other way around. Alternatively, this may be simply another example of how this single peptide, which has remained essentially unchanged throughout the course of mammalian evolution, can differently influence behaviour owing to hypervariable OT receptor status within the brains of different species (Donaldson & Young 2008).

Since each vertebrate clade contains famously vocal members, language in the sense of humans' unique ability to use recursive grammar may be unlikely to stand alone in its ability to release OT. It is at least as likely that prosodic cues are responsible for the observed similarities in OT release between touch and human speech, and that non-linguistic social vocalizations facilitate attachment via the release of OT or related peptides in many other species. Nonetheless, two grammatically identical instances of human language differ in meaning depending on tonality, who is speaking, who is listening and the nuances of the relationship between them. The effect of language alone is the subject of a future study, and it is our hope that students of vertebrate vocalizations will choose to focus future work on changes in peripheral OT release in response to vocal cues in other species.

Second, vocal cues may be a viable alternative to physical contact for servicing human relationships. The work of Harry Harlow in the 1970s demonstrates that social isolation, especially early in development, is detrimental to health and behavioural outcomes in primates, and work with children who have been abused, neglected or institutionalized rather than reared by their biological parents (or early adoptive ones) shows that these findings are probably translational. Indeed, humans lacking social support from family and friends have poorer health outcomes than their better-connected peers (Couzin 2009). Vocal cues may be able to provide some of the same relief from these outcomes as direct interpersonal interaction including touch. This alternative means of buffering stress while facilitating social bonds in middle childhood may underscore the relationship between typical development and hormonally mediated affiliative bonding in our species.



We would like to thank Bret Larget for his kind assistance, as well as John Hawks, Karen Strier, Chuck Snowdon and two anonymous reviewers for their helpful comments. This research was supported by the US National Institutes of Health (MH61285).


  • Received March 17, 2010.
  • Accepted April 20, 2010.

