Rhesus monkeys correctly read the goal-relevant gestures of a human agent

Marc D Hauser, David Glynn, Justin Wood

This article has a correction.


When humans point, they reveal to others their underlying intent to communicate about some distant goal. A controversy has recently emerged based on a broad set of comparative and phylogenetically relevant data. In particular, whereas chimpanzees (Pan troglodytes) have difficulty in using human-generated communicative gestures and actions such as pointing and placing symbolic markers to find hidden rewards, domesticated dogs (Canis familiaris) and silver foxes (Vulpes vulpes) readily use such gestures and markers. These comparative data have led to the hypothesis that the capacity to infer communicative intent in dogs and foxes has evolved as a result of human domestication. Though this hypothesis has met with challenges, due in part to studies of non-domesticated, non-primate animals, there remains the fundamental question of why our closest living relatives, the chimpanzees, together with other non-human primates, generally fail to make inferences about a target goal of an agent's communicative intent. Here, we add an important wrinkle to this phylogenetic pattern by showing that free-ranging rhesus monkeys (Macaca mulatta) draw correct inferences about the goals of a human agent, using a suite of communicative gestures to locate previously concealed food. Though domestication and human enculturation may play a significant role in tuning up the capacity to infer intentions from communicative gestures, these factors are not necessary.


1. Introduction

Human language and action provide rich indications of an agent's underlying intentions and goals. Early in development, human infants understand that pointing represents an attempt to intentionally convey information about a relevant object or event. Further, infants readily appreciate that an agent's direction of gaze reveals important details of her knowledge, including her goals, beliefs and desires (Tomasello et al. 2005). Recent work on domesticated animals, including dogs, goats, horses and foxes, suggests that domestication, the process that alters both the brains and the bodies of these animals relative to their wild-type ancestors, yields similar abilities with respect to reading the communicative intentions of a human agent (Hare et al. 1998, 2002; Hare & Tomasello 1999; Miklosi et al. 2003; Gacsi et al. 2004; Miklosi 2006). In particular, in an object choice task, in which an experimenter conceals a piece of food in one of two or more hiding locations, several studies now show that domesticated animals can use a human agent's pointing gesture and direction of eye gaze to correctly infer the location of the hidden food. Surprisingly, perhaps, studies of captive chimpanzees (Pan troglodytes; Povinelli et al. 1997; Tomasello et al. 1997, 2005; Call et al. 2000), as well as other primates (Anderson et al. 1995, 1996), generally lead to the conclusion that these animals fail to use the communicative gestures and actions of a human experimenter to infer the location of a target goal, even though many of these primates can follow the eye gaze of a conspecific (Emery et al. 1997; Tomasello et al. 1998). In the most recent summary of this work, Tomasello et al. (2005, p. 685) conclude ‘that apes are not able to understand communicative intentions as manifest in such acts as pointing or placing a marker to indicate the location of food’.
This pattern of results has led some to hypothesize that humans uniquely evolved the ability to understand pointing and eye gaze as cues to an agent's goals and intentions (Tomasello et al. 2005), but that human intervention through domestication can selectively alter this ability, perhaps by means of modified temperamental dispositions (Hare et al. 2002, 2005).

At present, there are several wrinkles to this story (Hauser 2006; Miklosi 2006), which preclude a firm conclusion concerning the origins of, and the selective pressures that shaped, the capacity to infer communicative intention. First, though domesticated animals seem to outperform their wild-type counterparts, as well as chimpanzees and most other primates, a few non-domesticated animals have succeeded on some versions of these tasks, albeit with very small samples of individuals (Neiworth et al. 2002; Pack & Herman 2004); further, even apes succeed when data are pooled across subjects, and on some tasks some individuals perform above chance (Call et al. 2000). Second, there is some evidence that enculturated primates (i.e. reared by humans) outperform naturally reared primates (Call & Tomasello 1994; Carpenter et al. 1995; Call et al. 2000); this suggests that something about the human environment may enable this ability, in the absence of domestication. Third, we know of no studies testing animals under free-ranging conditions with a large sample of individuals; because sample sizes have been small, tests have relied on repeated trials of the same conditions, precluding quantitative analyses of first-trial effects. Fourth, all studies using gestures have used some version of the human pointing gesture rather than the potentially communicative gestures of the target species.

In this study, we attempt to fill some of these gaps by testing a large sample of free-ranging, non-domesticated, non-enculturated rhesus monkeys (Macaca mulatta) as they watched a human agent perform several communicative gestures across different trials and conditions, and by analysing performance under both single-trial and repeated-trial designs. Each type of gesture was designed to probe the limits of a rhesus monkey's ability to read an agent's goals accurately.

2. Material and methods

(a) Participants and coding

We tested free-ranging rhesus monkeys living on the island of Cayo Santiago, Puerto Rico (Rawlins & Kessler 1987). In each of the single-trial-per-subject conditions, we successfully tested 40 adult male and female rhesus monkeys. In previous two-option forced-choice experiments with this population (Hauser et al. 2000; Santos et al. 2001), as well as in the current studies, trials were aborted owing to (i) failure to approach one of the boxes, (ii) interference from another monkey, or (iii) failure to attend to the entire presentation. In this experiment, across all the single-trial-per-subject conditions, approximately 40% of trials were aborted; most of these involved interference from another monkey or failure to attend to the presentation. Although only one experimenter (D.G.) ran these experiments, each trial was videotaped by placing a camera on a tripod and starting the recording before the presentation. These videotapes were used to assess whether our abort decisions were biased. Specifically, D.G. randomly selected 20 trials coded as ‘aborted’ and 20 trials coded as ‘successful’; the successful trials were cut before the subject's approach to the box to make them comparable to the aborted trials. Thus, all blind-coded clips started with the experimenter's presentation and ended before the subject moved towards a box. J.W., blind to D.G.'s labels, coded these clips using the abort criteria above. There was 100% agreement between D.G. and J.W. for all 40 trials, indicating that D.G. was not biased in his decisions to proceed or abort while running the experiments.

(b) Procedure

An experimenter set out to find lone individuals who were not engaged in distracting activities such as eating, grooming or conflict with another monkey. Having located a subject, the experimenter first set up the camera and tripod, turned the camera on and then placed two wooden boxes (30 cm×30 cm×30 cm) side by side, approximately 3 m from the subject. The experimenter then blocked the subject's view of the boxes with a foamcore occluder (50 cm×30 cm). In the trials in which food was visibly presented, the experimenter revealed an apple slice from behind the occluder, paused for 1 s holding the slice above the occluder and then slowly lowered the slice back behind the occluder, equidistant from the two boxes. He then surreptitiously placed the slice in a cloth pouch on the back of the occluder. From the monkey's perspective, it appeared as though the apple had been placed into one of the two boxes but, critically, it was impossible to tell which one. To determine whether rhesus require visible evidence of a potential goal (i.e. the food), or whether the communicative gesture alone is sufficient to motivate goal-directed behaviour, we ran several conditions with a communicative gesture but no food. If rhesus selectively approached the box targeted by the experimenter's action or gesture in these food-absent conditions, we would be licensed to conclude that they can infer the existence of a goal from information about the action. In food-absent trials, the experimenter held the occluder up for 3 s, the same amount of time it took to present the apple slice in the food-present conditions.

In both the food-present and the food-absent conditions, the experimenter then removed the occluder and spread the boxes 2 m apart. The subject then watched as the experimenter performed one of the several different gestures targeting one of the two boxes. After performing the gesture, the experimenter walked away, allowing the subject to approach and choose one of the boxes. We defined a choice as the first box approached and touched. For each gesture type, we counterbalanced the targeted side (left versus right).

(c) Description of gesture types

We began these experiments from an ethological description by M.D.H. of a gesture commonly used by rhesus monkeys, as well as other Old World monkeys, when recruiting an ally in a fight (figure 1). In particular, when rhesus monkey A attempts to recruit B against C, A looks at B and then rapidly shifts attention to C. This triangulation is functionally like pointing: A attempts to ensure that B is looking at the same place as A, and the rapid movement of the head and eyes from B to C appears to serve this function. Our starting point for these studies was therefore to assess whether this species-specific gesture might facilitate the recognition of a human agent's goal.

Figure 1

Schematic of the rhesus monkey's recruitment gesture used to recruit allies in a coalition. Here, animal A attempts to grab B's attention by quickly jutting his head towards B (arrow 1). Once A attracts B's attention, A rapidly juts his head towards C (arrow 2).

(i) Communicative gesture

After separating the boxes, the experimenter established eye contact with the subject and then jutted his head forward with his eyes wide open. Subsequently, the experimenter jerked his head towards the target box, jutted his head three times towards the target box and then maintained visual contact with the box for 3 s before walking away and allowing the subject to approach. This was intended to mimic the recruitment gesture described above.

All of the following conditions were designed to isolate the necessary and sufficient cues for eliciting a successful goal-directed approach: specific components of the gesture, the spatial relationship between the experimenter, the two boxes and the subject, and whether the subject must see the target goal before it moves out of view. The last of these is of particular interest because several studies of chimpanzees suggest that individuals show enhanced capacities to read the mental states of others in competitive contexts involving clearly visible food (i.e. a visible goal; Hare 2001; Hare & Tomasello 2004).

(ii) Basic gaze

After separating the boxes, the experimenter gazed at the ground directly in front of him and then turned his head and eye gaze towards the target box. He continued to stare at the box for 3 s before walking away.

(iii) Communicative gesture from opposite box

After separating the boxes, the experimenter took one large step to the side so that he was standing behind one box. Once in place, the experimenter performed the communicative gesture described above towards the other box. Once the gesture was complete, the experimenter walked directly towards the centre of the boxes and away from the subject. The aim here was to determine whether subjects attend more to the spatial relationship between the experimenter and a box than to the communicative gesture that indicates the goal.

(iv) Pointing gesture

After separating the boxes, the experimenter pointed with his index finger towards the target box, a gesture that rhesus monkeys never use. The experimenter never established eye contact with the monkey; rather, he gazed at the ground directly in front of him before pointing, and then simultaneously pointed and gazed at the target box. The experimenter's index finger was approximately 46 cm (18 in.) from the box. The experimenter continued to stare at the box for 3 s before walking away.

3. Results

Figure 2 shows the results. In the food-present communicative gesture condition, rhesus were more likely to approach the box targeted by the gesture (30/40 subjects, binomial probability: P=0.001). In the basic gaze condition, rhesus did not selectively approach the targeted box (20/40 subjects, binomial probability: P=0.56); this pattern of approach differed significantly from that in the communicative gesture condition (χ²(1, N=80)=5.33, P=0.02). In the communicative gesture from opposite box condition, rhesus were more likely to approach the box targeted by the gesture (28/40 subjects, binomial probability: P=0.008); performance did not differ from that in the communicative gesture condition (χ²(1, N=80)=0.25, P=0.62). In the food-present pointing gesture condition, rhesus were more likely to approach the targeted box (31/40 subjects, binomial probability: P=0.0003); performance did not differ from that in the communicative gesture condition (χ²(1, N=80)=0.07, P=0.79).
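The one-tailed binomial probabilities reported above can be checked directly from the subject counts. The sketch below is a minimal Python illustration, not the authors' analysis code; it assumes that each of the 40 subjects contributes one independent choice with a chance level of 0.5 (two boxes) and uses the one-tailed prediction stated in the Figure 2 caption. The function name `binomial_one_tailed` is hypothetical.

```python
from math import comb


def binomial_one_tailed(k: int, n: int) -> float:
    """Exact one-tailed P(X >= k) for X ~ Binomial(n, 0.5)."""
    # Sum the probabilities of k, k+1, ..., n successes; each outcome has weight 1/2**n.
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


# Subjects (out of 40) approaching the targeted box, as reported in the text.
conditions = {
    "communicative gesture": 30,
    "basic gaze": 20,
    "gesture from opposite box": 28,
    "pointing gesture": 31,
}

for name, k in conditions.items():
    print(f"{name}: {k}/40, one-tailed P = {binomial_one_tailed(k, 40):.4f}")
```

Run as written, the printed values round to the P-values given in the text (for example, 30/40 gives roughly 0.0011).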

Figure 2

Results showing the number of subjects that selectively inspected the box targeted by the experimenter's action (black bars) versus the box that the experimenter did not act towards (grey bars). P-values represent binomial probabilities with an α-level set to 0.05 (one-tailed predictions).

In the food-absent communicative gesture condition, rhesus were more likely to approach the box targeted by the gesture (30/40 subjects, binomial probability: P=0.001); this result was identical to the food-present communicative gesture condition. In the food-absent pointing gesture condition, rhesus selectively approached the box targeted by the gesture (31/40 subjects, binomial probability: P=0.0003); this pattern was identical to the food-present pointing gesture condition. Thus, rhesus not only use the communicative gesture and the pointing gesture to locate hidden food, but also use these gestures to infer the existence of potential goals.

Though our initial aim was to avoid retesting the same subjects, this was not possible across conditions. To assess whether prior experience in these experiments might influence subsequent performance, we reanalysed each condition to determine whether experimentally naive individuals performed differently from experimentally experienced subjects. For the food-present communicative gesture condition, experimentally naive individuals selectively approached the targeted box (21/28 subjects, binomial probability: P=0.006), and there was no difference in performance when contrasted with experimentally experienced subjects (χ²(1, N=68)=0, P=1). In the food-absent communicative gesture condition, experimentally naive subjects selectively approached the targeted box (24/34 subjects, binomial probability: P=0.01) and there was no difference in performance when contrasted with experimentally experienced subjects (χ²(1, N=74)=0.181, P=0.67). In the food-present pointing gesture condition, experimentally naive subjects selectively approached the targeted box (22/28 subjects, binomial probability: P=0.002) and there was no difference in performance when contrasted with experimentally experienced subjects (χ²(1, N=68)=0.01, P=0.92). In the food-absent pointing gesture condition, experimentally naive subjects selectively approached the targeted box (22/31 subjects, binomial probability: P=0.01) and there was no difference in performance when contrasted with experimentally experienced subjects (χ²(1, N=71)=0.39, P=0.53). Thus, in every condition in which subjects succeeded with all individuals included, experimentally naive subjects also succeeded; prior experience with this kind of task is therefore not necessary, nor does it appear to improve performance. No subject was tested twice in the basic gaze and communicative gesture from opposite box conditions.
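The χ² contrasts can be sketched in the same way. This Python sketch assumes a Pearson χ² on a 2×2 table (group × box chosen) without Yates' continuity correction; that choice is our assumption, though it reproduces the reported values. The helper name `chi_square_2x2` is hypothetical. For the naive-subject contrasts, the reported N values (68, 74, 68 and 71) equal the naive subgroup plus 40, so the usage lines treat the comparison group as the full sample of 40 in each condition; this reading is inferred from the reported N.

```python
from math import erfc, sqrt
from typing import Tuple


def chi_square_2x2(a_hit: int, a_n: int, b_hit: int, b_n: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) comparing two groups'
    target vs non-target choices. Returns (chi2, p) with 1 degree of freedom."""
    table = [[a_hit, a_n - a_hit], [b_hit, b_n - b_hit]]
    n = a_n + b_n
    row_totals = [a_n, b_n]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 df, the upper-tail probability of chi2 equals erfc(sqrt(chi2 / 2)).
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


# Between-condition contrasts against the communicative gesture (30/40).
print(chi_square_2x2(30, 40, 20, 40))  # basic gaze
print(chi_square_2x2(30, 40, 28, 40))  # gesture from opposite box
print(chi_square_2x2(30, 40, 31, 40))  # pointing gesture
# Naive subgroup vs full sample of 40 (assumed reading), food-absent gesture.
print(chi_square_2x2(24, 34, 30, 40))
```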

In most previous work on the ability of animals to interpret a human gesture as indicating a target goal, individuals were presented with multiple trials within a session, and often across multiple sessions. It is therefore difficult to compare the present results directly with prior work, as we used only a single trial per individual per condition. To approximate prior work more closely, we attempted to test a small number of individuals with 10 consecutive trials of the communicative gesture using food. These tests are difficult to run, as subjects often move off or are distracted by others approaching; we therefore counted as valid only subjects that were tested while alone, remained in the same general area from start to finish and were run with inter-trial intervals, including set-up time, of no more than a few minutes. We used the same abort criteria as in the single-trial-per-subject conditions. The final dataset included 10 subjects that completed 10 consecutive trials. A further 17 subjects completed fewer than 10 trials and 13 failed to complete a single trial; these 30 subjects were excluded from the final dataset. The abort rate was higher than in the single-trial experiments owing to the difficulty of repeatedly testing the same subject.

We counterbalanced the side of food placement across the 10 consecutive trials. Following each trial, the experimenter approached the boxes, picked them up, moved away from the subject and set up again. We entered each subject's proportion of correct choices into a one-sample t-test against chance (0.5). This analysis yielded a statistically significant effect (t(9)=23.84, p<0.001). A binomial test on the number of subjects that chose the gestured container on a majority of their trials (10/10 subjects) was also highly significant (p<0.001). Breaking this down further, two subjects picked the correct box on 6 of 10 trials, four on 7 of 10, three on 8 of 10 and one on 9 of 10. As shown in figure 3, there was no overall evidence of learning across trials: pooling across subjects, the proportion of correct choices showed no improvement from the first to the last test trial.
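The between-subject binomial test for the repeated-trial condition can be reconstructed from the per-subject breakdown above. A minimal Python sketch, assuming that choosing the gestured box on a majority of trials means more than 5 of 10 correct and that the test is one-tailed against chance (0.5); the variable names are illustrative.

```python
from math import comb

# Correct choices out of 10 trials per subject, from the reported breakdown.
correct_counts = [6] * 2 + [7] * 4 + [8] * 3 + [9] * 1
TRIALS = 10

proportions = [c / TRIALS for c in correct_counts]
mean_proportion = sum(proportions) / len(proportions)

# Subjects who chose the gestured box on more than half of their trials.
majority = sum(1 for c in correct_counts if c > TRIALS / 2)
n_subjects = len(correct_counts)

# One-tailed exact binomial (sign) test across subjects, chance = 0.5.
p_sign = sum(comb(n_subjects, i) for i in range(majority, n_subjects + 1)) / 2 ** n_subjects

print(f"mean proportion correct = {mean_proportion:.2f}")
print(f"{majority}/{n_subjects} subjects above chance, one-tailed p = {p_sign:.5f}")
```

With all 10 subjects above chance, the exact one-tailed probability is 1/1024, about 0.001, consistent with the reported p<0.001.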

Figure 3

Proportion of subjects (n=10) picking the gestured (correct) box across trials.

4. Discussion

The methodological starting point for our experiments was the observation that rhesus monkeys naturally deploy a communicative gesture in the context of coalitional behaviour that, from the perspective of a human observer, appears both highly intentional and designed to share attention with a target individual. In particular, when a rhesus monkey attempts to engage another in a coalition, he first rapidly and distinctively juts his head towards the target partner and then, if the partner is looking, rapidly shifts his attention towards the targeted opponent. This shift in attention is repeated until the coalition has formed and the attack on the third party is under way. Our intuition was that, since this gesture appears to involve communicative intent and shared attention, it might function to indicate a target goal. Further, we supposed that if a human experimenter could imitate some of the key surface features of this species-specific gesture, rhesus monkey subjects might use it to infer the location of a hidden goal.

In the first experimental condition, a human experimenter presented a piece of food, concealed it behind an occluder covering two boxes and then used this communicative gesture to indicate the location of the hidden food. Subjects consistently approached the indicated box. In all but one of the follow-up conditions, designed to break this gesture down into a set of necessary and sufficient cues, we found evidence that rhesus can use the communicative gestures of a human agent to find both an explicitly presented goal and an inferred goal. In particular, rhesus followed a pointing gesture to the target location, and the species-specific communicative gesture provided sufficient information even when the experimenter stood behind the non-target box and gestured towards the other box; thus, even though the experimenter's position was spatially biased towards one box, providing a potentially salient associative cue, rhesus used his communicative gesture to find the target location. Perhaps most surprisingly, rhesus monkeys appeared to infer the presence of a goal from these gestures. That is, even when no food was presented before the gesture, rhesus nonetheless selectively approached the indicated box. This shows that rhesus use a communicative gesture to infer the existence of an object or goal in the absence of any explicit reference to it, paralleling prior studies of chimpanzees and human-raised ravens (Corvus corax) following eye gaze around a barrier (Tomasello et al. 1999; Bugnyar et al. 2004). The only condition that failed to elicit approach to the target box was a simple orienting response, or basic gaze. These results raise several significant theoretical and methodological issues, to which we now turn.

Why might this population of free-ranging rhesus monkeys succeed in situations involving the communicative gestures of a human agent in an object choice task where chimpanzees and other primates (including captive rhesus) generally fail? One possibility is methodological. As mentioned in §1, previous work on this problem has focused on relatively small numbers of captive individuals, using a repeated-trial design in which the same subject is given multiple opportunities, both within and across sessions, to use a communicative gesture or physical cue to find a target goal. Although some of these studies have explored first-trial effects, such analyses are difficult and have typically yielded ambiguous patterns of response. The advantage of repeated trials, of course, is that they give the animal the opportunity to learn from experience. Even in these cases, however, chimpanzees generally fail to pick up on the association between the experimenter's actions and the location of the target goal (Call et al. 2000). This fairly consistent failure to find the target goal in the face of powerful associative cues (e.g. noises, lifting up or pointing at the target box) suggests a higher-order problem; given the ease with which even the simplest organisms learn by association, it is surprising, but telling, when an animal fails to use this mechanism. In the present experiments, by contrast, most rhesus subjects were tested on only a single trial, and success was measured by the number of subjects selecting the target box. For these subjects, therefore, prior experience played no role in the pattern of successes. Moreover, even for subjects tested in a previous condition, we found no evidence that such experience significantly affected their capacity to use the experimenter's gesture as a relevant cue, and in the repeated-trial condition we found no learning effects.

Another possible interpretation of the present data is that, in contrast to the chimpanzees tested thus far, rhesus on Cayo Santiago are more like domesticated dogs (Canis familiaris) and silver foxes (Vulpes vulpes): they are more attentive to the actions of a human experimenter. For over 15 years, a number of researchers have been conducting experiments with rhesus on Cayo Santiago, and some of these experiments share design features with the present studies. In particular, in a wide range of experiments (Hauser et al. 2000; Santos et al. 2001; Flombaum & Santos 2005), an experimenter presents one or two boxes or a stage, reveals some amount of food, lowers the food behind an occluder and then either allows the subject to search or reveals the outcome and films the subject's looking time. Although these researchers have certainly not tested every animal on the island, they have exposed many of these individuals, directly or indirectly, to such tasks. As a result, rhesus monkeys may show heightened attention to human action owing to prior experience with tasks that often provide access to novel food. Although this is certainly an accurate description of the rhesus monkeys on Cayo Santiago, we do not believe that it provides a compelling explanation of the pattern of results, because most, if not all, of the chimpanzees tested in the communicative gesture tasks have also been exposed to experiments involving food retrieval of some sort, often with a human experimenter (Carpenter et al. 1995; Povinelli & Eddy 1996; Povinelli 2000; Call et al. 2004; Hare et al. in press).

A final possible explanation is that fundamental species differences enable rhesus to succeed where chimpanzees fail. Although this is possible, the current literature on comparative anatomy and behaviour provides few insights. Anatomically, there are certain differences in brain volume and suggestive evidence of novel cell types in the apes (Allman et al. 2002), but all of these differences would point to an advantage for chimpanzees, not rhesus. Similarly, though early work on the relationship between the perceptual experience of seeing and the mental state of knowing pointed to a possible advantage for chimpanzees over monkeys such as capuchins (Cebus apella; Hare et al. 2000, 2001, 2003), more recent work including rhesus (Flombaum & Santos 2005; Santos et al. 2006) has revealed comparable abilities. Therefore, although our results rule out domestication, training and membership in our own species as necessary preconditions for reading the communicative gestures of a human experimenter with respect to explicit or inferred goals, it is not at all clear why this population of rhesus monkeys succeeds under conditions in which chimpanzees and other captive primates generally fail.

The final point we would like to make concerns the capacity to draw inferences about goals in the absence of an explicit presentation of the goal. In experimental work by Gergely and his colleagues (Gergely & Csibra 2003), there is evidence that human infants assume a teleological stance when perceiving an action or event, using information about an agent's means of responding to particular situations and, especially, its capacity to respond flexibly with respect to environmental constraints. An agent that takes such environmental conditions into account, and responds both flexibly and adaptively, is perceived as rational and goal directed. The evidence presented here is consistent with the teleological stance. Rhesus use the gestures of a human agent to infer the presence of a target goal, even when the details of that goal are ambiguous; that is, rhesus selectively approached the gestured box both when food was initially presented and then concealed and when no object was presented. This is important, given that some of the recent work on chimpanzee social cognition has argued that their capacity to read intentions as well as other mental states may privilege the competitive over the cooperative context (Hare 2001; Hare & Tomasello 2004). We propose that if rhesus can properly read the goal-directed gestures of a human agent when no food or other desirable object is directly presented, then their capacity may be mediated by a more general ability to infer communicative intent.


All of the research reported here was approved by the Committee for the Care and Use of Animals both at the University of Puerto Rico and at Harvard University.

For making the work on Cayo Santiago possible, we would like to extend a special thanks to Dr Melissa Gerald and the entire CPRC staff. This publication was made possible by grant number CM-5-P40RR003640-13 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of NCRR or NIH. Additional funding was provided by an NIH pre-doc to J.W. and by a McDonnell Grant to M.D.H. For help in running the repeated trial condition, we thank Ryan Boyko. For discussion of the data and comments on the manuscript, we thank Brian Hare, Laurie Santos, Mike Tomasello and Felix Warneken.


    • Received March 27, 2007.
    • Accepted May 3, 2007.

