In many instances of cooperation, only one individual has both the potential and the incentive to ‘cheat’ and exploit its partner. Under these asymmetric conditions, a simple model predicts that variation in the temptation to cheat and in the potential victim's capacity for partner control leads to shifts between exploitation and cooperation. Here, we show that the threat of early termination of an interaction was sufficient to induce cleaner wrasse Labroides dimidiatus to feed selectively against their preference (which corresponds to cooperatively eating client fish ectoparasites), provided that their preference for alternative food was weak. Under opposite conditions, cleaners fed selectively according to their own preference (which corresponds to cheating by eating client mucus). By contrast, a non-cleaning fish species, Halichoeres melanurus, failed to adjust its foraging behaviour under these same conditions. Thus, cleaners appear to have evolved the power to strategically adjust their levels of cooperation according to the circumstances.
In recent years, biologists have become increasingly aware that there are many possible mechanisms of partner control, all of which can hypothetically suppress cheating, and hence maintain cooperation between unrelated individuals [1–9]. The empirical evidence suggests that many forms of cooperation, especially interspecific mutualisms, feature a marked asymmetry between interacting individuals in the scope for cheating or for partner choice [1,3,10]. As a consequence, attention has shifted from reciprocity (as explored in iterated Prisoner's Dilemma games) to alternative control mechanisms, such as punishment [11,12], sanctions , partner switching [14,15] or indirect reciprocity based on image scoring [16,17]. Several models of cooperation focus explicitly on such forms of partner control in asymmetric contexts [14,16,18–22].
Empirical evidence for the importance of these alternative control mechanisms is also accumulating and has even driven the development of some of the theoretical papers cited earlier. However, while the collection of quantitative experimental data on cooperative behaviour has traditionally involved human subjects (see [23–26] among many other examples), experiments on non-human cooperation have typically been only qualitative in nature [20,27–30]. Exceptions are provided by Kiers et al., who quantified the exchange of nutrients between plants and rhizobia, and by Raihani et al., who tested how male cleaners adjust levels of punishment according to the magnitude of the damage inflicted on them by cheating female partners.
A more quantitative approach is needed for at least two reasons. First, to properly understand the evolution and the maintenance of cooperation we need to determine the extent to which animals adjust levels of cooperation to variable conditions. Variable conditions may affect cooperation in many ways, for example through the effects of internal states on behaviour [22,33] or through the introduction of errors owing to uncertainty: the ‘trembling hand’ is a key variable promoting cooperation, and variable conditions may provide a biological basis for errors. Furthermore, variable conditions may select for increased cognitive abilities necessary to fine-tune behaviour, but also for the development of simple rules of thumb that do well overall while largely ignoring the complexity of the environment. In both cases, decision rules of individuals have to be known to determine under which conditions cooperation is stable. A second reason to use a more quantitative approach is to identify variables that have a major effect on the potential for conflict. As it stands, it is notoriously difficult to measure the exact fitness consequences of cheating/being cheated, punishment/being punished, etc., in animals. Few laboratory studies have developed experimental designs that measure precise payoffs, while the vast majority of studies may only offer informed guesses about relative payoffs. Variable behaviour according to conditions may allow us to identify situations in which conflict is minimal and cooperation is maximal, and to identify conditions under which the effectiveness of partner control mechanisms such as punishment may break down, leading to the end of interactions or to parasitism.
Here, we provide a quantitative study on levels of cooperation, with the intention of testing a simple model of partner control that potentially explains how the outcome of an interaction can shift from cooperation to exploitation (i.e. from mutualism to parasitism) and vice versa . The model assumes that an interaction between two players lasts for a variable duration that depends on the decisions of the two players. Only one player has the option to cheat, while the partner lacks any such option. Instead, the potential victim has the capacity to terminate or at least shorten the duration of their interaction. The model predicts cooperative outcomes as long as (i) the payoff per time unit for cheating is sufficiently low when compared with the payoff per time unit for cooperating (i.e. the temptation to cheat is low enough), and (ii) the potential victim has sufficient control over the duration of the interaction (i.e. the threat of termination is strong enough). As many real life interactions are characterized by asymmetries in strategic options [3,10] and an extended duration of action , the model is potentially of broad relevance.
The cleaning mutualism involving the cleaner wrasse Labroides dimidiatus provides a model system to study cooperation in asymmetric games such as this: cleaners may cooperate with client fish by consuming their ectoparasites, leading to net benefits for clients [39,40], but preferentially exploit them by eating mucus, while non-predatory clients lack the option to cheat in return (they cannot eat a cleaner). Lacking the option to reciprocate, clients face the challenge of inducing cleaners to feed against their preference, in order to obtain a good cleaning service. A frequent observation in the field is that clients terminate cleaning interactions earlier in response to cheating by a cleaner. We conducted a laboratory experiment to explore whether the clients’ threat of early departure might suffice to suppress net exploitation by the partner, as suggested by the model described earlier.
In order to manipulate all parameters of interest, we replaced clients with Plexiglas plates, and their mucus and ectoparasites with a preferred and a less-preferred alternative food, respectively. The plates were attached to a lever, allowing the experimenter to respond to the foraging behaviour of the cleaners in predetermined ways. As the simple model predicts that the effectiveness of this partner control mechanism should depend on the potential benefits of cheating and on the ease with which victims may escape from exploitation, we manipulated both parameters and measured their relative impact on the foraging behaviour of cleaners.
Any evidence of the cleaners adjusting their foraging behaviour could be interpreted in two ways. First, adjustments may reflect an evolved capacity to adjust levels of cooperation to circumstances, as our experimental design corresponded to feeding on ectoparasites or mucus on mobile client reef fishes. However, there is the alternative explanation that we tested cleaners with a learning task that involved optimal foraging decisions. Thus, the task may not be linked to any adaptation to interactions with client reef fishes and hence could be equally well solved by species that do not engage in cleaning interactions. Therefore, in order to assess whether the experiment had the ecological relevance intended, we also tested a non-cleaner fish species, the pinstripe wrasse Halichoeres melanurus. Both species belong to the Labridae, a large family that originated around 65 million years ago and contains more than 600 species. The genera Labroides and Halichoeres are part of a more recent clade that originated around 30 million years ago . The two species are similar with respect to habitat and body size (maximum total length of 11.5 cm for L. dimidiatus and 12 cm for H. melanurus) .
2. Material and methods
Experiments with L. dimidiatus were conducted in June–July 2004 at Lizard Island Research Station, Great Barrier Reef, Australia (14°40′ S, 145°28′ E). Experiments involving H. melanurus were conducted in January–February 2012 at the same site. Sixteen adult cleaner wrasse and 14 adult H. melanurus were caught with a barrier net and held alone or in pairs in aquaria of varying sizes for at least 20 days before the experiments. During the acclimatization phase, cleaners were trained to feed on mashed prawn and fish flakes mixed with prawn (further called ‘flake’) spread on Plexiglas plates of varying colours. During the same phase, H. melanurus individuals were also trained to feed on mashed prawns spread on similar plates.
As cleaners have a strong preference for prawn over flakes, these two food types could be used in our experiments. Interestingly, H. melanurus did not have the same preference. Therefore, during acclimatization, combinations of various food types were offered to H. melanurus to find a suitable combination. The protocol followed Bshary & Grutter: H. melanurus individuals were presented with plates covered with seven items of each of two types of food tested. We considered an individual to have reached a preference when, in each trial's first seven items, one type of food was chosen more than 80 per cent of the time over three subsequent trials. We eventually found that all individuals showed such a preference for an 80 : 20 prawn : flake mixture against a 70 : 30 sand : prawn mixture. In the process of determining a suitable combination of foods, H. melanurus experienced in total between 29 and 51 food trials.
The basic experimental design is very similar to the protocol used in previous experimental laboratory studies on L. dimidiatus and therefore will be described here only briefly. As a first step before the actual experiment, all fish were familiarized with the situation in which a plate with both preferred and less-preferred items would remain in their tank as long as they ate the less-preferred items, but would be removed immediately if they ate one preferred item. Immediate reaction was possible because the plate was attached to a lever held by the experimenter. Therefore, the fish had to feed against their preference if they wanted to increase their food intake. During six trials, 14 less-preferred items were presented together with only two preferred items. Under this condition, all fish regularly ate less-preferred items, and thus experienced both the positive and the negative consequences of their foraging decisions, giving them the opportunity to learn.
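The preference criterion described above can be sketched in code. This is only an illustration, not the authors' scoring procedure: the function name `has_preference`, the input layout (one list of food labels per trial, in the order eaten) and the reading of "three subsequent trials" as three consecutive trials are all assumptions.

```python
def has_preference(trials, threshold=0.8, window=7, run=3):
    """Return the preferred food label, or None if the criterion is not met.

    trials: list of trials, each a list of food labels in the order eaten.
    Assumption: the criterion is met when the same food makes up more than
    `threshold` of the first `window` items in `run` consecutive trials.
    """
    streak_food, streak = None, 0
    for eaten in trials:
        first = eaten[:window]
        winner = None
        for food in set(first):
            # With threshold >= 0.5 at most one food can exceed it.
            if first.count(food) / len(first) > threshold:
                winner = food
        if winner is not None and winner == streak_food:
            streak += 1
        elif winner is not None:
            streak_food, streak = winner, 1
        else:
            streak_food, streak = None, 0
        if streak >= run:
            return streak_food
    return None


# Hypothetical data: six of the first seven items are "mixA" in three trials.
example = [["mixA"] * 6 + ["mixB"]] * 3
print(has_preference(example))
```

With seven items, the 80 per cent threshold is only exceeded when at least six of the first seven items are of the same type.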
During the experiment, we offered the fish two Plexiglas plates (12 × 7 cm) with distinct novel colour patterns and with six black circles (1 cm diameter), each circle containing a food item. Three items were of the preferred food type and three items were of the less-preferred type. Plates were removed with a lever as soon as the fish ate one preferred item. For each fish, one plate was always removed so rapidly (less than 1 s) that the individual could not eat another item off of it (corresponding to strong victim control over the duration of interaction), while the other plate was removed slowly enough that the fish could eat other items (corresponding to weak victim control over the duration of interaction). The slow removal was adjusted to the speed of subjects foraging and would typically take 2–3 s. The slow removal functioned as intended, as all fish ate additional food items in these trials (average values for individuals: minimum, median, maximum items per round: 1.1, 1.5, 1.9, respectively, for L. dimidiatus and 0.9, 1.6, 2 for H. melanurus). Variation in the concentration of the less-preferred food type (50 or 10% flakes in the prawn–flake mixture for cleaners; 70 or 20% sand in the sand–prawn mixture for pinstripes) yielded either a high or a low temptation to eat the preferred item. All fish were confronted with each of the four combinations of speed of removal and temptation (figure 1) seven times, yielding a total of 28 trials per animal. Half of the subjects first completed the 14 trials involving low temptation on the first day and then the 14 high temptation trials on the second day, while the other half was exposed to the reversed order. Within the 14 trial units, the sequence of slow-moving and fast-moving plate trials was counterbalanced. All combinations of plate colour with speed of plate removal were equally balanced between all the individuals of each species to avoid any colour effects explaining the results. 
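The trial structure (two temptation levels, two removal speeds, seven repetitions of each combination, 14 trials per day) can be laid out as a small schedule generator. This is a sketch under assumptions: the text does not specify how slow and fast trials were counterbalanced within a day, so strict alternation is used here purely for illustration, and the function name `build_schedule` and its parameters are hypothetical.

```python
def build_schedule(low_temptation_first=True, fast_first=True, reps=7):
    """Return a list of (day, temptation, removal_speed) tuples for one fish.

    Assumption: within each 14-trial day, fast and slow removals alternate;
    the source only states that their sequence was counterbalanced.
    """
    temptations = ["low", "high"] if low_temptation_first else ["high", "low"]
    speeds = ["fast", "slow"] if fast_first else ["slow", "fast"]
    schedule = []
    for day, temptation in enumerate(temptations, start=1):
        for i in range(2 * reps):
            schedule.append((day, temptation, speeds[i % 2]))
    return schedule


schedule = build_schedule()
print(len(schedule))  # 28 trials per animal
```

Flipping `low_temptation_first` across subjects reproduces the reversed day order used for half of the animals.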
All fish were released at the site of capture after the experiment.
For each interaction, we scored the number of less-preferred items eaten by an individual prior to eating a preferred item. The number of less-preferred items was then used as the response variable in a generalized linear mixed model (GLMM) with treatment as explanatory variable and individual as grouping factor. Pairwise comparisons between treatments were obtained with Tukey contrasts for each species separately. In order to assess the effect of the species and the initial temptation treatment, models including these additional explanatory variables were compared with the original model by means of an ANOVA.
To calculate the average number of flake items eaten before the first prawn item under the null hypothesis that cleaners eat indiscriminately, we assumed that cleaners ate items at any stage according to their availability. Therefore, the probability that the first item eaten was a prawn was 0.5. The probability of eating a flake followed by a prawn was 0.5 × 0.6 = 0.3. The probability of eating flake, flake and then prawn was 0.5 × 0.4 × 0.75 = 0.15, and the probability of eating all the flakes before a prawn item was 0.5 × 0.4 × 0.25 = 0.05. Therefore, the null hypothesis predicts that cleaners ate on average 0.5 × 0 flakes plus 0.3 × 1 flake plus 0.15 × 2 flakes plus 0.05 × 3 flakes = 0.75 flake items before the first prawn item (dashed line in figure 1). We compared this value against the mean number of less-preferred items eaten per round for each fish in every treatment combination with Wilcoxon tests. All statistics were performed with R v. 2.14.0; GLMMs were fitted with the R package ‘lme4’ and Tukey contrasts were obtained with the R package ‘multcomp’. The dataset is available in the Dryad repository (doi:10.5061/dryad.r70n0).
Cleaner wrasse L. dimidiatus foraging varied considerably between treatments (figure 1a). Both temptation (high temptation, HT; low temptation, LT) and victim control (high control, HC; low control, LC) had significant effects: all pairwise comparisons were significant (GLMM fit by Laplace: HT–LC versus HT–HC: z = −2.568, p = 0.049; HT–LC versus LT–LC: z = 5.041, p < 0.001; HT–LC versus LT–HC: z = −6.567, p < 0.001; HT–HC versus LT–LC: z = 2.653, p = 0.039; HT–HC versus LT–HC: z = 4.357, p < 0.001) except between the low and high control treatments when temptation was low (GLMM fit by Laplace: z = −1.781, p = 0.279). The combination of high temptation and low victim control led to cleaners ‘exploiting’ their ‘victims’ by favouring the preferred prawn items (observed foraging behaviour against random expectation indicated by dashed line in figure 1a; n = 16, V = 8, p = 0.002). When the temptation to cheat and victim control were both high, foraging behaviour was not significantly different from random (n = 16, V = 67, p = 0.979). Finally, in both treatments in which the temptation to eat prawn was low, cleaners ‘cooperated’ by eating significantly against their preference (LT–HC: n = 16, V = 113, p = 0.021; LT–LC: n = 16, V = 113, p = 0.003).
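The random-choice expectation of 0.75 can be checked with a short calculation. The sketch below assumes, as the text does, that items are taken uniformly at random without replacement from three less-preferred and three preferred items; the function name is hypothetical and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def expected_less_preferred_before_preferred(n_less=3, n_pref=3):
    """Expected number of less-preferred items eaten before the first
    preferred item, when items are drawn at random without replacement."""
    expected = Fraction(0)
    p_reach = Fraction(1)  # probability that the first k items were all less-preferred
    for k in range(n_less + 1):
        remaining = n_less + n_pref - k
        p_pref_next = Fraction(n_pref, remaining)  # next item is a preferred one
        expected += k * p_reach * p_pref_next
        if k < n_less:
            p_reach *= Fraction(n_less - k, remaining)
    return expected


print(float(expected_less_preferred_before_preferred()))  # 0.75
```

The terms of the sum are 0 × 0.5, 1 × 0.3, 2 × 0.15 and 3 × 0.05, matching the probabilities given in the text.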
Foraging was significantly different between the two species (comparison between GLMMs with and without species as explanatory variable: ANOVA, d.f. = 4, p < 0.001; figure 1b). Indeed, H. melanurus foraging did not differ significantly between treatments (GLMM fit by Laplace: all pairwise comparisons: –1 < z < 1.3, p > 0.5). In all treatments, they ate preferred food items significantly more often than expected by random choice (n = 14, all V = 0, all p ≤ 0.001). Whether individuals began the experiment with the high or the low temptation treatment did not affect the results significantly (comparison between a GLMM with both the species and initial condition as explanatory variables and a GLMM with only the species as explanatory variable: ANOVA, d.f. = 8, p = 0.109).
Our results provide experimental evidence that a most basic partner control mechanism, the threat of early termination of an interaction, may be sufficient to maintain cooperative behaviour in cleaning mutualism as long as the temptation to cheat is low. In our experiment, the degree of temptation influenced the foraging behaviour of cleaners more strongly than did the extent of victim control over the duration of interaction. Nevertheless, both parameters had a significant influence. With respect to probable natural conditions, we note that the temptation to cheat should be relatively low: while cleaners preferred mucus over gnathiid isopods in a choice experiment, they nevertheless regularly ate the latter. Another abundant type of ectoparasite, monogenean flatworms, was eaten with similar probability relative to mucus. Furthermore, client control over duration of interactions appears to be very high. Thus, the conditions favouring a mutualistic outcome, even in the absence of additional control mechanisms, seem to be fulfilled.
Terminating an interaction in response to cheating yields immediate benefits for the client in that it allows escaping further exploitation. In addition, it inflicts a cost on the cleaner since it reduces its foraging opportunities. As a result, cooperation might be promoted as a by-product of a self-serving behaviour (escaping exploitation), which is a form of negative pseudo-reciprocity [6,11,49], also called ‘sanctions’ by various researchers [13,30]. While some authors refer to such a mechanism as ‘self-serving punishment’ or ‘no cost punishment’, we prefer to restrict the use of the term punishment to actions that entail a cost to the perpetrator, following Clutton-Brock & Parker and Raihani et al. Sanctions provide a relatively simple mechanism to promote cooperation. Unlike punishment, they do not rely on future benefits arising from the increase in cooperative behaviour of the target to be under positive selection. Our results add further evidence to the growing perception that while more complex mechanisms such as punishment or reputational effects may receive more attention from researchers, rather simple partner control mechanisms like sanctions may often be responsible for stable cooperation in natural examples [2–4,35,49]: a plant's selective abortion of fruits infested with seed-eating larvae of the pollinator species, a plant's selectively reduced investment in root growth in areas in which nitrogen fixation by rhizobia bacteria is low [30,31], reduced probing duration by a pollinator if nectar quantities are low [52,53], the avoidance of a cleaner wrasse that has been observed cheating another client or the premature departure of a client in response to cleaner cheating.
What is intriguing about our results involving the cleaners is that there is experimental evidence that clients often do not just terminate interactions prematurely in response to cheating but that they may in addition punish cleaners, switch to another cleaner for their next inspection and may observe cleaner–client interactions to avoid cheating cleaners [27,45]. In the future, we should ask how far punishment or partner switching, in addition to the threat of early termination, may push the quality of service provided by cleaners towards higher levels of cooperation (for models addressing this issue, see [54–56]). More generally, given that there are so many possible mechanisms of partner control, we suggest that it is only by studying the impact of each mechanism on the level of cooperation versus exploitation that empirical studies can determine which forms of partner control are of real biological significance.
Pinstripe wrasse H. melanurus were not capable of feeding against their preference in order to maximize their food intake and none of the treatments appeared to influence their foraging behaviour. This non-cleaner species eats small invertebrates off substrates and hence should not face foraging decisions applicable to our experiments in real life. The results involving H. melanurus show that the adjustment of foraging behaviour to the experimental treatment is not a trivial optimal foraging task that can be easily solved by any species. Instead, we conclude that the experiment can only be solved by species for which the problem is ecologically relevant. More specifically, the non-cleaners failed to show the ability to feed against their preference. We predict that cleaner wrasse L. dimidiatus are indeed specifically adapted to this problem, because animals are very likely to face it only rarely in nature. Even cleaning gobies can focus on eating preferred food as they prefer fish ectoparasites over mucus, which probably applies to most facultative cleaner fish species as well. As it stands, the cleaner wrasse L. dimidiatus shows strategic abilities adapted to interactions with clients that are absent in primates, despite the general importance of cooperation in primates’ social life [61,62]. As the experiments involving plates and food represent learning tasks, the cleaner wrasse's superior performance is linked to adaptations in the cognitive machinery, as predicted by the ecological approach to cognition [63,64].
In conclusion, in interactions in which the payoffs are a positive function of duration, cooperative outcomes may be achieved in cases in which only one player may cheat, as predicted by game-theoretic models [20,21,65]. The victim's control over early termination of the interaction and low temptation to cheat combine to affect the potential cheater's level of cooperation. Despite the apparent simplicity of the concept, it appears that stable cooperation still depends on partners being adapted to the specifics of the game structure.
We thank the staff of the Lizard Island Research Station for their wonderful support. We also thank Radu Alexandru Slobodeanu for advice on the statistical analysis and two referees for comments. Financial support was provided by NERC grant no. NER/A/S/2002/00898 (to R.B. and R.A.J.), Swiss National Science Foundation (to R.B.) and Australian Research Council (to A.S.G.).
- Received March 1, 2013.
- Accepted April 2, 2013.
- © 2013 The Author(s) Published by the Royal Society. All rights reserved.