

Batesian mimicry occurs when a palatable species (the mimic) gains protection from predators by resembling an unpalatable or otherwise protected species (the model). While some mimetic species resemble their models closely, other species (‘imperfect mimics’) are thought to bear only a crude likeness. In an earlier study, pigeons (Columba livia) were trained to recognize wasp images in one experiment and non-mimetic (NM) fly images in another by rewarding the pigeons for pecking on the respective image types. These pigeons were subsequently presented with different images, including seemingly wasp-like hoverfly species, and the recorded peck rates on these images were used as a measure of the pigeons' perception of the hoverflies' mimetic similarity. To identify a candidate set of morphological features that the pigeons used when assessing this mimetic similarity, we first extracted a range of biometrical measurements from images originally presented to the pigeons. We then repeatedly optimized an empirical model in an attempt to match the recorded pigeon peck rates while using as few biometrical features as possible as input. Our models were able to fit the pigeon peck rates with considerable accuracy even while excluding many input features. Antennal length, a feature commonly used to discriminate between flies and wasps, was regularly retained as an input variable, but overall a different set of biometrical features was important for predicting the peck rates of pigeons rewarded for identifying wasps compared to those rewarded for identifying NM flies. In highlighting the importance of specific biometrical features in promoting mimicry and the irrelevance of others, our optimized models provide an explanation as to why certain species that appear to be poor mimics to humans are judged to be good mimics by birds.


1. Introduction

It has long been appreciated that some mimetic species bear only a crude resemblance to the models that they are thought to imitate, and there has been a continuing debate as to why the extent of mimetic similarity is not further improved upon by natural selection (Edmunds 2000, 2006; Johnstone 2002; Sherratt 2002; Holen & Johnstone 2004; Gilbert 2005). One explanation for imperfect mimicry may be that predators perceive mimetic similarity differently from humans. To address this possibility, Dittrich et al. (1993) attempted to ascertain, using operant learning techniques, the degree to which pigeons (Columba livia) perceived different species of seemingly wasp-like hoverflies as wasps or as non-mimetic (NM) flies. In the training phase of their experiments, pigeons were presented with images of wasps and NM flies, and were rewarded for pecking on images of the target species, namely wasps (wasp+ experiment) or NM flies (fly+ experiment). The pigeons were then presented with a range of images of wasps, NM flies and 11 mimetic hoverfly species, and the peck rates on these images were recorded. The general conclusion was that, while there was a broad agreement between pigeon ranking (measured by peck rate) and human ranking, pigeons treated some hoverfly species as better mimics than they appear to be to the human eye (see also Green et al. 1999).

Two key questions arose directly from the study: were the pigeons focusing on particular features when assessing mimetic similarity, and, if so, what features were they? Dittrich et al. (1993) commented in their discussion on these issues: ‘The question of why the visual systems of humans and pigeons reach such different conclusions remains open, but the answer is likely to lie in visual or learning constraints in the way in which birds classify their prey…’. Moreover, while the responses of pigeons trained to identify wasps (wasp+) were negatively correlated with the responses of pigeons trained to identify NM flies (fly+), this correspondence was only approximate, a result that the authors argued ‘might be attributable to slightly different cues upon which each group apparently concentrates’.

Given that this study represents one of the best evaluations to date of the extent of perceived mimetic similarity, and that it included the responses of birds following two distinct training regimes, we have attempted to elucidate the underlying basis for their decision making. Further investigation of the peck rate data may not only provide more general insights as to how and why birds tend to make classificatory decisions, but it could also, in theory, provide an explanation as to why the degree of mimicry of imperfect mimetic species is not improved upon by natural selection. Such an approach could be extremely useful, given the debate in the literature concerning mimetic similarity, and the general acknowledgement that the degree of mimicry is ‘very difficult to quantify’ (Turner 1984). Indeed, these tools are essential if we are to objectively test suggestions such as that imperfect mimics are rare, or that bee mimics are better mimics than wasp mimics (e.g. Gilbert 2005).

Images that were used to train and test the pigeons in the original study and the corresponding published results were available to us. Our objective therefore was to develop and evaluate a numerical model that could help predict pigeon peck rates based only on biometrical features extracted from images viewed by the original pigeons. In effect, we have attempted to ‘reverse engineer’ the discriminative process by evaluating the criteria that the pigeons may have used when reaching their decision as to how often to peck.

2. Material and methods

(a) Model overview

In what follows, we describe our ‘reverse-engineered predator’ (REP) model in terms of fitting wasp+ data using wasps as the target species; the fly+ data were analysed in a similar way, using NM flies as the target species. For each REP, we combined two distinct components in a simple facsimile of how pigeons process information, namely an optimized numerical classifier to assess the probability that the image was that of a wasp, and a simultaneously optimized probability-to-peck-rate conversion function that took that probability as input and output a predicted peck rate.
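
The exact form of the conversion function is given in Appendix B of the electronic supplementary material; as an illustration only, the minimal sketch below assumes a simple two-parameter linear scaling of the classifier's wasp probability, clamped at zero (the parameter names a and b are hypothetical).

```r
# Hypothetical probability-to-peck-rate conversion: the functional form actually
# used is described in Appendix B; this sketch assumes a two-parameter linear
# scaling, clamped at zero. Both parameters were optimized alongside the network.
prob_to_peck_rate <- function(p_wasp, a, b) {
  pmax(0, a + b * p_wasp)
}

# Example: with a = 5 and b = 45, a certain wasp (p_wasp = 1) maps to a peck rate of 50.
prob_to_peck_rate(c(0, 0.5, 1), a = 5, b = 45)
```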

For our classifier, we chose a feed-forward neural network (see Enquist & Ghirlanda (2005) for details), namely nnet (Ripley 1996; Venables & Ripley 2002), available in R (R Development Core Team 2004). The classifier was initially optimized (‘trained’ in neural network terminology) to recognize wasps using biometrical data drawn from images of wasps and NM flies, paralleling how the pigeons had been trained. When subsequently presented with biometrical data from any new test image (such as a hoverfly), the trained neural network could then make an assessment of the probability of inclusion of the image in the class ‘wasp’. The optimized conversion function translated that probability into a predicted peck rate that ultimately could be compared to the available peck rate data.
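
As an indication of how this component could be reproduced, the sketch below trains and queries a comparable network with nnet; the data frames train_df and test_df are hypothetical stand-ins for the biometrical features described under 'Data collection' below.

```r
# Illustrative use of nnet (Ripley 1996) as the wasp/NM fly classifier.
# 'train_df' holds biometrical features for the 80 training images, with
# train_df$class a factor with levels c("NMfly", "wasp"); 'test_df' holds the
# features of the test images (wasps, NM flies and hoverflies).
library(nnet)

set.seed(1)
fit <- nnet(class ~ ., data = train_df,
            size  = 3,       # hidden units (a tuning parameter set by the GA in the REP)
            decay = 0.01,    # weight decay (likewise tuned)
            maxit = 500, trace = FALSE)

# For a two-level factor response, type = "raw" returns the fitted probability of
# the second level, here the probability that an image belongs to the class 'wasp'.
p_wasp <- as.vector(predict(fit, newdata = test_df, type = "raw"))
```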

In our model building, one objective was to eliminate unnecessary biometrical features as input. Our hope was that by identifying biometrical features regularly retained in separate models built using different datasets, we would gain some insights into the basis of the discriminative decisions made by pigeons. These insights could then be further tested using known peck rates on hoverfly species not included in the data fitting process.

(b) Data collection

Two sets of images from the original data used by Dittrich et al. (1993) were available: one set of 37 images (19 wasps and 18 NM flies) and a second set of 206 images (97 wasps, 48 NM flies and 61 hoverflies; see electronic supplementary material, Appendix A).

We did not have a record of which images were used in each experiment of the original study, so we created 10 different artificial datasets (W–NMF-1 to W–NMF-10) composed only of wasps and NM flies. In all 10 cases, the data from the set of 37 images were included. For each set i=1, …, 10 in turn, data from 21 randomly selected wasp images and 22 randomly selected NM fly images drawn from the set of 206 were added to W–NMF-i, augmenting the 37 base images to 80 and providing a substantial set of only wasps and NM flies with which to train neural networks. Data corresponding to the remaining 163 images were used to create a corresponding W–NMF–H-i set, which contained 76 wasps, 26 NM flies and all of the hoverfly images, and for which predictions could be made with a trained neural network. We created 10 datasets, rather than just one, and averaged results over them because there was variation in appearance within the wasps and within the NM flies, so a single random partition could have produced misleading results if taken alone.
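
A minimal sketch of how one such training/test pair could be assembled is given below; base37, wasps206, nmflies206 and hoverflies206 are hypothetical data frames of extracted features for the 37 base images and for the wasps, NM flies and hoverflies of the 206-image set.

```r
# Build one W–NMF-i (training) and W–NMF–H-i (test) pair from the available images.
make_datasets <- function(base37, wasps206, nmflies206, hoverflies206) {
  w_idx  <- sample(nrow(wasps206), 21)     # 21 randomly chosen wasps from the 206-image set
  nf_idx <- sample(nrow(nmflies206), 22)   # 22 randomly chosen NM flies

  train <- rbind(base37, wasps206[w_idx, ], nmflies206[nf_idx, ])  # 80 wasp/NM fly images
  test  <- rbind(wasps206[-w_idx, ], nmflies206[-nf_idx, ],        # remaining 163 images,
                 hoverflies206)                                    # including all hoverflies
  list(train = train, test = test)
}

sets <- lapply(1:10, function(i)
  make_datasets(base37, wasps206, nmflies206, hoverflies206))
```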

Seventeen features were extracted for each image: antennal length, head width, thorax width, abdomen width and wing length, all expressed as a ratio to the total body length; number of visible colour stripes; number of visible colour patches; the mean and standard deviation of red (R), green (G) and blue (B) colours on the abdomen as a whole; whether the abdomen was broadly attached to the thorax or was petiolate; a categorical description of wing transparency (completely transparent and clear; half clear and half translucent; translucent); abdominal curviness; and the main colour of stripes and/or patches (yellow, orange, red and yellow, cream, light grey or dark grey). The mean and standard deviation of abdominal RGB values were measured using Adobe Photoshop v. 6.0 (San Jose, CA). ImageJ v. 1.29x (National Institutes of Health, USA) was used to measure all of the other continuous variables. Curviness of the end of the abdomen was calibrated using the coefficient of circularity as measured by ImageJ. Because dimensions were tabulated as ratios (the images of specimens were all of approximately equal size), the actual size of the specimens was not used. Furthermore, orientation was not considered as a factor because all of the images viewed by pigeons were of pinned specimens with the same orientation. Even with this limited set of predictor variables, approximately 5–7 days of computer time on a workstation with a 3.2 GHz Intel Pentium 4 processor (Santa Clara, CA) were required for each point plus standard deviation bar in figure 1a,b.
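
For concreteness, the 17 predictor variables can be pictured as the columns of a data frame with one row per image, as sketched below; the column names and the single row of values are illustrative only and are not measurements from the study.

```r
# One illustrative row of the 17 extracted predictor variables (values invented).
features <- data.frame(
  antenna_ratio = 0.05, head_ratio = 0.20, thorax_ratio = 0.25,  # lengths/widths as
  abdomen_ratio = 0.30, wing_ratio = 0.80,                       # ratios of body length
  n_stripes = 4, n_patches = 0,                                  # visible colour markings
  mean_R = 180, mean_G = 150, mean_B = 40,                       # mean abdominal RGB
  sd_R = 60, sd_G = 55, sd_B = 30,                               # s.d. of abdominal RGB
  attachment = factor("broad", levels = c("broad", "petiolate")),
  wing_transparency = factor("clear",
                             levels = c("clear", "half_translucent", "translucent")),
  curviness = 0.85,                                              # coefficient of circularity
  stripe_colour = factor("yellow",
                         levels = c("yellow", "orange", "red_and_yellow",
                                    "cream", "light_grey", "dark_grey"))
)
```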

Figure 1

Mean predicted peck rate for each taxonomic group using the REP with ω=1. Species names: S. ri, Syrphus ribesii; T. ve, Temnostoma vespiforme; C. ca, Chrysotoxum cautum; H. pe, Helophilus pendulus; E. gr, Epistrophe grossulariae; X. pe, Xanthogramma pedissequum; C. bi, Chrysotoxum bicinctum; S. ve, Sphecomyia vespiformis; V. zo, Volucella zonaria; S. py, Scaeva pyrastri; I. gl, Ischyrosyrphus glaucius. (a) Wasps and (b) NM flies were a mixture of species (electronic supplementary material, Appendix A). Error bars represent ±1 s.d. of the 50 values (5 repetitions×10 datasets). An example image for each taxonomic group is also given (see Dittrich et al. (1993) for further images).

(c) Numerical modelling of the pigeon peck rate experiment results

Two sets of parameters were optimized simultaneously within each REP model: the tuning parameters for the neural network that defined its structure and the parameters in the conversion function (see electronic supplementary material, Appendix B for further details). A genetic algorithm (GA; Whitley 1994) was used to find the optimal parameter combinations. In our GA, potential solutions were encoded in a string of 55 zeros and ones. Three information packets were included in this solution set, namely: (i) which of the 17 predictor variables were to be used, (ii) the neural network structure, and (iii) the parameters in the conversion function. The objective of our GA was to maximize fitness, defined by

$$\mathrm{fitness} = \frac{1}{\mathrm{SSE} + \omega\, n_{v}},\qquad(2.1)$$

where $n_{v}$ is the number of predictor variables retained and SSE (summed over all 13 taxonomic groups evaluated) is

$$\mathrm{SSE} = \sum_{j=1}^{13}\left(\bar{p}_{\mathrm{obs},j} - \bar{p}_{\mathrm{pred},j}\right)^{2},\qquad(2.2)$$

with $\bar{p}_{\mathrm{obs},j}$ and $\bar{p}_{\mathrm{pred},j}$ denoting the mean observed and predicted peck rates for taxonomic group $j$, and ω is a cost coefficient set by us for the inclusion of predictor variables (1, slight; 10, moderate; 100, severe cost for inclusion). In essence, our challenge was to obtain accurate predictions using as few predictor variables as possible. Animals do not have a limitless capacity for processing information (MacDougall & Dawkins 1998 and references therein), so the scalar ω was introduced simply because we judged it appropriate to introduce (and explore) the effects of these information costs.
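
In code, and assuming the variable-inclusion penalty enters as ω times the number of retained predictors (as in equation (2.1) above), the GA objective could be written as follows; observed and predicted are the mean peck rates for the 13 taxonomic groups.

```r
# Fitness of one candidate REP, following equations (2.1) and (2.2).
rep_fitness <- function(observed, predicted, n_vars, omega) {
  sse <- sum((observed - predicted)^2)   # equation (2.2): summed over the 13 groups
  1 / (sse + omega * n_vars)             # equation (2.1): higher fitness is better
}
```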

To obtain one fitness evaluation for dataset i with ω given, the following steps were taken (a code sketch of the full procedure follows the list):

  1. Train the neural network, using the 80 images in W–NMF-i, to distinguish wasps from NM flies using the predictor variables encoded in packet 1 and the network structure encoded in packet 2.

  2. Predict the probability of being a wasp for all 163 images in W–NMF–H-i using the trained network from step 1.

  3. Convert the predicted wasp probabilities to predicted peck rates using the conversion function with parameters encoded in packet 3.

  4. Average the predicted peck rates for each taxonomic group (wasps, NM flies and the 11 hoverfly species).

  5. Compute the SSE across the 13 taxonomic groups (equation (2.2)).

  6. Compute the fitness (equation (2.1)).
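
The sketch below condenses steps 1–6 for a single candidate GA string and a single dataset; decode_string is a hypothetical helper that unpacks the 55-bit string into its three packets, and prob_to_peck_rate and rep_fitness are the sketches given earlier.

```r
# Evaluate one 55-bit candidate solution on dataset i (steps 1–6).
evaluate_string <- function(bits, train, test, observed_means, omega,
                            predictor_names) {        # the 17 candidate predictor columns
  pk <- decode_string(bits)                           # hypothetical: unpack packets 1–3

  vars <- predictor_names[pk$var_mask]                # packet 1: which predictors to keep
  fit  <- nnet(reformulate(vars, response = "class"), # step 1: train on W–NMF-i
               data = train, size = pk$size, decay = pk$decay,
               maxit = 500, trace = FALSE)

  p_wasp <- as.vector(predict(fit, test, type = "raw"))  # step 2: P(wasp) for 163 images
  pecks  <- prob_to_peck_rate(p_wasp, pk$a, pk$b)        # step 3: probabilities -> peck rates
  means  <- tapply(pecks, test$group, mean)              # step 4: mean per taxonomic group

  rep_fitness(observed_means[names(means)], means,       # steps 5 and 6
              n_vars = length(vars), omega = omega)
}
```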

The GA iteratively combined segments from a population of 200 strings to ‘breed’ a solution over a preset number of ‘generations’, in a manner that maximized fitness. The list of retained predictor variables that produced the maximum fitness, together with the corresponding predictions, was recorded. Because there are stochastic elements in the model building, each simulation produced different results, so for each ω value we ran 50 simulations, namely 5 repetitions for each of the 10 datasets.
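
As an indication of how such a search could be run today, the sketch below uses the third-party GA package for R rather than the authors' own implementation (which followed Whitley 1994); observed_means is a hypothetical named vector of the observed mean peck rates for the 13 taxonomic groups.

```r
# Binary GA over the 55-bit solution strings, with a population of 200.
library(GA)

res <- ga(type    = "binary",
          fitness = function(bits)
            evaluate_string(bits, sets[[1]]$train, sets[[1]]$test,
                            observed_means, omega = 1,
                            predictor_names = names(features)),
          nBits   = 55,      # length of the solution string
          popSize = 200,     # population of 200 strings
          maxiter = 100)     # preset number of generations

best_bits <- res@solution[1, ]   # best string: variable mask plus encoded parameters
```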

As a final note, in some cases, predictions from the neural network had to be made for stripe/patch colours that were not available when training the network, e.g. colours found only on hoverflies (electronic supplementary material, Appendix A). As a check, we re-ran the simulations on the wasp+ data with ω=1 and the 12 affected specimens removed to avoid this occurrence. The results were similar and are presented in the electronic supplementary material, Appendix A.

3. Results

Our REP models were able to match the pigeon peck rate data well (figure 1a,b). The product moment correlations of mean observed to mean predicted peck rates for wasp+ were R2=0.93 (ω=1), 0.93 (ω=10) and 0.94 (ω=100), with d.f.=11, p<0.0001 in all cases. For fly+, these correlations were R2=0.89 (ω=1), 0.89 (ω=10) and 0.79 (ω=100), with d.f.=11, p<0.0001 in all cases. There is an apparent general bias in the wasp+ predictions, in that observed peck rates for the most wasp-like species were underestimated while observed peck rates for the most (NM) fly-like species were overestimated. The most likely explanation is that, in keeping the squared deviation from growing too large for Sphecomyia vespiformis (which the neural network consistently rated much more wasp-like than the pigeons did), the REP reduced the peck rate slightly for the other wasp-like hoverflies.

Given the high goodness of fit achieved using only knowledge of biometrical features of the images, we are led to conclude that the pigeons did respond to hoverfly images based on their morphological appearance and that our method could help identify key features that the pigeons employed in reaching their discriminative decisions.

To elucidate these key features, we recorded how often predictor variables were retained in the repeated runs of the REP model (figure 2). The retention frequency differed significantly from uniform for both the wasp+ experiment and the fly+ experiment (G-tests for homogeneity, wasp+: ω=1, G=154.2; ω=10, G=169.5; ω=100, G=186.5; fly+: ω=1, G=197.4; ω=10, G=233.2; ω=100, G=226.9; d.f.=16, p<0.0001 in all cases), indicating that certain features were much more important than others in facilitating discrimination. The most important predictor variables overall were the number of stripes, antennal length, standard deviation of abdominal R values (this reflects contrasting patterns based on RGB colour measurements) and stripe/patch colour.
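
For reference, a G-test for homogeneity of this kind can be computed directly from the retention counts, assuming equal expected retention across the 17 predictor variables; retained is a hypothetical length-17 vector of counts from the repeated model runs.

```r
# G-test for homogeneity of retention counts across the 17 predictor variables.
g_test <- function(counts) {
  expected <- sum(counts) / length(counts)          # equal expected retention
  G <- 2 * sum(counts * log(counts / expected))     # zero counts would need special handling
  p <- pchisq(G, df = length(counts) - 1, lower.tail = FALSE)
  c(G = G, df = length(counts) - 1, p.value = p)
}

g_test(retained)   # compare with the G values and d.f. = 16 reported above
```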

Figure 2

Occurrences of retained predictor variables based on each pigeon response set. Occurrences are based on the number of times a predictor variable was retained from building the model 50 times (5 repetitions×10 datasets) each with ω=1, 10 and 100 using (a) wasp+ and (b) fly+ data (5×10×3 combinations each).

Intriguingly, the relative retention of the different predictor variables varied significantly with pigeon training regime (wasp+ versus fly+; χ2-test for association: ω=1, χ2=86.4; ω=10, χ2=105.8; ω=100, χ2=100.3; d.f.=16, p<0.0001 in all cases), indicating that the pigeons focused on different discriminatory features depending on which insect order they were rewarded for identifying. The standard deviation of abdominal R values, the number of visible colour stripes and patches and the stripe/patch colour were frequently retained in our models based on the wasp+ data; however, while the standard deviation of abdominal R values and stripe/patch colour were likewise frequently retained for models based on the fly+ data, antennal length, head width and abdomen–thorax attachment type were much more important in generating the fitted outcome. It therefore seems probable that the pigeons trained in the wasp+ regime used the presence and abundance of colourful patches and stripes as the most important features, while pigeons trained in the fly+ regime relied more upon features such as antennal length, head width and abdomen–thorax attachment type to discriminate between wasp and NM fly images.

We also examined the ‘internal predictability’ of our REP model by fitting the model to all of the peck rate data except for a single taxonomic group, then predicting peck rates for the excluded group. This analysis was heavily influenced by the peck rate on one species, Helophilus pendulus, for which only one image was presented (electronic supplementary material, Appendix A), but overall our predictions for the excluded groups correlated well with the actual peck rate (R2=0.56, d.f.=10, p=0.005 for all test groups except H. pendulus; R2=0.30, d.f.=11, p=0.055 for all test groups).
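
The procedure amounts to a leave-one-group-out cross-validation, sketched below; fit_rep and predict_group are hypothetical wrappers around the full GA/neural-network fitting machinery described in the preceding section.

```r
# Internal predictability: refit the REP with one taxonomic group withheld,
# then predict the withheld group's mean peck rate.
groups   <- names(observed_means)                       # the 13 taxonomic groups
loo_pred <- sapply(groups, function(g) {
  refit <- fit_rep(observed_means[setdiff(groups, g)], omega = 1)  # hypothetical wrapper
  predict_group(refit, group = g)                                  # hypothetical wrapper
})

cor(loo_pred, observed_means[groups])^2   # compare with the reported R^2 values
```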

4. Discussion

Neural networks have previously been applied in numerous ways in the ecological literature (Tosh & Ruxton 2007 and papers therein); for example, to simulate model–mimic dynamics (Holmgren & Enquist 1999), to understand aspects of predator behaviour and predator perception (Merilaita & Tullberg 2005; Tosh et al. 2006) and to aid taxonomic classification (Clark 2003). Here we have employed them as part of an empirical model to help identify the key features used by pigeons when classifying the images with which they were presented. To our knowledge our ‘reverse engineering’ model represents a new method that might be applied more widely to understand why predators make the discriminatory decisions that they do.

Of course, any identification of key features used in discrimination is limited by the biometrical data that were extracted. However, in order to minimize the potential bias in the neural network prediction, we measured a wide range of visible biometrical attributes, not just the ones we viewed as taxonomically important. Likewise, it is well known that many birds are able to see UV wavelengths (e.g. Church et al. 2004), yet the slide photographs shown to pigeons omitted any UV components of the colour pattern (Cuthill & Bennett 1993), so we did not need to consider such issues when attempting to understand the response of pigeons. In fact, there is no evidence of a significant UV component to either wasp or hoverfly mimetic patterns (Nickol 1994; Green et al. 1999), and subsequent work has largely confirmed the findings of Dittrich et al. (1993), with real insect specimens replacing the slides as stimuli (Green et al. 1999).

Our models fitted the pigeon peck rate to the hoverfly images with considerable accuracy even after excluding the majority of predictor variables, and they successfully predicted the peck rate of species systematically left out of the model fitting process. Moreover, our elucidation of the input variables used most frequently in the decision-making process appears to be consistent with our understanding of mimicry within this group. For example, it is not surprising that pigeons trained to discriminate wasps in a sample of wasps and NM flies would subsequently use the number of visible stripes and presence of colourful patches in deciding whether a given specimen was a wasp. In addition, antennal length is thought to be among the most important features used by predators to discriminate wasps from flies (Gilbert 2005), and indeed some hoverfly species, such as Spilomyia longicornis and Temnostoma spp., wave their front tibiae, which are darkened unlike their other legs, to mimic the presence and movement of the antennae of their potential wasp models (Waldbauer 1970). Although our REP models were based on an approximation of the actual experiment, the goodness of fit achieved using only the knowledge of biometrical features strongly suggests that the pigeons did respond to hoverfly images based on certain key features of their morphological appearance.

As Dittrich et al. (1993) had anticipated, our model also indicates that different discriminatory features are used by pigeons rewarded for identifying wasps compared to those rewarded for identifying NM flies. This conclusion is consistent with the observation that when there are multiple features available for the same discriminatory response, they compete for an animal's attention (Shettleworth 2005). Why different features are employed under the different reward schemes is unclear, but it is possible that pigeons more readily associate the presence of a feature (such as colourful patches) with a reward, rather than the absence of a feature. For example, when there is a reward for pecking on NM flies, it seems easier to employ a rule such as ‘peck when head is wide’ rather than ‘peck when stripes are absent’. More generally, this work highlights that the nature of the training regime can have an important effect on subsequent decisions. It is widely appreciated that the prior experience of predators influences their subsequent dietary preferences (see Ruxton et al. (2004) for a review), and we note in passing that a neural network was far more successful at correctly identifying hoverflies as flies when it encountered them within the training regime itself (Rashed 2006).

Much like the prediction of ‘missing data’, once the response criteria for several hoverfly species have been elucidated, our model can be used to estimate the perceived mimetic similarity of any new species of hoverfly. As a specific illustration of the use of our REP to measure the mimetic similarity of a novel species, we extracted biometrical data for nine specimens of the hoverfly Episyrphus balteatus, a species not included so far in any aspect of our analysis but used by Dittrich et al. (1993) in a parallel set of training regimes (both wasp+ and fly+) that used images of flies and wasps set in their natural surroundings. Episyrphus balteatus was singled out by Dittrich et al. for appearing to be a poor mimic to human eyes, yet being assessed by pigeons as being wasp-like. On average, using ω=1, our REP models likewise predicted that pigeons trained in the wasp+ regime would have a peck rate of 48 (making it the best wasp mimic compared with all others tested by Dittrich et al. in the original wasp+ regime; figure 1a) and those trained in the fly+ regime would have a peck rate of 25 (again the best wasp mimic compared with all others tested by Dittrich et al. in the original fly+ regime; figure 1b). Neural networks use nonlinear combinations of predictor variables, so looking at single variables may be misleading; however, by inspecting individual plots of the key predictor variables listed, it appears that the combination of stripe/patch colour, number of patches and standard deviation of abdominal R values, which are all more wasp-like than fly-like for E. balteatus, overrode the distinctly fly-like nature of antennal length. This suggests that ‘imperfect mimicry’ could persist simply because certain traits that would in theory facilitate discrimination are effectively ignored by predators.
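
In practice, scoring a new species with a fitted REP requires only that the same 17 features be extracted for each specimen and passed through the two components described above; in the sketch below, balteatus_df, best_a and best_b are hypothetical stand-ins for the E. balteatus feature data and the GA-optimized conversion parameters.

```r
# Predicted peck rate for a novel species, averaged over its specimens.
p_wasp   <- as.vector(predict(fit, newdata = balteatus_df, type = "raw"))
peck_hat <- prob_to_peck_rate(p_wasp, a = best_a, b = best_b)
mean(peck_hat)   # cf. the REP-predicted peck rates of 48 (wasp+) and 25 (fly+)
```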

Although pigeons are not insectivorous, bird visual systems are thought to be highly conserved (Dittrich et al. 1993), so the qualitative results are likely to be relevant to a range of bird species. Our work has provided testable insights into how pigeons used morphological traits to inform their discriminative decisions. In so doing, we can help explain why pigeons consistently assessed certain hoverfly species as good mimics, when they appear to be relatively poor mimics to human eyes.


We thank Daniel Franks and Colin Tosh for their comments on an early draft. Funding was provided to T.N.S. by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Canada Foundation for Innovation (CFI). We also acknowledge David Grewcock, who devised the idea of using a neural network as a ‘tireless predator’ in 1992 as part of his PhD research.


