Disruptive coloration, crypsis and edge detection in early visual processing

Martin Stevens, Innes C Cuthill

Abstract

Many animals use concealing markings to reduce the risk of predation. These include background pattern matching (crypsis), where the coloration matches a random sample of the background, and disruptive patterns, whose effectiveness has been hypothesized to lie in breaking up the body into a series of apparently unrelated objects. We have previously established the effectiveness of disruptive coloration against avian predators, using artificial moth-like stimuli with colours designed to match natural backgrounds as perceived by birds. Here, we investigate the mechanism by which disruptive patterns reduce detectability, using a computational vision model of edge detection applied to photographs of our experimental stimuli, calibrated for bird colour vision. We show that disruptive coloration is effective by exploiting edge detection algorithms that we use to model early visual processing. Thus, ‘false’ edges are detected within the body rather than at its periphery, so inhibiting successful detection of the animal's body outline.

1. Introduction

Many animals have evolved a range of defensive colours and patterns to reduce the risk of predation, of which camouflage is one strategy. Camouflage can take several forms, the most frequently cited example being background pattern matching (Cott 1940; Endler 1978, 1984, 1991; Ruxton et al. 2004), where concealment is achieved by representing a random sample of the background at the time and place of greatest predation risk (Endler 1978, 1984, 1991). However, Thayer (1909) and Cott (1940) argued that background matching alone would not optimize camouflage, because an animal's body outline always forms a clear boundary between itself and the background, and because the shape of the individual is a salient feature in object detection. Thayer (1909) proposed a theory of disruptive coloration, extended by Cott (1940), which argued that conspicuous shapes and patterns placed at the animal's periphery are used to disguise the animal's outline (reviewed by Stevens et al. in press a). In particular, Cott (1940) hypothesized that disruptive coloration would be especially effective if: (i) some patches on an individual stand out from the background, while other patches blend in (termed ‘differential blending’), and (ii) adjacent pattern elements are highly contrasting in tone, termed ‘maximum disruptive contrast’. Recent support for disruptive theory has been forthcoming (Merilaita 1998; Cuthill et al. 2005; Merilaita & Lind 2005). For example, in a field study involving wild avian predators, Cuthill et al. (2005) showed that artificial moth-like targets with differentially blending disruptive markings survived significantly better than targets with background matching patterns. Furthermore, targets with highly contrasting disruptive pattern elements survived better than targets with low-contrast patterns.

Discussions of disruptive coloration argue that it is effective because it obliterates the animal's outline, producing the appearance of a set of distinct and apparently unrelated objects (Thayer 1909; Cott 1940; Merilaita 1998; Ruxton et al. 2004; Cuthill et al. 2005; Merilaita & Lind 2005; Stevens et al. in press a). These, and many textbook accounts of animal camouflage, draw from the tradition of Gestalt psychology rather than contemporary theories of vision (i.e. post Marr 1982), which seek to determine the mechanisms and algorithms implicated at each stage of visual processing (e.g. Rolls & Deco 2002). In understanding how a given signal (or in the case of camouflage, lack of signal) functions, we must consider the visual and cognitive systems of the animals perceiving the object (Guilford & Dawkins 1991). In the case of camouflage, we need to understand how an animal discriminates between a potential prey item and the background, which may share many features. Here, following Osorio & Srinivasan (1991), we provide and test a theoretical treatment of how disruptive coloration may work and why the strategy is effective in preventing object recognition, using a computational model of avian vision. The hypothesis, which grounds Thayer's (1909) and Cott's (1940) arguments in specific mechanisms (or computational models, sensu Marr 1982) of object detection, is that disruptive coloration exploits edge detection algorithms in the visual system of the predator. This creates ‘false edges’ within the body of the prey and so subsequent object detection algorithms fail to segment the prey from the background. This proposition is not novel (e.g. Osorio & Srinivasan 1991), but its application to experimental detection data from non-human predators is.

Our model has four components, based on Marr & Hildreth's (1980) proposal for a multi-scale computational analysis of vision, but with assumptions that are found in most models of vertebrate vision. First, the photon catches of a model bird's photoreceptors viewing a given scene are calculated. Second, these are transformed to both additive and opponent neural channels (signals). Third, the different luminance and chromatic signals are processed by an edge-detection algorithm at different spatial frequencies (De Valois & De Valois 1980; Shapley & Lennie 1985; Graham 1989; Gordon 1997; Elder & Sachs 2004). Edge detection, via sharp changes in light intensity or spectral composition (defined mathematically below), has a primary role in object/background segmentation because changes in light intensity and composition frequently occur where one object ends and another begins (Bruce et al. 2003; Elder & Sachs 2004). The representation of objects as edges, prior to object recognition, reduces the amount of data to be processed but retains information about the shape and location of objects. This edge information is then used in the fourth step in the model, a line detection algorithm. While vertebrate visual systems possess bar and line detection mechanisms (De Valois & De Valois 1980; Graham 1989; Gordon 1997; Bruce et al. 2003), we are not claiming that this is a necessary, or even likely, part of natural object detection. Rather, the specific stimuli in our experiments were triangular, so line detection was a simple means of implementing an object detection algorithm relevant to our stimuli. The overall aim of the experiment was to determine whether there was any difference in the ability of the model to detect the cryptic or disruptive treatment types, as a result of how they had been encoded by edge-detection mechanisms.

2. Edge processing model

(a) Stimuli

The input to the model consisted of digital images of the treatments from experiment 2 of Cuthill et al. (2005), photographed in situ on mature oak (Quercus robur) tree trunks (figure 1a). Treatments were designed from thresholded (binary or two-tone) digital samples of oak bark and so, while not modelled on any real moth species, the patterns themselves were both natural and cryptic. The treatments consisted of: (i) targets with disruptive patterns, with markings touching the edge of the ‘wings’ (treatment D), (ii) targets with non-disruptive background matching (M) patterns, with markings also matching a random sample of the background, but with the stipulation that the markings would not touch the target outline, and (iii) treatments consisting of the spatially averaged colour of the two pattern elements (figure 1a). The disruptive and background matching treatments also comprised targets of high or low contrast between the two colour elements. For a further explanation of the treatment designs, see Cuthill et al. (2005) and Sherratt et al. (2005).

Figure 1

(a) An example set of targets with inside/background pattern matching markings (top), disruptive markings (middle) and average monochromatic colour (bottom), as used in Cuthill et al. (2005). Each replicate target had different patterns. (b) An example set of processed edge images (not the same targets as in (a)), corresponding to the background matching (top), disruptive (middle) and average monochromatic (bottom) treatments, showing the lack of edge information apparent in the disruptive treatment.

(b) Image acquisition

Digital images were obtained of targets from each treatment on a sample of mature oak tree trunks in Leigh Woods National Nature Reserve, North Somerset, UK (2°38.6′ W, 51°27.8′ N). Since the visual background, even on the same type of tree of similar age, is highly complex in terms of hue and spatial characteristics, replicates of treatments were placed on a sample of different trees, following a repeated measures design. One replicate from each treatment was pinned in turn to the same location on a given tree and a digital (uncompressed tagged image file format) image acquired with a Nikon Coolpix 5700 digital camera, using manual exposure and with automatic white balance disabled. This process was repeated with 26 sets of each treatment pinned to 26 different trees.

(c) Image calibration

The images were processed with a custom program written in Matlab (The MathWorks Inc., MA, USA) with its Image Processing Toolbox, and were cropped to 256×256 pixels.

Since most digital cameras produce images with a nonlinear response in terms of changing pixel (RGB) values as intensity increases, we first linearized our images (Párraga et al. 2002; Párraga 2003; Westland & Ripamonti 2004; Stevens et al. in press b). We then transformed the camera's red (R), green (G) and blue (B) data to a bird-specific colour space using a polynomial mapping method (Párraga et al. 2002; Párraga 2003; Westland & Ripamonti 2004; Stevens et al. in press b). We used the European starling (Sturnus vulgaris) as the model (Hart et al. 1998), because its cone sensitivities are typical of the passerine birds (Hart 2001) identified as the major predators in our study system. While there is evidence that texture discrimination in birds is primarily fulfilled using achromatic signals (e.g. Jones & Osorio 2004), this does not preclude a role for colour information. Therefore, we calculated images relating to all the potential avian achromatic and chromatic signals, as it is not certain which ones are present in avian vision, nor their relative importance. The images of the ‘moth’ targets were converted to four image planes corresponding to the relative photon catches of a starling's long-wave (LW), medium-wave (MW) and short-wave (SW) cones, plus a luminance image (LUM) as a function of dorsal double cone sensitivity (receptor sensitivity data from Hart et al. 1998). Evidence indicates that avian luminance is a function of the double cones (Osorio et al. 1999a,b; Jones & Osorio 2004; Osorio & Vorobyev 2005), and luminance is especially important in spatial vision. Six potential avian opponent channels were then calculated using the LW, MW and SW images, since there is evidence of opponent processing in avian vision (Osorio et al. 1999b; Smith et al. 2002): red–green (R–G), red–blue (R–B), green–blue (G–B), blue–yellow (B–Y, with Y=R+G), red–cyan (R–C, with C=B+G) and green–magenta (G–M, with M=R+B). Therefore, each original image was transformed into 10 different (colour channel) images, each of which was analysed separately.

The mapping of images from camera to avian colour space is viable for a variety of reasons. First, although birds possess a fourth cone class, sensitive to the ultraviolet/violet waveband (Cuthill et al. 2000; Hart 2001), we could ignore this because of the similar and low UV reflectivity of the lichen-free oak bark and the stimulus targets used (Cuthill et al. 2005; Cuthill et al. 2006) and because most of the pattern contrast in our stimuli was at relatively longer wavelengths. Second, the spectral sensitivities of our camera's three sensors (SW, MW and LW) correspond extremely well with those of the SW, MW and LW cones of a starling, or any other passerine bird (see Stevens et al. in press b). Third, as one might expect from this high correspondence, when one calculates the photon catches of the camera's and the bird's SW, MW and LW receptors viewing oak bark or the artificial moths used in the study, using reflectance and irradiance spectra measured in the field, the correlations are extremely high. Using a sample of 30 bark spectra, the camera–starling correlations for the SW, MW and LW receptors are 0.996, 0.967 and 0.996, respectively (I. C. Cuthill & M. Stevens 2006, unpublished data). If one calculates nominal red–green and blue–yellow opponent channels, the camera–starling correlation is 0.965 for R–G and 0.996 for B–Y. This means that the mapping from camera to avian colour space is highly accurate.
We further validated the mapping technique using simultaneous radiance measurements and photographs of Macbeth ColorChecker charts (Macbeth, Munsell Color Lab, New Windsor, NY, USA) (I. C. Cuthill & M. Stevens 2006, unpublished data). Comparing the starling SW, MW, LW and double-cone values for colour samples, as calculated by transforming the camera SW, MW and LW values with the mapping technique, against those calculated from complete radiance spectra and starling cone sensitivities, the correlations between the two sets of estimates were 0.962 (LW cones), 0.984 (MW cones), 0.973 (SW cones) and 0.983 (double cones), with a mean bias (over-estimation of cone catch) of only 2.53% (see also Párraga et al. 2002 for an analogous mapping to human colour space). For a full description of the image calibration, see electronic supplementary material 1.
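As an illustration of this calibration step, the following minimal sketch (in Python rather than the Matlab used in the study; the particular polynomial terms and the pre-fitted coefficient matrix `coeffs` are illustrative assumptions, not the fitted mapping itself) shows how a linearized camera image could be mapped to starling cone-catch planes and then to the 10 channel images described above.

```python
import numpy as np

def polynomial_terms(r, g, b):
    # Per-pixel term expansion; the exact terms used in the original mapping
    # are an assumption here (see electronic supplementary material 1).
    return np.stack([r, g, b, r * g, r * b, g * b,
                     r ** 2, g ** 2, b ** 2, np.ones_like(r)], axis=-1)

def camera_to_channels(rgb_lin, coeffs):
    """rgb_lin: linearized H x W x 3 camera image; coeffs: 10 x 4 matrix
    (fitted elsewhere) mapping the terms to LW, MW, SW and double-cone (LUM)
    photon catches."""
    r, g, b = rgb_lin[..., 0], rgb_lin[..., 1], rgb_lin[..., 2]
    lw, mw, sw, lum = np.moveaxis(polynomial_terms(r, g, b) @ coeffs, -1, 0)
    return {
        'LW': lw, 'MW': mw, 'SW': sw, 'LUM': lum,
        # six putative opponent channels, as defined in the text
        'R-G': lw - mw, 'R-B': lw - sw, 'G-B': mw - sw,
        'B-Y': sw - (lw + mw),   # Y = R + G
        'R-C': lw - (sw + mw),   # C = B + G
        'G-M': mw - (lw + sw),   # M = R + B
    }
```

Each of the 10 arrays returned by such a function would then be passed independently to the edge detection stage described next.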

(d) Edge detection

Edges are often characterized by abrupt changes in intensity within an image. Consider a simple one-dimensional intensity/grey-scale profile, I(x), with each point (pixel) having a given value, I₁, I₂, I₃, …, Iₙ. The change in intensity from one point to the next along the profile is approximated by the difference in intensity, δI, between adjacent pixels, divided by their spatial separation in pixels, δx (Bruce et al. 2003). An ‘image operator’ or filter can be applied to a grey-scale image to produce a new image in which each pixel corresponds to the gradient at that location. The presence of an edge can then be found at peaks in the gradient image (i.e. where the value of a pixel in the first derivative is larger than the values of its neighbours). Marr & Hildreth (1980) argued that edges are better located in the second derivative (δ²I/δx²) of the image, obtained by applying the operator twice in succession. At a gradient peak in the first derivative, the slope is zero, so edges in the second derivative are located by the presence of ‘zero-crossings’. Human experiments show that people mark the positions of edges close to zero-crossings of the second derivative, though it is not clear whether it is something closer to the first or the second derivative that is used in visual processing (Bruce et al. 2003).
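The following toy example (in Python; the profile values are arbitrary) illustrates the point: the centre of an intensity ramp appears as a peak in the first difference and as a zero-crossing in the second difference.

```python
import numpy as np

profile = np.array([10., 10., 10., 20., 40., 50., 50., 50.])  # toy grey levels
first = np.diff(profile)        # approximates dI/dx between adjacent pixels
second = np.diff(profile, n=2)  # approximates the second derivative
# a sign change (zero-crossing) in the second difference marks the edge centre
crossings = np.where(np.sign(second[:-1]) * np.sign(second[1:]) < 0)[0]
print(first, second, crossings)
```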

The above discussion has only considered a profile in one orientation, but edges can be found in any direction by combining second-derivative operators working in two or more directions. This is comparable to orientation-sensitive cells in visual systems. Marr & Hildreth (1980) produced a circular filter by simply combining two operators acting at right angles to each other, to produce the Laplacian operator, ∇², defined as

∇² = ∂²/∂x² + ∂²/∂y²,    (2.1)

where x and y are the pixel coordinates in the two directions of the image. Noise, which can be present in the natural scene or be introduced during visual processing, can cause true edges to be missed or create false positives. Therefore, Marr & Hildreth (1980) used a Gaussian operator to smooth the image to combat noise, defined as

G(x, y) = (1/2πσ²) exp[−(x² + y²)/2σ²],    (2.2)

where σ is the standard deviation and controls the width of the Gaussian. Varying the width of the Gaussian also has the effect of analysing the images at different spatial frequencies, but in a more physiologically plausible way than Fourier filtering (Marr & Hildreth 1980). This is important for Marr & Hildreth's (1980) model because, if the filter used to determine the presence of edges is of one size or operates on one scale alone, it would average out fine details and be unreliable for larger features (Bruce et al. 2003). Therefore, filtering the images at different spatial frequencies allows edge information to be captured at different spatial scales. This process also allows the ‘spatial coincidence assumption’ to be satisfied (see §2e). Each of the 10 colour channel images was filtered for 11 different values of σ, ranging from 0.5 to 5.5 in steps of 0.5. This process mimics the range of receptive field sizes present in visual systems. The Gaussian operator can be combined with the Laplacian operator to create the Laplacian of Gaussian (LoG) operator, defined as

∇²G(x, y) = −(1/πσ⁴)[1 − (x² + y²)/2σ²] exp[−(x² + y²)/2σ²].    (2.3)

While it is unproven that vertebrate visual systems locate edges at zero-crossings (for a discussion see Georgeson & Meese 1997, 1999), the LoG operator has a circular centre-surround receptive field similar to the receptive fields of some retinal ganglion cells, cells in the lateral geniculate nucleus and some cells in the input layer of V1 in the visual cortex of primates, and it could also be formed by summing the outputs of orientation-sensitive cortical cells (Bruce et al. 2003). Similar receptive fields are likely to be present in avian vision.
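As a sketch of this multi-scale filtering step (in Python; scipy's gaussian_laplace combines the Gaussian smoothing and the Laplacian of equations (2.1)–(2.3) in a single call, which differs in implementation detail from the original Matlab code):

```python
import numpy as np
from scipy import ndimage

def log_filter_bank(channel, sigmas=np.arange(0.5, 6.0, 0.5)):
    """Return LoG-filtered copies of a 2-D channel image at the 11 scales
    (sigma = 0.5, 1.0, ..., 5.5) used in the model."""
    return [ndimage.gaussian_laplace(channel, sigma=s) for s in sigmas]
```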

Each of the images was LoG filtered with the 11 different values of σ, followed by adaptive thresholding to create binary edge images (the actual threshold value depended upon the image properties and minimized the intraclass variance of the black and white pixels; Otsu 1979). Visual neurons adapt to ambient luminance levels, with the contrast necessary for a response depending upon mean luminance, so a response is only produced once a certain threshold of contrast between different pattern elements has been reached (Graham 1989).
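A minimal sketch of this thresholding step (in Python; applying Otsu's threshold to the absolute LoG response is our assumption, and the original implementation may have differed in detail):

```python
import numpy as np
from skimage.filters import threshold_otsu

def binarise_edges(log_image):
    magnitude = np.abs(log_image)   # strength of the LoG response
    t = threshold_otsu(magnitude)   # minimizes intraclass variance (Otsu 1979)
    return magnitude > t            # binary edge image
```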

In addition to implementing a LoG edge detection algorithm, we also used a Sobel edge detector (see Gonzalez et al. 2004) on Fourier filtered images at different spatial scales, to determine if the results were robust to different methods of detecting edges (see electronic supplementary material 2).
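For comparison, a Sobel gradient-magnitude operator can be sketched as follows (in Python; the Fourier band-pass filtering at different spatial scales used in the supplementary analysis is omitted here):

```python
import numpy as np
from scipy import ndimage

def sobel_edges(channel):
    gx = ndimage.sobel(channel, axis=1)  # horizontal gradient
    gy = ndimage.sobel(channel, axis=0)  # vertical gradient
    return np.hypot(gx, gy)              # gradient magnitude
```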

(e) Combining spatial scales and the ‘spatial coincidence assumption’

The LoG filter can produce some false positives in the edge images. One way of guarding against this is to satisfy Marr & Hildreth's (1980) ‘spatial coincidence assumption’, where an edge is only considered present in a scene if the same edge is found in two or more adjacent frequency scales (adjacent values of σ). Therefore, pairs of adjacent frequency images were added and the resultant images were re-thresholded, so that the only remaining edges were those found in both images of each pair. Finally, all 10 ‘adjacent pair’ thresholded images were averaged. This produced a single final edge image for each channel, with the strongest edges occurring where many spatial scales ‘agreed’.
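A minimal sketch of this combination step (in Python; it assumes the 11 binary edge images from the previous stage, one per value of σ):

```python
import numpy as np

def combine_scales(binary_edges):
    """binary_edges: list of 11 boolean edge images, one per sigma,
    ordered from finest to coarsest scale."""
    pairs = [np.logical_and(binary_edges[i], binary_edges[i + 1])
             for i in range(len(binary_edges) - 1)]  # 10 adjacent-scale pairs
    # averaging the pair images gives a single edge image in which the
    # strongest edges are those on which many spatial scales 'agree'
    return np.mean(pairs, axis=0)
```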

Figure 1b shows an example set of processed images. The edge images are not intended to be visually inspected; rather, they comprise a symbolic representation of the image in terms of edges and bars (Marr & Hildreth 1980; Bruce et al. 2003). An objective method is therefore required to determine whether the edge detection model has been ‘fooled’ by, or has detected, the moths in the original images. An additional stage of the model, which mimics line detection mechanisms present in animal vision, was therefore developed.

3. The decision stage

(a) Line detection using the Hough transform

While the edge images produced by the LoG filter can be inspected to give a subjective indication of the effectiveness of the camouflage of the different treatments in terms of edge information, an objective method is required to quantify the effectiveness of the different treatment types in ‘fooling’ the edge detection algorithm. There is ample evidence for the presence of line detectors in visual systems (reviewed by De Valois & De Valois 1980; Graham 1989; Gordon 1997; Rolls & Deco 2002) and line detection is an important part of object recognition (Graham 1989; Gordon 1997; Bruce et al. 2003). Therefore, one way to determine whether the edges of the various targets remain intact after the edge detection stage of the model is to use an algorithm that searches for the most salient line information in an image. One computational approach to locating lines in an image is the Hough transform. It is unlikely that visual systems detect lines in a manner similar to a Hough transform algorithm, so this technique is merely a convenient method of locating lines in an image, enabling us to obtain a specific value corresponding to the number of correct edges of the moth targets detected. Theoretically, those targets with disruptive markings will have their edge information degraded, so that the Hough transform fails to locate the true edges of the body as effectively as for the other treatments. Furthermore, targets with highly contrasting patterns may fool the line detection algorithm more than those with low-contrast patterns. In this stage of the model, we employ a Hough transform based on Gonzalez et al. (2004), with the task of locating the three lines in each edge image with the strongest support. The results of the Hough transform were recorded (0, 1, 2 or 3 correct edges detected), and then analysed for each of the 10 channels with a Friedman test to determine whether there was a difference between treatments. Since only one target of each treatment was placed on any one tree (i.e. one disruptive, one background matching and one average treatment), the analysis of the highly contrasting versus low-contrast treatments involved comparisons between treatments placed on different trees. These comparisons were undertaken with Wilcoxon–Mann–Whitney two-sample tests.
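A minimal sketch of this decision stage (in Python, using scikit-image's Hough transform rather than the Gonzalez et al. (2004) Matlab implementation; the `true_edges` description of the target outline and the angle/distance tolerances are illustrative assumptions):

```python
import numpy as np
from skimage.transform import hough_line, hough_line_peaks

def count_correct_edges(edge_image, true_edges, ang_tol=0.05, dist_tol=3.0):
    """true_edges: list of (theta, rho) parameters for the target's three
    sides in the image; returns 0, 1, 2 or 3 correct edges detected."""
    binary = edge_image > edge_image.mean()   # coarse re-thresholding
    h, angles, dists = hough_line(binary)
    _, peak_angles, peak_dists = hough_line_peaks(h, angles, dists, num_peaks=3)
    correct = 0
    for theta, rho in true_edges:
        if any(abs(a - theta) < ang_tol and abs(d - rho) < dist_tol
               for a, d in zip(peak_angles, peak_dists)):
            correct += 1
    return correct
```

The per-target counts produced by such a function are the values entered into the Friedman and Wilcoxon–Mann–Whitney analyses described above.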

4. Results

Analysis of the Hough transform results showed significant treatment differences (after adjusting critical p-value thresholds by the sequential Bonferroni method to control for repeated testing; Rice 1989) in the number of correct lines located in 6 of the 10 colour channels (results for B–Y, R–G, R–B and G–M were non-significant) (table 1). Wilcoxon matched pairs signed ranks tests showed that significantly fewer correct edges were located for the disruptive treatment versus the background matching (inside) treatment and versus the average treatment (table 1). There was no difference in the number of correct edges detected between the inside and average treatments (table 1). The results obtained with the Sobel edge detector were highly comparable in terms of disruptive moths being harder to detect than non-disruptive treatments (see electronic supplementary material 2).

Table 1

Statistical results for the Hough transform for all 10 colour channels, run on edge images produced with the LoG detector, including pairwise comparisons where appropriate. (Significant results are marked with *. Critical p thresholds for the Friedman tests were determined according to sequential Bonferroni correction. n.a.=not applicable.)

Wilcoxon–Mann–Whitney tests showed no difference between targets with highly contrasting pattern elements and those with patterns of relatively lower contrast in any of the treatment types for the LoG filter (table 2). However, for the Sobel edge detector, significantly fewer correct edges were identified for the highly contrasting targets than for the low-contrast targets in the LW and LUM channels of the disruptive treatment (see electronic supplementary material 2).

Table 2

Statistical results for the comparisons between the high- and low-contrast patterns for the inside, disruptive and average treatments for the LoG edge detector. (Values in each cell are the medians of the high- and low-contrast treatments, respectively, followed by the Wilcoxon–Mann–Whitney test statistic and p-value. No results from the two-sample Wilcoxon–Mann–Whitney tests were significant. N=10,16 in all cases and critical thresholds for p for the tests were determined according to table-wise sequential Bonferroni correction.)

5. Discussion

Typical discussions of how disruptive coloration works have until now argued that disruptive markings break up the appearance of an animal's body into a series of unrecognizable objects (Thayer 1909; Cott 1940; Merilaita 1998; Cuthill et al. 2005; Merilaita & Lind 2005). This study, following the lead of Osorio & Srinivasan (1991), provides a formal theoretical framework, based on known properties of visual systems, for how disruptive coloration may exploit edge detection algorithms that are likely to be a vital component of the spatial vision of all vertebrates. In this paper, a model of the visual processing of natural scenes, based upon that of Marr & Hildreth (1980), was used. Edge detection is likely to be followed by line detection algorithms as part of the process of deconstructing and interpreting the visual scene. The results from the Hough transform line detection algorithm support the hypothesis that the targets with the disruptive patterns did exploit the edge detection algorithm used in this study, and provide evidence that this was especially effective when the moth's pattern was highly contrasting in tone (especially luminance/long-wave reflection, although it is here that the greatest spectral contrasts lay in these bark-coloured stimuli). While clearly of interest, our study was not designed to compare the effectiveness of edge detection in different chromatic and achromatic signals, though it should be noted that disruptive coloration was effective at fooling edge detection algorithms in both chromatic and achromatic signals and not only in terms of long-wave information.

The results from this study provide a mechanistic (computational and algorithmic) explanation for the field data obtained by Cuthill et al. (2005). Osorio & Srinivasan (1991) also found that enhanced border profiles could be effective in preventing the recognition of frogs by edge detection mechanisms occurring in garter snake (Thamnophis sirtalis) vision. To truly understand how a colour pattern achieves its goal, such as preventing object recognition, an understanding of the visual system of the signal receiver is essential. The use of artificial vision models, based upon real visual systems, is a potentially enlightening method of understanding complex patterns, such as camouflage. To further understand how animal coloration works from the perspective of the receiver, we must explore the spatial properties of animal coloration and how such visual information is analysed and interpreted by higher processes of feature extraction and object recognition.

Acknowledgments

The research was supported by a BBSRC grant to ICC, T. Troscianko and J. C. Partridge and by a BBSRC studentship to M.S. We thank Tom Troscianko and Julian Partridge for comments on the manuscript and Alejandro Párraga and George Lovell for advice on programming. We thank Daniel Osorio and an anonymous referee for clarifying the science and for comments on the manuscript.
