## Abstract

Serengeti lions frequently experience viral outbreaks. In 1994, one-third of Serengeti lions died from canine distemper virus (CDV). Based on the limited epidemiological data available from this period, it has been unclear whether the 1994 outbreak was propagated by lion-to-lion transmission alone or involved multiple introductions from other sympatric carnivore species. More broadly, we do not know whether contacts between lions allow any pathogen with a relatively short infectious period to *percolate* through the population (i.e. reach epidemic proportions). We built one of the most realistic contact network models for a wildlife population to date, based on detailed behavioural and movement data from a long-term lion study population. The model allowed us to identify previously unrecognized biases in the sparse data from the 1994 outbreak and develop methods for judiciously inferring disease dynamics from typical wildlife samples. Our analysis of the model in light of the 1994 outbreak data strongly suggests that, although lions are sufficiently well connected to sustain epidemics of CDV-like diseases, the 1994 epidemic was fuelled by multiple spillovers from other carnivore species, such as jackals and hyenas.

## 1. Introduction

Effective management of wildlife diseases depends on reliable information about transmission patterns, and, at the very least, knowing which species participate in transmission as maintenance and non-maintenance hosts (Cleaveland *et al.* 2007). Maintenance populations steadily maintain disease for long periods of time and can serve as disease reservoirs (Haydon *et al.* 2002*a*). They typically exceed a critical community size in which a pathogen can persist indefinitely (Bartlett 1960). Non-maintenance populations can experience transient outbreaks, which are either large epidemics that reach a significant fraction of hosts or small outbreaks that die out after only a few infections. There are two distinct classes of non-maintenance host populations: percolating populations can (but do not always) sustain large epidemics while non-percolating populations cannot (Newman 2002; Meyers *et al.* 2005; Bansal *et al.* 2007; Davis *et al.* 2008). Whether or not a non-maintenance population can sustain an epidemic on its own depends, in part, on contact patterns among hosts. Populations with ample opportunities for pathogen transmission will lie above the *epidemic threshold* where large epidemics are possible, while more sparsely connected populations will lie below the epidemic threshold where outbreaks rapidly fizzle out.

Disease control strategies should prioritize maintenance hosts (Haydon *et al.* 2002*a*). However, for direct intervention in non-maintenance populations, it is critical to determine whether the population is percolating or non-percolating. If a non-percolating population experiences repeated introductions of diseases from sympatric populations, it may experience a series of small outbreaks that together take a large toll on the population. Multiple spillover outbreaks such as these may superficially resemble a single epidemic wave; however, the optimal control strategies for these two scenarios are quite different. In the spillover case, control measures should focus almost exclusively on preventing new introductions of disease, whereas in the epidemic case, strategies should also target transmission within the host population. Incorrectly targeting interventions can waste precious resources and cause harm to wildlife (e.g. culling of Asian civets for SARS (Li *et al.* 2005) and UK badgers for bTB (Donnelly *et al.* 2006)).

Mathematical models have historically provided important insights into disease dynamics and management (Anderson & May 1991; Ferguson *et al.* 2001; Haydon *et al.* 2002*b*; Keeling & Rohani 2008). Traditional disease models can, however, be misleading: mass-action models assume that populations are fully mixed, and lattice-based spatial models assume that all contacts are spatially proximate. Endangered species often live in groups and defend territories against conspecifics (e.g. lions in prides, wolves in packs), thus exhibiting population structure that is neither fully mixed nor geographically localized. Their populations show ‘community structure’ (Cleaveland *et al.* 2008) in which the groups are highly intraconnected and more loosely interconnected based on complex movement and behavioural patterns. Epidemiological data corroborate that social groups are often the critical units for disease transmission in wildlife (Altizer *et al.* 2003).

Contact network models allow us to explicitly consider the epidemiological consequences of complex patterns of host connectivity and have demonstrated that contact heterogeneity can fundamentally influence disease dynamics (Keeling 2005; Meyers *et al.* 2005; Bansal *et al.* 2006; Ferrari *et al.* 2006). However, network modelling often suffers from a paucity of good data on contact patterns, particularly for non-human hosts. Very few studies of free-ranging wildlife provide adequate empirical information to parametrize a network model (Cross *et al.* 2005); but the long-term dataset of the Serengeti Lion Project (SLP), which includes decades of daily observations of behaviour and movement, is a unique exception (Packer *et al.* 2005).

We used the SLP data to infer the contact network structure of an African lion (*Panthera leo*) population and built one of the most detailed, biologically realistic epidemiological network models of a wildlife population to date (but see Cross *et al.* (2005)). The model incorporates pride composition, movement of nomads (roaming lions) and contact rates between prides and nomads into a stochastic susceptible–exposed–infectious–recovered (SEIR) network framework. Disease-causing contacts between lions from different groups are assumed to include chases, fights, mating, close proximity and sequential and simultaneous feeding events. We then used this model to ask whether lions alone can sustain epidemics of contact-borne infectious diseases without repeated introductions from other species and, specifically, whether an observed 1994 canine distemper virus (CDV) epidemic could have been propagated exclusively by lion-to-lion transmission. The 1994 epidemic spread discontinuously throughout the study area, infected 17 of 18 study prides and took 35 weeks to spread across the entire ecosystem (Roelke-Parker *et al.* 1996; Cleaveland *et al.* 2007; Craft *et al.* 2008). Lions, hyenas (*Crocuta crocuta*), bat-eared foxes (*Otocyon megalotis*) and domestic dogs (*Canis lupus familiaris*) were all infected with the same strain of CDV (Haas *et al.* 1996; Roelke-Parker *et al.* 1996; Carpenter *et al.* 1998), thus supporting the possibility of cross-species disease transmission. Some studies have argued that the lions experienced repeated introductions from other carnivore species and that multihost epidemics could produce a pattern of disease spread similar to the 1994 CDV outbreak (Cleaveland *et al.* 2008; Craft *et al.* 2008). In contrast, Guiserix *et al.* (2007) claimed that, once CDV was introduced into the lion population, the lions probably sustained the outbreak themselves without subsequent transmission events from other species.

In addressing the plausibility of lion-to-lion transmission, we tackled larger issues about extrapolating disease dynamics from a geographically restricted study area (figure 1*a*) to a greater ecosystem. By taking samples from comparable areas or ‘subsets’ of our model ecosystems (figure 1*b*), we identified several unexpected discrepancies between sample data and ecosystem-wide disease dynamics, which are likely to arise in many wildlife disease field studies. In contrast to prior studies of the 1994 CDV outbreak (Guiserix *et al.* 2007), we analysed the field data in light of these discrepancies.

## 2. Material and methods

### (a) Modelling lion population structure

Lions live in gregarious groups (prides) composed of related females and their dependent offspring. Prides are territorial and infrequently contact their neighbours (Packer *et al.* 1992); inter-pride encounters can be deadly (Schaller 1972; McComb *et al.* 1993; Grinnell *et al.* 1995). When prides grow too large, young females split off and form a neighbouring pride (Pusey & Packer 1987) and are more tolerant of their non-pride relatives than lions from unrelated prides (VanderWaal *et al.* in press). Coalitions of males can reside in more than one pride (Bygott *et al.* 1979) and distribute their time between neighbouring prides (Schaller 1972). By contrast, nomads do not maintain a territory and move throughout the ecosystem (Schaller 1972). Lions from different social groups interact during territorial defence and at kills. Nomads can be seen as long distance disease dispersers, while shared males increase disease transmission between neighbouring prides. A quantitative summary of lion population structure is given in table 1 (Craft 2008).

Our network model places *N*_{P}=180 prides and *N*_{N}=180 nomads at uniform random locations in a square region representing *A*=10 000 km^{2} of the high lion density area of the Serengeti (figure 1). The location of each pride is represented by a single point or *centroid* (geographical centre of its territory). Prides are assigned to be adjacent to one another according to the estimated adjacency model (*M*_{adj}), and these adjacencies form the edges of the territory network (example in figure 1*b*). A fraction (*Ψ*) of adjacent pairs are randomly assigned to have recently ‘split off’ from one another. Each pride is given a size (*X*_{P}) drawn from a best-fit gamma distribution. Contacts between prides occur at an average of *C*_{p}=4.55 contacts per two-week period per pride, as estimated from a study in which 16 lionesses were observed continuously for a total of 2213 hours (Packer *et al.* 1990). Contacts between pairs of prides occur stochastically at rates that are weighted by a logistic function of their territory distance and whether they recently split (*M*_{contact}).
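The first steps of this construction (random centroids in a square region and gamma-distributed pride sizes) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `place_prides` and the gamma `shape` and `scale` values are hypothetical placeholders, since the fitted values live in table 1, and the adjacency and contact models (*M*_{adj}, *M*_{contact}) are deliberately left out.

```python
import math
import random

AREA_KM2 = 10_000                 # A, from the text
N_PRIDES = 180                    # N_P, from the text
SIDE_KM = math.sqrt(AREA_KM2)     # side of the square region (100 km)


def place_prides(rng, n=N_PRIDES, side=SIDE_KM, shape=2.0, scale=5.0):
    """Place n prides at uniform random centroids and draw a size for each.

    shape and scale are HYPOTHETICAL gamma parameters used only for
    illustration; the paper's best-fit values are given in its table 1.
    """
    prides = []
    for _ in range(n):
        centroid = (rng.uniform(0.0, side), rng.uniform(0.0, side))
        # Round the gamma draw to a whole number of lions, at least one.
        size = max(1, round(rng.gammavariate(shape, scale)))
        prides.append({"centroid": centroid, "size": size})
    return prides


# Example: one random realisation with a fixed seed for repeatability.
prides = place_prides(random.Random(42))
```

A new realisation would be drawn for every simulated epidemic, which matches the statement in the methods that a new lion population network was generated randomly for each run.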

Coalitions of resident males and nomads are treated separately from prides of females and cubs. Male coalitions are represented as single units that increase connectivity between prides. Each territorial coalition belongs to either one or two prides; an estimated fraction *η* of all prides share their territorial coalition with one of their adjacent prides, and each remaining pride has a territorial coalition to itself. If a territorial coalition is associated with two prides, it will switch between prides with probability $1 - e^{-\varsigma h}$, where *h* is a small time step and *ς* is the rate at which territorial males switch prides.

Nomadic lions are given group sizes (*X*_{N}) randomly generated from an estimated distribution and are assumed to migrate via a variance gamma process (*M*_{nomad}) (Madan *et al.* 1998; Glasserman 2004). Each group is initially assigned to the territory of a randomly selected pride, and at any point thereafter, resides in or around the territory of exactly one pride. In any small time step *h*, a group of nomads will migrate from the territory of its current pride (*i*) to that of another pride (*j*) with a probability determined by the following quantities: *F*(·), the cumulative distribution function for displacement over a two-week period; *d*_{ij}, the distance between the centroids of territories *i* and *j*; *α*, the average pride territory width; and *c*_{i}, a normalizer. Nomads are assumed to contact their local pride at a uniform rate derived from the average rate of pride–nomad contacts per pride (*C*_{N}).

When a pride contacts another pride or nomadic group, only a subset of the pride is actually involved in the interaction (*G*), and the number of lions involved is drawn randomly from an estimated distribution that depends on the size of that pride. Specifically, the log of group size increases approximately linearly with pride size (table 1). When nomads contact prides, all members of the nomadic group are assumed to be present.

### (b) Modelling epidemiological dynamics

We model disease dynamics using a stochastic SEIR approach. Lions frequently contact all other lions in their nomadic group or pride (Packer *et al.* 1990), so we assume that any given pride or group of nomads moves through the four disease classes as a unit, as in a Levins-type patch model (Levins 1969; Hanski & Gilpin 1997). A group is considered exposed when its first member becomes infected; the group transitions stochastically from exposed to infectious at a rate of 1/7 per day and from infectious to recovered at a rate of 1/14 per day based on published estimates for domestic dogs. (Sequential infection among pride members and longer latent and infectious periods would probably slow the spread of disease, but not change the total number of infections.)
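A minimal sketch of the group-level state transitions follows, taking the two rates quoted above as 1/7 and 1/14 per day. It assumes a fixed time step and converts each constant rate into a per-step probability with the standard exponential formula; the function names and the choice of step length are illustrative assumptions, not details given in the paper.

```python
import math
import random

SIGMA = 1 / 7    # exposed -> infectious rate per day (from the text)
GAMMA = 1 / 14   # infectious -> recovered rate per day (from the text)


def step_probability(rate, h):
    """Probability that an event with constant rate occurs within h days."""
    return 1.0 - math.exp(-rate * h)


def advance_group(state, h, rng):
    """Advance one group's disease state ('S', 'E', 'I' or 'R') by h days.

    Infection of susceptible groups ('S' -> 'E') depends on contacts with
    other groups and is intentionally not handled here.
    """
    if state == "E" and rng.random() < step_probability(SIGMA, h):
        return "I"
    if state == "I" and rng.random() < step_probability(GAMMA, h):
        return "R"
    return state


# Example: follow one exposed group in half-day steps until it recovers.
rng = random.Random(1)
state, days = "E", 0.0
while state != "R":
    state = advance_group(state, 0.5, rng)
    days += 0.5
```

Under these rates the expected time from exposure to recovery is roughly 7 + 14 = 21 days, with individual runs scattered widely around that value.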

When an infected group (*A*) contacts a susceptible group (*B*), the probability of disease transmission is a function of the number of individuals involved in the interaction and a per-contact transmissibility parameter (*T*), given by

$$P(A \to B) = \sum_{j}\sum_{k} p_{j}\, q_{k}\left[1-(1-T)^{jk}\right],$$

where *p*_{j} and *q*_{k} are the probabilities that the group sizes from *A* and *B* are *j* and *k*, respectively. This assumes that every lion in one group encounters every lion in the other group (recall that the expected size of a contact group is typically smaller than the size of the pride). When a susceptible coalition of territorial males resides with an infected pride, the coalition is immediately infected; and when an infected coalition of territorial males switches to a susceptible pride, it immediately infects the second pride.
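One reading of the transmission rule described above can be sketched in a few lines. The sketch assumes that a contact between *j* lions from *A* and *k* lions from *B* produces *j*×*k* independent pairwise encounters, each transmitting with probability *T*, so that transmission fails only if all of them fail. The function name and the dictionary representation of the group-size distributions are illustrative choices.

```python
def transmission_probability(p, q, T):
    """Probability that infected group A transmits to susceptible group B.

    p and q map contact-group size -> probability for A and B respectively.
    Assumes j * k independent pairwise encounters, each transmitting with
    per-contact probability T.
    """
    return sum(
        pj * qk * (1.0 - (1.0 - T) ** (j * k))
        for j, pj in p.items()
        for k, qk in q.items()
    )


# Example: two lions from A always meet three lions from B, T = 0.1.
prob = transmission_probability({2: 1.0}, {3: 1.0}, 0.1)  # 1 - 0.9**6 = 0.468559
```

With single-lion groups the expression reduces to *T* itself, which is a quick sanity check on any implementation.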

Unless stated otherwise, the analysis is based on 200 simulated epidemics at 60 transmissibility values (*T*) between 0.0 and 0.3. For each run, a new lion population network was generated randomly, parameters were set to the values given in the *estimated quantities* column of table 1, and the first pride infected was chosen at random from either the subset or the population as a whole. We conducted a sensitivity analysis by running 200 replicate simulations at each of 50 transmissibility values using parameter values chosen randomly from the distributions given in the *distributions* column of table 1 (figure S2 in the electronic supplementary material). Statistical methods used for analysing centrality and network correlograms are described in the electronic supplementary material (text S1).

## 3. Results

We built an epidemiological network model, based on contact patterns within a lion population estimated from detailed SLP data (Craft 2008). The core of the model was a *territory network* in which prides were aggregated into single units (nodes), and edges were drawn between prides with adjacent territories, based on observed data. The *territory distance* between any two prides was then defined as the shortest path connecting their respective nodes. The prides contacted each other as a function of territory distance, and nomads migrated as a type of variance gamma process, contacting prides in their vicinity according to empirical estimates. In our stochastic SEIR simulations of CDV transmissions through the lion network, we monitored disease spread for the entire population and within a geographically restricted subset of 18 prides (figure 1*b*) resembling the study population (figure 1*a*).

### (a) Edge effects

We use two network quantities to characterize the location of a pride in the overall network. The *degree* of a pride is the number of directly adjacent neighbouring prides; and the *closeness centrality* of a pride is the reciprocal of the pride's average minimum path length to all other prides in the network, which intuitively correlates with the likelihood that disease will reach the pride from elsewhere in the ecosystem. In our model, the subset prides were biased towards the physical and network boundaries of the ecosystem, having lower average distance to the ecosystem boundary, degree and closeness centrality than the population as a whole (figure 2, horizontal box plots).
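Both quantities can be computed directly from the territory network. The sketch below uses a breadth-first search on a small hypothetical network of five prides arranged in a line; the adjacency dictionary and function names are illustrative, and the code assumes a connected network (unreachable prides would simply be ignored).

```python
from collections import deque


def territory_distances(adjacency, source):
    """Shortest path length (in edges) from source to every reachable pride."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def degree(adjacency, pride):
    """Number of directly adjacent neighbouring prides."""
    return len(adjacency[pride])


def closeness_centrality(adjacency, pride):
    """Reciprocal of the average shortest path length to all other prides."""
    dist = territory_distances(adjacency, pride)
    others = [d for node, d in dist.items() if node != pride]
    if not others:
        return 0.0
    return len(others) / sum(others)


# Hypothetical toy network: five prides in a line, A - B - C - D - E.
toy = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
central = closeness_centrality(toy, "C")   # 4 / (1 + 1 + 2 + 2) = 0.666...
edge = closeness_centrality(toy, "A")      # 4 / (1 + 2 + 3 + 4) = 0.4
```

In the toy network the end pride has both the lowest degree and the lowest closeness, which mirrors the boundary bias described for the subset prides.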

We investigated the relationship between these metrics and the probability that a pride (i) will become infected during an epidemic and (ii) can spark a large-scale epidemic in an immunologically naive population (figure 2, dotted and solid lines, respectively). Both of these epidemiological risks increase with distance to edge, degree and centrality of a pride. To compare the relative importance of these factors on epidemiological risk, we performed a multivariate logistic regression, which indicated that degree and closeness centrality account for the variation in the probability that a pride becomes infected (*p*<0.001; table S1 in the electronic supplementary material). In other words, the network structure may be the reason that distance to edge correlates with the probability of infection. These patterns explain the lower disease burden in the subset when compared with the overall population (figure 3*b*).

### (b) Small sample size

During the 1994 CDV epidemic, 17 of the 18 prides (94%) in the 2000 km^{2} SLP study area became infected. Based on the edge effect, we initially assumed that the overall prevalence in the ecosystem should have been greater than or equal to this value. Instead, the model subset was more likely to experience an outbreak with greater than or equal to 94 per cent of prides infected than the overall population (figure 3*c*). This discrepancy has a simple combinatoric explanation; with only 18 prides monitored, observed prevalence levels could only take on a few discrete values (i.e. 17 prides, 94%; 18 prides, 100%). Consider a simple model in which (i) prides infected during an epidemic are randomly distributed throughout the ecosystem and (ii) subsets are random samples of 18 prides from the set of 180 prides. Then, subset prevalences should follow a hypergeometric distribution with parameters *N*=180, *m*=number infected prides overall and *n*=18. This null model closely predicts the observed differences (figure 3*c*, blue line), even though it ignores spatial clustering of disease and the contiguity of prides in the subset.
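The null model above can be evaluated exactly with binomial coefficients. The sketch below computes the hypergeometric probability that a random sample of *n*=18 prides contains at least 17 infected prides when *m* of the *N*=180 prides are infected overall; the function names and the example value of *m* are illustrative assumptions.

```python
from math import comb


def subset_prevalence_pmf(k, N, m, n):
    """P(exactly k infected prides in a random sample of n from N, m infected)."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)


def prob_at_least(k_min, N, m, n):
    """P(at least k_min infected prides in the sample)."""
    return sum(
        subset_prevalence_pmf(k, N, m, n)
        for k in range(k_min, min(m, n) + 1)
    )


# Example: overall prevalence of 85 per cent (m = 153 of 180 prides),
# probability that at least 17 of the 18 sampled prides are infected.
p_high_subset = prob_at_least(17, N=180, m=153, n=18)
```

Even when overall prevalence is below 94 per cent, this probability is positive, because the sample proportion can only take the discrete values *k*/18 and can exceed the population proportion by chance.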

### (c) Spatial scale

In the model, CDV epidemics typically spread wavelike across the ecosystem. Specifically, the shorter the distance between prides in the territory network, the higher the correlation between their times of infection (figure S1*a* in the electronic supplementary material). The wavelike pattern is more pronounced when measured by network distance rather than geographical distance (not shown). When viewed through the narrow lens of the subset, however, there is a lower correlation for directly adjacent prides and almost no correlation among more distant prides (figure S1*a* in the electronic supplementary material).

To compare correlograms across transmissibility values, we calculated correlations for directly adjacent prides (a network distance of one) and the slope of the correlogram for network distances between one and three (figure S1*b*,*c* in the electronic supplementary material). For outbreaks that originated in the subset (as observed in the 1994 epidemic), correlations between adjacent prides increased with transmissibility, but correlations were lower in the subset than across the entire population. The rate at which the correlations declined with network distance was similar in the subset and population and relatively uniform across all transmissibility values. Thus, figure S1*a* in the electronic supplementary material (which is based on *T*=0.1725) is representative of the spatio-temporal patterns observed across the entire range of transmissibilities, with little apparent correlation in the subset despite a wavelike spread overall.

When we plotted distance from the first infected pride (*pride zero*) against the time of infection during a typical simulation (figure 4*a*), we observed relatively continuous expansion overall, but a discontinuous pattern within the subset fuelled by repeated introduction from elsewhere. For outbreaks initiated within the subset that infected at least 17 of the 18 prides, the probability of at least one reintroduction was 0.970 (s.d.=0.093); and the average number of subset prides with greater than or equal to 75 per cent chance of infection from outside the subset was 1.96 (s.d.=1.73). Thus, the spatial pattern of infections within the subset generally appeared patchy in the midst of a wavelike epidemic.

### (d) Model versus data: did lions sustain the 1994 outbreak themselves?

We compared the predictions of our model with three empirical observations: the discontinuous spatial spread within the study area; 94 per cent prevalence within the study area; and the slow spread of the outbreak across the entire ecosystem. We also performed a full sensitivity analysis and found that the quantitative results were largely insensitive to uncertainty in the parameter values (table 1; figure S2 in the electronic supplementary material).

The model produced spatial patterns within the subset that were similar to the 1994 outbreak (figure 4; figure S3 and Video S1 in the electronic supplementary material). Disease appeared in clusters separated from each other in time and space. Across the entire range of transmissibility values, there is a 10–20 per cent chance that epidemics will appear at least as discontinuous as observed in 1994 (figure 4*b*). These probabilities are highest for low values of transmissibility, where transmission between neighbouring prides is rare, and thus the time of infection for adjacent prides is relatively uncorrelated. The model also predicts outbreaks with the observed pride prevalence, especially at higher transmissibility values (figure 3*a*), and predicts the observed rate of geographical spread at lower transmissibility values (figure S4 in the electronic supplementary material).

Although each of these individual patterns has a reasonable probability of occurring in a lion-to-lion epidemic, it is highly unlikely that all three could occur simultaneously (figure 5). The observed spatial spread and velocity are most likely to occur at low transmissibilities while the observed prevalence is most likely at higher transmissibilities. Only a minute fraction of simulations exhibited both the observed prevalence and velocity. The highest probability of observing both patterns is 0.02, occurring around *T*=0.095. We did not include the spatial analysis (figure 4; figure S1 in the electronic supplementary material) in this comparison because patchy outbreaks correlate with low velocity, and adding a spatial criterion would only reduce the joint probability further.

Since the model failed to identify a range of transmission values that could have plausibly produced an epidemic that was both as large and as slow as the observed 1994 outbreak, we conclude that the assumption of strict lion-to-lion transmission must be incorrect. Thus, the actual transmission dynamics probably involved multiple introductions of disease to the lions from sympatric carnivore species.

## 4. Discussion

### (a) Are Serengeti lions a percolating population for canine distemper virus?

Serengeti lions probably experience outbreaks of CDV and other directly transmitted viral diseases with similar infectious periods, such as feline calicivirus and parvovirus, every 4–12 years (Packer *et al.* 1999). Our model suggests that this population of lions is sufficiently well connected to sustain epidemics of CDV-like diseases on their own, i.e. it is a percolating population for viruses with short infectious periods. Even moderately contagious diseases (with probability of transmission per contact *T* ≈ 0.13) have at least a 5 per cent chance of producing an epidemic that reaches 95 per cent of all prides in the ecosystem (figure 3*c*); and this probability increases rapidly with transmissibility. If CDV is at least moderately infectious in lions, as suggested for domestic and wild carnivores (Appel 1987), then our model suggests that it has the potential to sweep through the entire population.

The 1994 CDV outbreak, however, was unlikely to have been maintained by lions alone. Across the entire range of transmissibility values, a strictly lion-to-lion epidemic could not have been both as extensive and as slow moving as observed in 1994. At low rates of transmissibility, disease can spread as slowly as in 1994 but not reach the observed prevalence; the reverse is true at high rates of transmissibility (figure 5).

The most plausible explanation for this discrepancy is the absence of additional carnivore species from our model. Lions commonly contact hyenas and jackals during simultaneous or sequential feeding events (Cleaveland *et al.* 2008), and a single CDV variant was found to be circulating in lions, hyenas, bat-eared foxes and domestic dogs during the 1994 outbreak (Haas *et al.* 1996; Roelke-Parker *et al.* 1996; Carpenter *et al.* 1998). Thus, there were repeated opportunities for CDV to be introduced into the lion population. Although this conclusion contradicts a recent analysis by Guiserix *et al.* (2007), it is consistent with the genetic analysis and supported by observations of sick jackals at the time of the epidemic (Roelke-Parker *et al.* 1996). Our model suggests that lions were a ‘non-percolating’ population for this CDV epidemic and experienced transient chains of infection that ‘spilled over’ from other species.

### (b) Do disease dynamics scale?

Wildlife studies can be resource and time intensive; thus, biologists regularly extrapolate from subsets of larger populations. Ecologists recognize that natural processes can vary considerably with the spatial scale of the observation (Tilman & Kareiva 1997; O'Neil & King 1998) and thus use multiscale approaches to analyse complex ecological systems. Given the difficulty of observing wildlife disease outbreaks in real time, disease ecologists are typically forced to mine sparse data without regard to sampling or scaling issues (examples include Williams *et al.* 1988; Woodroffe *et al.* 1997; Packer *et al.* 1999; Leendertz *et al.* 2004; Haydon *et al.* 2006).

In this study, we identified three potential sources of error that are relevant to wildlife disease ecology. The first is an edge effect, or more generally, non-random sampling with respect to the epidemiological structure of the population. Directly transmitted diseases spread primarily during contacts between neighbours or neighbouring groups; and the pattern of such interactions gives rise to a contact network. The position of a group within the network, in conjunction with the overall network structure, determines its epidemiological risk (figure 2). The contact network for Serengeti lions is highly spatial, such that contact rates are highly correlated with the number of nearby prides. Thus, prides located closest to the border of the Serengeti National Park have the fewest contacts, on average. For this reason, estimates based on samples taken from the outskirts of the park (such as the SLP study area) would tend to underestimate the overall burden of disease in the Serengeti ecosystem. Note, however, that samples from a geographical boundary will not suffer from an edge effect if the population is sufficiently well mixed that contact rates are homogeneous throughout the ecosystem.

The frequency of an epidemic in the subset can also differ significantly from the overall population, simply because of variability associated with taking a small random sample from a large population. Just by chance, the sample proportion can deviate considerably from the population proportion; the sample proportion is limited to a discrete number of values (i.e. 17/18, 18/18…). In the 1994 CDV epidemic, 94 per cent of prides in the subset were infected. At relatively low transmissibilities (*T*∼0.1), almost no simulated epidemics reach an overall prevalence of 94 per cent, yet a sizeable fraction infect at least 94 per cent of subset prides. Thus, at moderate transmissibilities, where few, if any, epidemics cross the 94 per cent threshold, sampling variability alone can explain the higher vulnerability of the subset to large epidemics than the overall population.

The final complication arises when sampling from a smaller geographical scale than that of disease transmission. The SLP data from the 1994 CDV outbreak suggest non-wavelike, erratic spread of disease throughout the study area, which has been seen as evidence for repeated introduction from other species (Craft *et al.* 2008). Although we ultimately rejected the possibility that lions sustained the 1994 outbreak by themselves, it would have been incorrect to assume that the observed spatial spread necessarily implied a similar pattern across the entire ecosystem. While contacts primarily occur between neighbouring groups, lion prides occasionally contact distant prides and migrating nomads, which reduces the correlation between the distance and the timing of infection. When the probability of transmission is low, disease may initially reach only a few prides in a given area and later return to the same vicinity via longer distance contacts. In a population with exclusively local contacts, dynamics at a small scale will become much more wavelike and more closely resemble the large-scale dynamics. On the other hand, completely mixed populations will lack scale dependencies, because they lack spatial patterns altogether. This study demonstrates that wildlife populations may not fulfil assumptions of classical epidemiological models, such as the lattice or mass-action models, and an understanding of both network structure and sampling caveats should be considered when constructing disease models for wildlife populations.

## Acknowledgments

The authors thank M. Anderson, B. Kissui, A. Mosser, A. Sorensina and C. Souther for raw data or help with data extraction. We thank O. Bjornstad for assistance with the ncf-package, and A. Dobson, S. Cleaveland, K. Hampson, D. Haydon, M. Kaare, T. Lembo and E. Ernest for discussion about carnivore disease in the Serengeti. We thank the Santa Fe Institute for providing a working visit for M.E.C. and L.A.M. This research was supported by NSF grants (DEB-0225453, DEB-0343960, DEB-079097, DEB-0749097, BE-0308486, EF-0225453 and DEB-0710070) with additional funding from Lincoln Park Zoo, Sigma Xi, the U of MN's Graduate School and EEB Department, and a grant from the James S. McDonnell Foundation to L.A.M.

## Footnotes

- Received November 10, 2008.
- Accepted January 22, 2009.

- © 2009 The Royal Society