## Abstract

Community structure has been widely identified as a feature of many real-world networks. It has been shown that the antigenic diversity of a pathogen population can be significantly affected by the contact network of its hosts; however, the effects of community structure have not yet been explored. Here, we examine the congruence between patterns of antigenic diversity in pathogen populations in neighbouring communities, using both a deterministic metapopulation model and individual-based formulations. We show that the spatial differentiation of the pathogen population can only be maintained at levels of coupling far lower than that necessary for the host populations to remain distinct. Therefore, identifiable community structure in host networks may not reflect differentiation of the processes occurring upon them and, conversely, a lack of genetic differentiation between pathogens from different host communities may not reflect strong mixing between them.

## 1. Introduction

Many important pathogens, including *Neisseria meningitidis*, the causal agent of bacterial meningitis, the influenza virus and the malaria parasite (*Plasmodium falciparum*), exhibit high levels of antigenic diversity. Theoretical models have been developed to try to understand the factors leading to the evolution and maintenance of antigenic diversity (Dietz 1979; Castillo-Chavez *et al*. 1989; Gupta *et al*. 1996, 1998; Andreasen *et al*. 1997; Gog & Swinton 2002; Gomes *et al*. 2002). Many such models assume that immunity to one strain will confer partial immunity to other similar strains. Models incorporating multiple loci show that high levels of cross-immunity can lead to the polarization of the pathogen population, with subsets of strains that do not share antigenic determinants stably dominating the population (Gupta *et al*. 1996, 1998). Within this framework, intermediate levels of cross-immunity can cause strains to oscillate in dominance, showing chaotic behaviour under certain conditions, whereas low cross-immunity allows all strains to coexist independently.

Understanding how contacts between neighbouring communities of hosts affect the dynamics of pathogens is critical for controlling the diseases they cause, since most human populations are structured into communities, from the scale of the family group to towns, cities and countries. Recent advances in the field of complex networks, particularly in relation to community structure, provide a natural framework to study these effects (Strogatz 2001; Albert & Barabási 2002; Newman 2003; Dorogovtsev *et al*. 2003). Models of the spread and evolution of infectious diseases on social networks have shown that local clustering of hosts can alter the transmission dynamics of an epidemic and change the evolutionary selection pressures on traits such as virulence and infectious period (van Baalen 2002; Eames & Keeling 2003; Read & Keeling 2003, 2006; Kiss *et al*. 2006). It has also been shown that in a highly structured spatial setting, strong cross-immunity can lead to local selection for different subsets of strains in neighbouring clusters of hosts, thereby increasing the overall diversity of the pathogen population (Buckee *et al*. 2004). None of these models has explored social networks with community structure, however. Here, we explicitly incorporate separate host communities with the aim of determining how varying levels of interaction between them affect the separation of antigenic profiles of pathogen populations.

Many epidemiological models have explored the effects of the strength of coupling between metapopulations on disease dynamics (Bartlett 1957; Rohani *et al*. 1999; Grenfell *et al*. 2001; Xia *et al*. 2004). These models explore transmission on different spatial scales (Bartlett 1957; Grenfell *et al*. 2001; Xia *et al*. 2004) and the effects of metapopulation structure on the eradication of disease through vaccination (Rohani *et al*. 1999). The effects of spatial separation on the genetic structure of natural populations have also been extensively studied by ecologists (reviewed in Harrison & Hastings 1996). Reduced gene flow between geographically or genetically separate communities can lead to genetic differentiation between subpopulations, and an increase in overall genetic diversity as a result of local differences in allelic frequencies (Levin 1988; Barton & Hewitt 1989; Mallet & Barton 1989; Kruuk *et al*. 1999; Molofsky *et al*. 1999). The majority of these ecological models focus on the balance between dispersal and selection required for two populations to remain genetically distinct.

Here, we develop a series of models that explore the effects of community structure on the maintenance of antigenic diversity within pathogen species. Like the ecological models discussed above, we investigate the balance between the separation of host communities and the level of immune selection necessary for the maintenance of antigenically distinct pathogen subpopulations. In the context of pathogen genetics, reduced dispersal between subpopulations in ecological models is analogous to the reduced transmission occurring between different communities of hosts. We show that distinct community structure in the host population is insufficient to maintain the separation between profiles of pathogen antigenic diversity.

## 2. Deterministic metapopulation model

First, a metapopulation version of the Gupta *et al*. (1996, 1998) model of strain dynamics was developed, incorporating two linked populations. Each community was defined by its own set of three differential equations, and the communities were linked by the coupling term, *α*, which was incorporated into the force of infection for each strain, *ic*. Strains were defined by their antigenic determinants, *i*, and their community, *c* (figure 1*a*). The equations were set up in a two-locus two-allele framework, such that four strains were possible in each community, each sharing alleles with two other strains.
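For orientation, the system can be sketched as follows. This is a reconstruction under stated assumptions, not a form confirmed by this text: it assumes the standard three-equation structure of the Gupta *et al*. (1998) model with a community index added, uses the symbols *z*, *w*, *x*, *λ*, *β*, *γ*, *σ* and *μ* as they are defined in the following paragraph, and assumes that the coupling enters the force of infection additively.

```latex
\begin{align}
\frac{dz_{ic}}{dt} &= \lambda_{ic}\,(1 - z_{ic}) - \mu z_{ic},\\
\frac{dw_{ic}}{dt} &= \Big(\sum_{j \in i'} \lambda_{jc}\Big)(1 - w_{ic}) - \mu w_{ic},\\
\frac{dx_{ic}}{dt} &= \lambda_{ic}\,\big[(1 - w_{ic}) + (1 - \gamma)(w_{ic} - z_{ic})\big] - \sigma x_{ic},\\
\lambda_{ic} &= \beta\,(x_{ic} + \alpha\, x_{ic'}).
\end{align}
```

Under this assumed form, the total force of infection grows with *α*, so the effective transmission potential is tied to the coupling strength.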

Hosts exposed to one strain gained protection to all strains sharing alleles with that strain (the subset *i*′, including itself), depending on the level of cross-immunity, *γ*. In the absence of cross-immunity (*γ*=0), therefore, hosts exposed to one strain gained no protection to other strains with common alleles, while under complete cross-immunity (*γ*=1), they gained full protection to these strains. Note that cross-immunity in this case acted to reduce transmission rather than block infection.

Here, *z*_{ic} is the proportion of the population immune to strain *ic*; *w*_{ic} is the proportion non-susceptible to any strain sharing alleles with strain *ic*; and *x*_{ic} is the proportion infected with strain *ic*. The coupling term was embedded within the force of infection, *λ*_{ic}, in which *ic′* denotes strains from the other community with the same antigenic determinants and *β* is the transmission coefficient, which was the same for each strain. Since the strength of coupling covaried with *R*_{0} in this formulation of *λ*, we explored different forms of the coupling term, assessing the model output over the entire range of parameters. However, the precise form appeared to make little difference to the outcome of the model, so the original formulation was used for simplicity.

The parameters *σ* and *μ* are the rate at which hosts lose infectiousness and the death rate, respectively. The equations were examined numerically. To identify the effects of coupling and explore the stability of neighbouring communities with different equilibria, the two communities were seeded with two discordant strains having non-overlapping antigenic repertoires. It was assumed that the duration of infectiousness (1/*σ*) was short when compared with the average lifespan (1/*μ*), and that immunity was lifelong. Parameters were assigned such that the basic reproductive ratio *R*_{0} (here, *β*_{ic}/*σ*), or the average number of secondary cases resulting from a primary infection in a completely susceptible host population (Anderson & May 1991), was the same for every strain.
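A numerical exploration of this kind can be sketched in a few lines of Python. The sketch below is illustrative only. It assumes the three-equation form of the Gupta *et al*. (1998) model for each community and a coupled force of infection *λ*_{ic}=*β*(*x*_{ic}+*αx*_{ic′}); neither is spelled out in this text. The parameter values, the seeding prevalence, the fixed-step Runge-Kutta integrator and all function names are hypothetical choices.

```python
from itertools import product

# Two loci with two alleles each: four antigenic types per community.
STRAINS = list(product((0, 1), repeat=2))
N_S = len(STRAINS)

def shares_allele(i, j):
    return any(a == b for a, b in zip(STRAINS[i], STRAINS[j]))

# RELATED[i] is the subset i' (strains sharing an allele with i, itself included).
RELATED = [[j for j in range(N_S) if shares_allele(i, j)] for i in range(N_S)]

def derivatives(state, beta, sigma, mu, gamma, alpha):
    """state[c][k][i]: community c, variable k in (z, w, x), strain i."""
    out = []
    for c in (0, 1):
        z, w, x = state[c]
        x_other = state[1 - c][2]
        # Assumed coupling: same antigenic type in the other community, scaled by alpha.
        lam = [beta * (x[i] + alpha * x_other[i]) for i in range(N_S)]
        dz = [lam[i] * (1 - z[i]) - mu * z[i] for i in range(N_S)]
        dw = [sum(lam[j] for j in RELATED[i]) * (1 - w[i]) - mu * w[i]
              for i in range(N_S)]
        dx = [lam[i] * ((1 - w[i]) + (1 - gamma) * (w[i] - z[i])) - sigma * x[i]
              for i in range(N_S)]
        out.append([dz, dw, dx])
    return out

def shifted(state, deriv, h):
    """Return state + h * deriv, element by element."""
    return [[[s + h * d for s, d in zip(sk, dk)] for sk, dk in zip(sc, dc)]
            for sc, dc in zip(state, deriv)]

def rk4_step(state, dt, params):
    k1 = derivatives(state, **params)
    k2 = derivatives(shifted(state, k1, dt / 2), **params)
    k3 = derivatives(shifted(state, k2, dt / 2), **params)
    k4 = derivatives(shifted(state, k3, dt), **params)
    combo = [[[(a + 2 * b + 2 * c + d) / 6 for a, b, c, d in zip(r1, r2, r3, r4)]
              for r1, r2, r3, r4 in zip(c1, c2, c3, c4)]
             for c1, c2, c3, c4 in zip(k1, k2, k3, k4)]
    return shifted(state, combo, dt)

def initial_state(seed=1e-3):
    """Seed each community with a different discordant pair of strains."""
    state = [[[0.0] * N_S for _ in range(3)] for _ in range(2)]
    for c, pair in enumerate(((0, 3), (1, 2))):
        for i in pair:
            state[c][2][i] = seed
    return state

def simulate(alpha, gamma, beta=40.0, sigma=10.0, mu=0.02, dt=0.01, steps=2000):
    """Integrate the coupled system; the defaults give R0 = beta / sigma = 4."""
    params = dict(beta=beta, sigma=sigma, mu=mu, gamma=gamma, alpha=alpha)
    state = initial_state()
    for _ in range(steps):
        state = rk4_step(state, dt, params)
    return state
```

Calling `simulate(alpha, gamma)` over a grid of values and comparing `state[0][2][i]` with `state[1][2][i]` for each strain gives a difference in prevalence between communities of the kind plotted in figure 2*a*.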

At lower levels of cross-immunity, all strains could coexist within both communities at the same prevalence, so metapopulation structure had no identifiable effect. However, the effects of metapopulation structure became evident at higher levels of cross-immunity. Two stable equilibria resulted, depending on the initial conditions:

For initial strain prevalences above an extremely low threshold (here 0.0001), the two communities maintained separate dynamics when they were weakly coupled. Here, different subsets of strains dominated each community, each subset being a pair of discordant strains (i.e. sharing no alleles), and this separation persisted up to higher levels of connectedness (*α*) between communities as the level of cross-immunity (*γ*) increased. As cross-immunity declined, essentially lowering competition between strains for hosts, ‘synchronization’ of antigenic profiles (whether stable or oscillatory) occurred at much lower levels of connectedness. Figure 2*a* shows the relationship between the two parameters, *α* and *γ*, with the colouring indicating the difference in prevalence of the same antigenic strain in the two communities; the dark blue space shows where the two communities are behaving as a single population. For levels of cross-immunity resulting in oscillatory dynamics, the coupling between communities had a similar effect, with the communities maintaining asynchronous oscillations of discordant subsets of strains at very low coupling (*α*<0.015). This effect can be observed as a light striped area in figure 2*a* between *γ*=0.6 and 0.8. Again, at higher levels of coupling, the two communities showed identical oscillatory dynamics, shown in dark blue. Other studies of multiple strain systems have also shown that synchronized dynamics between strains are sensitive to biological parameters such as transmission rate (Kamo & Sasaki 2002; Rohani *et al*. 2003), and often become unstable when stochasticity is introduced.

The second equilibrium occurred for extremely low initial prevalences of the seeded strains (dominant strains seeded at 0.0001 or less). Here, the relationship between cross-immunity and coupling was qualitatively the same as for the previous equilibrium, but in this case, the two communities became homogeneous at even lower levels of *α* (*α*=0.0014 when *γ*=1, see figure 2*a*; in an equivalent figure for the second equilibrium, only the *x*-axis would be altered). It would appear that this second equilibrium resulted from the inability of different dominant subsets to establish themselves in either community before the system equilibrated. As mentioned before, the maintenance of different discordant pairs of pathogen strains in the two communities required increasing levels of cross-immunity as the coupling between them strengthened.

Figure 2*b* shows examples of simulations of the deterministic model where cross-immunity was strong and coupling between communities was either above or below the threshold required for pathogen synchronization. In the upper panel, extremely low coupling between communities means that the different subsets can be maintained at equilibrium in each community. In the lower panel, on the other hand, both communities are dominated by the same subset of strains at equilibrium because the separation between host communities is insufficient to prevent competition between the strains.

## 3. Stochastic models with community structure

As in the deterministic model, the pathogen strains were defined by two immunodominant genetic loci, each with two alleles, and by two communities, giving eight possible pathogen strains. Infected hosts could infect other hosts in the population depending on the transmission coefficient and the population structure, and this probability was the same for all strains. Upon infection, hosts remained infectious for a certain period of time before becoming immune to the infecting strain for a further period in an allele-specific manner. The durations of infectiousness and immunity were defined as the inverse of the probabilities that a host lost infectiousness or immunity, respectively, at each time-step. Immunity to one pathogen strain meant that the host was partially immune to any pathogens with common alleles, dependent on the level of cross-immunity, *γ*. As in Buckee *et al*. (2004), cross-immunity was modelled assuming that a host's vulnerability (*v*) to infection depended on the extent to which the infecting strain shared antigenic determinants with strains in the host's immune memory (as in the deterministic framework). Here, the fraction of identical antigens *f* was converted into a vulnerability to infection between 0 and 1, such that *v*=(1−*f*^{1/γ})^{γ}, and *γ* was a positive number between 0 and 4 scaling the level of cross-immunity (see Buckee *et al*. (2004) for details). The probability of a host becoming infected with a strain was therefore a product of the host's vulnerability and the transmissibility of the strain. Co-infection of the same host by two different strains could lead to a recombination event between them, and the probability of recombination was equal for each locus. Hosts infected with two strains could therefore transmit a new strain that was a hybrid of these two ‘parent’ strains. Mutation, modelled as the random switching of alleles at each locus with a defined probability, was also included so that all possible strains could be generated following the seeding of the host population. Again, the two communities were seeded with different subsets of non-overlapping strains.
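The vulnerability function is simple to evaluate directly. The following sketch implements *v*=(1−*f*^{1/γ})^{γ} as written above; the function names and the input checks are illustrative additions, and the upper limit of 4 on *γ* is noted but not enforced.

```python
def vulnerability(f, gamma):
    """Vulnerability v = (1 - f**(1/gamma))**gamma for shared-antigen fraction f.

    gamma must be strictly positive; the text uses values up to 4.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return (1.0 - f ** (1.0 / gamma)) ** gamma

def infection_probability(f, gamma, transmissibility):
    """Product of host vulnerability and strain transmissibility."""
    return vulnerability(f, gamma) * transmissibility
```

With two loci, *f* takes the values 0, 0.5 or 1. A host sharing no antigens with the challenging strain is fully vulnerable (*v*=1), a host that has seen both alleles is fully protected (*v*=0), and for *f*=0.5 the vulnerability falls as *γ* increases.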

In the first stochastic model, the host population was divided into two communities each with *N*_{c}=256 hosts, as shown in figure 1*b*. In a single time-step, each host could infect other hosts within the same community with a probability *p*_{in}, assuming that the destination host did not already have any immunity to the infecting strain. In addition, a host in one community could infect hosts from the other community with a probability *p*_{out} (figure 1*b*). For the communities to remain distinguishable in this case, *p*_{in} must be greater than *p*_{out}. This stochastic version was designed as an intermediate between the mean field model described above and the network model below, with *p*_{out} being equivalent to *α*, and will be referred to hereafter as the random-mixing stochastic model. The motivation for this intermediate model was the observation that, in the ecological literature, exotic species invading a new community (analogous to a different subset of strains) have a higher probability of successful invasion once spatial clustering occurs around an ‘entry point’ (Korniss & Caraco 2005). We predicted that the static links between two hosts could create this kind of situation and change the qualitative outcome of the model.
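The mixing rule can be illustrated with a deliberately stripped-down sketch of one time-step. It keeps only the *p*_{in}/*p*_{out} contact structure and omits strain identity, immunity, recovery, recombination and mutation, so it is not the model used here; the function name and the host-indexing convention are hypothetical.

```python
import random

def transmission_step(infected, n_c, p_in, p_out, rng):
    """One random-mixing time-step for a single generic infection.

    Hosts 0..n_c-1 form the first community and n_c..2*n_c-1 the second.
    Each infected host independently infects every other host with
    probability p_in (same community) or p_out (other community).
    """
    newly_infected = set()
    for source in infected:
        for target in range(2 * n_c):
            if target == source or target in infected:
                continue
            same_community = (source < n_c) == (target < n_c)
            if rng.random() < (p_in if same_community else p_out):
                newly_infected.add(target)
    return infected | newly_infected
```

Because any host can contact any other host at every time-step, there are no fixed links in this formulation.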

In addition to the above, a network version with quenched disorder was implemented (figure 1*c*); networks were randomly generated, static for the course of each simulation and hosts were neither added nor removed. In contrast to the random-mixing stochastic case, here the contact structure was defined at the beginning of each simulation, and infectious hosts were only able to infect predefined neighbours. Hosts had on average *z*_{tot}=10 links, *z*_{in} of which were to hosts within the same community and *z*_{out} to hosts in the other community. To be able to compare directly with the above approach, *z*_{in} was set to *N*_{c}*p*_{in} and *z*_{out} to *N*_{c}*p*_{out}. Note that it is not necessarily the case that the host communities can be distinguished for *z*_{in}>*z*_{out}, since both are average values. In any particular realization, this number will fluctuate and a host may find itself better connected with the ‘wrong’ community. To determine how community structure within the host population was affected by values of *z*_{in} and *z*_{out}, we used the ‘fast’ algorithm for identifying communities within networks (Newman 2004) to evaluate our simulated populations.
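A static contact network with these average degrees can be generated as in the sketch below. It assumes independent links drawn with probability *z*_{in}/*N*_{c} within a community and *z*_{out}/*N*_{c} between communities, which matches the stated relation between degrees and probabilities but is not necessarily the exact construction used here; the function names are hypothetical.

```python
import random

def two_community_network(n_c, z_in, z_out, rng):
    """Random static network of 2*n_c hosts split into two communities.

    Returns a list of neighbour sets. Links are undirected and drawn once,
    so the contact structure is fixed for the whole simulation.
    """
    p_in, p_out = z_in / n_c, z_out / n_c
    n = 2 * n_c
    neighbours = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            same_community = (u < n_c) == (v < n_c)
            if rng.random() < (p_in if same_community else p_out):
                neighbours[u].add(v)
                neighbours[v].add(u)
    return neighbours

def mean_degrees(neighbours, n_c):
    """Average number of within-community and between-community links per host."""
    n = len(neighbours)
    within = sum(1 for u in range(n) for v in neighbours[u]
                 if (u < n_c) == (v < n_c))
    between = sum(len(nb) for nb in neighbours) - within
    return within / n, between / n
```

Individual hosts deviate from these averages, which is why a host can end up better connected to the other community even when *z*_{in}>*z*_{out}.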

To get an idea of how the algorithm works, consider a network in which each node has been assigned to a community (i.e. the network has been ‘partitioned’) in some arbitrary way. This partition has an associated value of modularity *Q*, defined in Newman & Girvan (2004) as the difference between the number of links between nodes within each community and the expected number of links for the same communities in a randomized network. Joining any two neighbouring partitions would produce a change in modularity, Δ*Q*, which will be large if they share many links and small if they share few links. Starting with each node in the network defined as its own community, we successively join pairs of communities that share many links (those that produce the highest Δ*Q*) until an optimum modularity is reached. We can use the algorithm to evaluate the level of community structure in our simulated populations by simply counting the number of nodes that are correctly identified as belonging to a community for each value of *z*_{out}. The result of this exercise is shown in figure 3, where communities in which *z*_{tot}=10 for each host remain very well defined for values of *z*_{out} up to 3.
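The agglomerative procedure described above can be sketched as follows. This is a naive illustration that recomputes *Q* from scratch for every candidate merge, whereas the ‘fast’ algorithm of Newman (2004) updates Δ*Q* incrementally. Modularity is computed in its usual normalized form, as the fraction of links falling within communities minus the fraction expected at random. Function names are hypothetical and node labels are assumed to be sortable.

```python
def modularity(edges, community_of):
    """Q = sum over communities of (fraction of internal links
    minus squared fraction of link ends attached to the community)."""
    m = len(edges)
    internal, degree = {}, {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())

def greedy_communities(nodes, edges):
    """Start from singletons, repeatedly merge the linked pair of communities
    giving the highest Q, and return the best partition seen along the way."""
    community_of = {n: n for n in nodes}
    best_q = modularity(edges, community_of)
    best = dict(community_of)
    while True:
        pairs = sorted({tuple(sorted((community_of[u], community_of[v])))
                        for u, v in edges
                        if community_of[u] != community_of[v]})
        if not pairs:
            break
        scored = []
        for a, b in pairs:
            trial = {n: (a if c == b else c) for n, c in community_of.items()}
            scored.append((modularity(edges, trial), trial))
        q, community_of = max(scored, key=lambda item: item[0])
        if q > best_q:
            best_q, best = q, dict(community_of)
    return best, best_q
```

For two triangles joined by a single link, the best partition found is the two triangles. Comparing the returned labels with the communities used to build a simulated network gives the count of correctly identified nodes.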

## 4. Comparing deterministic and stochastic (random mixing and network) formulations

Both the random-mixing and network models showed similar dynamics to the deterministic systems for different levels of cross-immunity, with all strains coexisting when cross-immunity was low, oscillations occurring at moderate levels, and the domination of non-overlapping subsets of strains at high levels. Figure 4 shows an example of a network simulation in which the two communities begin with different dominant subsets, but make a sudden transition to both communities having the same dominant strains. The time taken to make this transition was significantly negatively correlated with the level of connectivity between the two communities (*R*^{2}=−0.362, *p*<0.00001), as might be expected, but not with the level of cross-immunity. Synchronization of antigenic profiles could also occur without either community previously establishing dominant subsets.

To compare the antigenic profiles in different communities within the stochastic frameworks, we used a population-level metric of diversity (Buckee *et al*. 2004) based on the Shannon–Weaver diversity index (Shannon & Weaver 1949). When cross-immunity was high, the within-community diversities were expected to be relatively low. If both the communities were dominated by the same subsets of strains, then the overall diversity would also be low. Conversely, when different dominant strains were maintained in each community, the overall population showed high diversity, since all strains existed at more or less the same prevalence overall. Thus, the diversity of individual communities could be compared with that of the overall population, with the difference between local and global diversities being analogous to the difference in prevalence of the same strains in different communities for the deterministic model.
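The comparison between local and global diversity can be illustrated with the plain Shannon index. The metric of Buckee *et al*. (2004) is only described here as being based on this index, so the sketch below is a stand-in rather than the measure actually used; the function names and the use of natural logarithms are assumptions.

```python
import math

def shannon(counts):
    """Shannon diversity H = -sum(p * ln p) over strains with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def diversity_gap(community_counts):
    """Diversity of the pooled population minus the mean within-community diversity.

    community_counts holds one list of per-strain infection counts per community,
    with strains listed in the same order in every community.
    """
    pooled = [sum(column) for column in zip(*community_counts)]
    local = sum(shannon(c) for c in community_counts) / len(community_counts)
    return shannon(pooled) - local
```

When two communities are dominated by different discordant pairs, for example counts of `[50, 0, 0, 50]` and `[0, 50, 50, 0]`, the gap is positive; when both are dominated by the same pair, it is zero.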

Figure 5 shows the effects of cross-immunity and community separation on the pathogen population for the random-mixing (figure 5*a*) and the network (figure 5*b*) models. The figures are in the same format as figure 2, but use the difference between the diversity of the overall population and individual communities as a measure of the homogeneity of the pathogen population across the two communities. Although it is not straightforward to calibrate parameters between the different models, it appeared that the random-mixing and network models behaved equivalently to the deterministic model. Interestingly, the random-mixing version was slightly noisier than the network version. Both stochastic models showed similar qualitative outcomes to the first equilibrium of the deterministic model (figure 2*a*), with stronger cross-immunity allowing for separate pathogen dynamics in the two communities at higher levels of connectivity. The quantitative differences between *α* and *z*_{out} are approximately 10-fold in the transition from separate communities to one large population. There appeared to be no second equilibrium, however, and the prevalence of seeded strains had no impact on the outcome of the models.

Pathogen dynamics became synchronized between communities at values of *z*_{out} at which host communities in both stochastic models were still extremely well defined. Indeed, the host communities homogenized at a value of *z*_{out} an order of magnitude above that necessary for the pathogen populations to become identical. Figure 6 shows this key finding from the study; the synchronization of pathogen dynamics is shown in comparison with the homogenization of the host community networks. Note the logarithmic scale on the *x*-axis. For the random-mixing case, pathogens in the two communities showed identical structuring and dynamics within a few time-steps for values of *p*_{out}>0.001. Similarly, for the static network case, the pathogens in the two communities behaved essentially as one population after a short time for *z*_{out}>0.5 (equivalent to *p*_{out}>0.001 in the random-mixing model), whereas the communities of hosts remained highly differentiated until *z*_{out} rose above approximately 3. Thus, identifiable community structure in the host network can only lead to antigenic differences between their pathogen populations at extremely low values of *z*_{out}.

## 5. Discussion

Our main finding is that when there is strong cross-immunity, community structure can stably maintain diverse subsets of pathogen strains within neighbouring communities, but only if the connectivity between them is extremely low. Community structure therefore appears to have little effect upon the diversity of the pathogen population unless the communities are very well defined. Thus, a lack of antigenic differentiation between pathogen subpopulations in different host communities is not necessarily indicative of strong ties between them. Conversely, differences in antigenic profiles of pathogen populations may not reflect distinct community structure among hosts, but rather some other process such as the adaptation to varied local environments.

The relationship between cross-immunity and connectivity was qualitatively the same for all the three models, with the degree of cross-immunity required to maintain separate subsets of antigenic types in different host communities increasing with community connectivity. However, even when cross-immunity was strong, the overall pathogen population quickly became homogeneous with respect to the dominant strains at very low levels of connectivity. This occurred even when communities were initially dominated by different subsets of strains, and therefore could not be attributed to the lack of initial establishment of these strains within each community. The large discrepancy between the community connectivity required for host as compared to pathogen population homogenization generally suggests that the widespread identification of community structures in real social networks may not imply similar structuring of dynamic processes occurring upon them.

With the exception of very isolated communities, these models show that pathogen populations occurring within most communities of hosts are not likely to exhibit substantial differentiation in their antigenic repertoires. This may explain the global distribution of antigenic combinations observed for species such as *Neisseria meningitidis*, which have been shown to be structured into stable antigenic types by high levels of cross-immunity (Gupta *et al*. 1996; Urwin *et al*. 2004). In the context of our models, the stability of these antigenic types over wide areas does not necessarily imply high levels of mixing among host populations, but can be explained by strong immune selection even when links between neighbouring communities are very weak. It would be interesting to compare the distribution of antigen genes with that of the housekeeping genes of *N. meningitidis*, which do display geographical differentiation between different countries of Europe (Jolley *et al*. 2005). Our models suggest that genes that are under immune selection should not show the same level of differentiation.

## Acknowledgments

This research was supported by EPSRC (C.B.) and the Generalitat de Catalunya (L.D.).

## Footnotes

† These two authors contributed equally to this work.

- Received March 25, 2007.
- Accepted April 20, 2007.

- © 2007 The Royal Society