## Abstract

Modelling the spatial spread of vector-borne zoonotic pathogens maintained in enzootic transmission cycles remains a major challenge. The best available spatio-temporal data on pathogen spread often take the form of human disease surveillance data. By applying a classic ecological approach—occupancy modelling—to an epidemiological question of disease spread, we used surveillance data to examine the latent ecological invasion of tick-borne pathogens. Over the last half-century, previously undescribed tick-borne pathogens including the agents of Lyme disease and human babesiosis have rapidly spread across the northeast United States. Despite their epidemiological importance, the mechanisms of tick-borne pathogen invasion and drivers underlying the distinct invasion trajectories of the co-vectored pathogens remain unresolved. Our approach allowed us to estimate the unobserved ecological processes underlying pathogen spread while accounting for imperfect detection of human cases. Our model predicts that tick-borne diseases spread in a diffusion-like manner with occasional long-distance dispersal and that babesiosis spread exhibits strong dependence on Lyme disease.

## 1. Introduction

Tick-borne diseases (TBDs) are responsible for over 300 000 human cases per year in the USA alone [1] and are globally emerging owing to the expanding ranges of tick vectors, reservoir hosts and pathogens [2–4]. Understanding the ecological process of TBD invasion is fundamental for developing effective surveillance systems and designing measures to limit further geographical spread. However, quantifying the spatial spread of zoonotic TBD maintained in enzootic transmission cycles remains a major challenge [5] because longitudinal data from sampling vectors and wildlife host species at wide spatial scales are not currently available. For this reason, human surveillance data offer us the best spatially explicit time series available. However, these data represent only the observed epidemiological manifestation of a complex constellation of processes: introduction and establishment of tick-borne pathogens in enzootic transmission cycles (generating entomological risk), human exposure to infected tick vectors, human infection, progression from infection to disease, case diagnosis and case reporting.

Occupancy and metapopulation models, commonly used in population ecology, can be applied to model the spread of TBD. Occupancy models explicitly consider the observation process (in this case, diagnosing and reporting human cases) as contingent on the underlying distribution of a species (in this case, a tick-borne pathogen) and have been used to examine the prevalence of imperfectly detected human and wildlife pathogens and disease vectors [6–10]. However, their use in modelling the spatio-temporal distribution of invading pathogens is relatively new [11,12] and to the best of our knowledge, they have not been applied to explore the processes underlying spatial spread of disease. Metapopulation models consider the dynamics of spatially separated but interacting populations (such as the patchy distribution of a tick-borne pathogen) [13]. Metapopulation models have been used to model infectious disease dynamics [13–16], but have not been applied within an occupancy framework to dissect the processes underlying spatial disease spread.

Lyme disease and babesiosis are recently emerging in North America: Lyme disease was first described in Lyme, Connecticut in 1976 and babesiosis on Nantucket Island, Massachusetts in 1969 [17,18]. Both the Lyme disease bacteria, *Borrelia burgdorferi*, and the babesiosis parasite, *Babesia microti,* share the same tick vector (*Ixodes scapularis*) and an overlapping community of vertebrate reservoir hosts. Despite their similarities, Lyme disease has rapidly spread through most of New England and the Midwest, and over 30 000 cases of Lyme disease are reported each year in the USA, although the true burden of disease is estimated to be 10 times greater [1,4]. By contrast, babesiosis expansion appears to follow that of Lyme disease and fewer than 2000 babesiosis cases are reported yearly [19–21].

How do the two co-vectored pathogens move across the landscape and what explains their markedly different invasion trajectories? The spread of tick-borne pathogens and diseases coarsely followed the spread of *I. scapularis* ticks out of southern New England; however, little is known about the latent processes underlying rapid emergence [20,22]. Apparent spread of reported disease may reflect the underlying ecological invasion, epidemiological (exposure) and/or observation processes (human case diagnosis and surveillance).

*Ixodes scapularis* only move a few metres during each life stage [23,24]; thus, tick-borne pathogen dispersal is dependent on movement of mammalian and avian hosts, which differ greatly both in their dispersal capacity, proportion of the tick population fed and reservoir competence for both *B. burgdorferi* and *B. microti* [19,25–27]*.* It remains unclear whether the risk of TBD propagates locally, generating a rabies disease-like invasion front [11,16] or if TBD risk is driven by long-distance dispersal as occurs for West Nile Virus fever [28]. Although several studies have identified possible ecological correlates of TBD prevalence in humans, including density of infected ticks [29], and correlates of the distribution of infected ticks, including proximity to water bodies [30] and, for babesiosis, prevalence of Lyme disease [31], the ecological factors driving *spread* of reported TBD at the scale of the northeast United States remain unresolved. Spread differences between pathogens may reflect biological differences or may be in part an artefact of the better-defined symptomatology and/or heightened awareness of Lyme disease compared with babesiosis [32–34].

We use an occupancy model to test the following ecological and epidemiological hypotheses about the processes of TBD spread and the observed differences in Lyme disease and babesiosis emergence.

*Processes underlying spread*.

(i) Both local spread and long-distance dispersal contribute to the spread of TBD;

(ii) high tick density facilitates introductions of TBD; and

(iii) proximity to water (a major river or the Atlantic coastline) facilitates spread of TBD.

*Contrasting spread trajectories*.

(iv) Lyme disease presence facilitates the colonization and persistence of babesiosis; and

(v) Lyme disease is more likely to be reported than babesiosis.

Our modelling approach allows us to quantify the spatial dynamics of pathogen invasion and provides insights into the future distribution of Lyme disease and babesiosis in the northeast United States.

## 2. Material and methods

### (a) Data

As Lyme disease and babesiosis are nationally notifiable diseases, states are required to collect and report surveillance data. We compiled human case data available at the town level from the Connecticut, Massachusetts, New Hampshire and Rhode Island state health departments from 1984 to 2014 for Lyme disease and from 1985 to 2014 for babesiosis (figure 1 and electronic supplementary material, figure S1 and table S1). Data from Vermont and Maine were not available at the town level. We included both probable and confirmed cases and excluded imported cases. Reporting effort varied across states and over time [35,36] (electronic supplementary material, table S1). Here, we are primarily interested in the invasion process and thus modelled spatio-temporal occupancy dynamics rather than the abundance of cases. The disease status of a town was defined as either 0 (no cases reported) or 1 (at least one case reported) based on the yearly surveillance data as in a typical Levins-type metapopulation model, which does not consider the internal dynamics of a subpopulation [13,16].

Density data for infected *I. scapularis* were derived from a previously published field- and climate-based model for density of infected nymphal *I. scapularis* ticks based on sampling from 2004 to 2006 [37]. Infected tick density is thus a static spatial predictor; however, habitat suitability for *I. scapularis* and thus infected *I. scapularis* density is dynamic [38] and exhibits spatio-temporal variation not captured by this covariate.

We determined the minimum Cartesian distance from the town geographical centroid to one of the three major rivers in our study area—the Hudson, Connecticut and Merrimack Rivers—or the Atlantic coastline in R v. 3.1.1 [39] as a measure of proximity to a major water body.

### (b) Model

To test hypotheses regarding the patterns and drivers of TBD spread, we fitted an occupancy model including observation (disease reporting) and ecological processes. This allowed us to quantify the contribution of latent processes to disease spread and test the significance of covariates to each spread process independently. We compare this ecological process model with a null model that assumes the probability of at least one reported human case in a town depends only on its disease status and that of its adjacent neighbours in the previous time step (electronic supplementary material, text S1, see neighbourhood definitions, below).

Town-level case reports are available for each of towns. Let be the disease status of town *i* at time *t* (a random variable). The disease status of town *i* at time *t* () arises from an underlying Bernoulli process with the probability of at least one reported human case occurring in town *i* at time *t* , where depends on the infection status of town *i* and its neighbours, defined below, in the previous year .

#### (i) Observation process

TBDs are substantially under-reported [1,35], and heterogeneities in reporting across space and time are difficult to quantify. Our data include *reported* cases; therefore, we modelled as the probability of at least one *reported* human case occurring in town *i* at time *t*. Data on *B. burgdorferi* or *B. microti* incidence (e.g. information from serosurveys) are not available at regional spatial scales and therefore, we cannot estimate incidence rates of symptomatic and asymptomatic disease nor reporting rates for the two diseases. Reporting occurs at the state-level and states have differing case definitions, reporting efforts and surveillance infrastructure. Therefore, we introduced a state reporting effect , which captures state-level variation in surveillance. More specifically, we defined state reporting effect as the probability cases in any town in state *s* will be documented by the public health department of their state (constant probability across towns) given the modelled probability of case occurrence :
2.1While surveillance probably varies across towns within the same state, available data do not include site-time replicates; therefore, a model including town-varying reporting effects is not identifiable [11,40]. To test the hypothesis that the Lyme disease surveillance is more intensive than babesiosis (hypothesis (v)), we compared estimated for each disease.

#### (ii) Ecological process

We adapted a previously described occupancy model [41] to model the latent ecological processes underlying TBD spread—initial colonization of a previously unoccupied area, local persistence or extinction, and potentially, recolonization. (Parameters are defined in the electronic supplementary material, table S2.)

(i)

*Initial colonization*. If town*i*was not diseased at the previous time step and the town has never before reported disease , then the town is available for initial colonization with probability .(ii)

*Persistence*. If town*i*was diseased at the previous time step , then disease may persist locally with probability . (The probability of local extinction is )(iii)

*Re-colonization*. If town*i*was not diseased at the previous time step , but the town previously reported disease at some point during the surveillance period , then the town can be re-colonized with probability . This allowed us to distinguish between primary and subsequent introductions.

The indicator variable represents local disease history and the availability of town *i* for initial disease introduction. If a town has never reported disease, it is available to be colonized by disease, . If the town has previously reported disease in any year, :
2.2The probability of reported disease in town *i* at time *t* is the sum of the three processes:
2.3

We included town population size as a covariate at the highest level of the model because it may influence both the observation process (owing to increased local reporting effort in larger towns) as well as the latent process of disease spread, as larger towns have more susceptible hosts who may be infected and diseased.

Because both Lyme and babesiosis cases may have occurred before reporting commenced, there is a non-zero probability of human cases occurring for town *i* in the year case reports were initiated, . We treated the latent probability of reported cases occurring during this first year of reporting in town *i* as a random variable:
2.4

#### (iii) Spatial structure

Disease may persist or colonize each town owing to spread from neighbouring towns, probably owing to pathogen movement via dispersal of small mammals or birds [42–44]. We explored a variety of models for how a town's risk for initial colonization, persistence or re-colonization may depend on neighbouring towns' disease status (hypothesis (i)). To determine an appropriate definition for the spatial neighbourhood (the spatial extent of towns that exert disease pressure on the focal town), we fitted a series of models with different measures of neighbourhood disease intensity and evaluated model fit with Δ deviance information criterion (ΔDIC) compared with the non-spatial process-based model (electronic supplementary material, table S3) [45].

We model each ecological process (initial colonization, persistence and re-colonization) as a function of spatially dependent disease spread in addition to spread independent of spatial context. For example, following previously described notation [41], initial colonization is modelled as 2.5Here, the logit-scaled probability of initial colonization is a function of the spatial neighbourhood; the coefficients and represent first- and second-order dependence of initial colonization on neighbourhood disease intensity . Evidence of a second-order relationship between colonization and neighbourhood disease intensity could, for example, represent a saturating effect of increased pathogen pressure from nearby towns [41]. Alternatively, a quadratic dependence on neighbourhood disease intensity may reveal a surveillance effect analogous to an Allee effect [46]. For instance, colonization probability may have a positive relationship with at low levels, but in towns/years with high , population awareness may be high, leading to changes in human behaviours and thus exposure patterns, potentially resulting in a decreased probability of colonization of reported disease. The intercept, represents an additional colonization pressure, independent of spatial neighbourhood.

#### (iv) Ecological covariates

To test the hypotheses about drivers of tick-borne pathogen spread outlined in the Introduction, we included covariates at the process level (initial colonization, persistence or re-colonization) (equations (2.10)–(2.12)).

To test if high tick density facilitates disease spread (hypothesis (ii)), we included density of infected nymphal ticks, , as a covariate; the coefficients represent the effect of tick density on initial colonization, persistence and re-colonization, respectively. To test if proximity to water facilitates movement of tick-borne pathogens (hypothesis (iii)), we included distance to a major water body, , as a covariate; the coefficients represent the effect of proximity to water on initial colonization, persistence and re-colonization, respectively. To test if Lyme disease facilitates spread of babesiosis (hypothesis (iv)), we included Lyme endemicity, , defined as the number of years a town has reported cases of Lyme disease prior to year *t*, as a predictor for babesiosis status; the coefficients represent the effect of Lyme endemicity on initial colonization, persistence and re-colonization, respectively.

The equations describing the saturated model for the presence of disease reports in reported disease in town *i* at time *t* () are as follows:
2.6

probability of reported disease (): 2.7

probability of initial colonization (): 2.8

probability of persistence (): 2.9

probability of re-colonization (): 2.10

#### (v) Model implementation and validation

We fitted the models using a Markov chain Monte Carlo (MCMC) algorithm with Gibbs sampling to simulate sequences of dependent samples from the posterior distribution of model parameters, implemented in JAGS v. 3.4.0 [47] (model code in electronic supplementary material, text S2). All priors were non-informative (electronic supplementary material, table S2). We ran three independent chains, and inference was based on 100 000 samples after discarding a burn-in of 10 000 iterations and thinning chains every 10 draws. MCMC convergence was assessed with the Brooks–Gelman–Rubin convergence diagnostic [48].

Our primary objectives were to test evidence in the data for our *a priori* hypotheses and model-based prediction. Thus, we generated the model set described above and used model selection to determine support in the data for the model/hypothesis. We used the DIC to evaluate whether increasing model complexity decreased model deviance while penalizing for overfitting (electronic supplementary material, table S4). Parameter estimates from a model that includes non-significant parameters (i.e. the full covariate model) should not be biased but may introduce excess noise to predictions [49]. Accordingly, in the Results and Discussion, we present parameter estimates from what we call the parsimonious model—the best-fit model under DIC stripped of covariates whose parameter estimate credible intervals overlap 0. Parameter estimates change little between the full covariate and parsimonious models (compare figure 2 with electronic supplementary material, figure S2 and figure 3 with electronic supplementary material, figure S3), confirming that bias is not introduced by exclusion of non-significant parameters.

We validate the parsimonious models in two ways. First, we evaluate one-step-ahead predictions (i.e. model predictions conditional on data observed for the previous year with area under the curve (AUC), determined using the R package ‘pROC’) [50]. Second, to test forecasting capacity, we partitioned the data into a ‘training’ period including case reports through 2011 used to fit the model parameters, and a ‘validation’ period including data from 2012 to 2014. We calculated neighbourhood disease intensity for each year as defined in equation (2.5), so that forecasts in 2013 and 2014 were a function of *forecasted* neighbourhood disease status in the previous year, not reported status. Out-of-fit forecasts were assessed with AUC.

To convert continuous predicted probabilities of disease into binary disease forecasts, we identified a probability threshold that minimizes the difference between model sensitivity and specificity [51].

## 3. Results

### (a) Lyme disease

The spread of Lyme disease exhibits strong spatial dependence and is associated with *I. scapularis* density, as evidenced by the best-fitting Lyme disease model (electronic supplementary material, table S4). One-step-ahead model predictions of disease had a mean AUC of 0.93 (AUC > 0.7 is generally considered a good fit) [52]. When reporting began in Connecticut in 1984, Lyme disease cases were reported only in eastern coastal Connecticut. The model predicts the diffusion-like spread of cases from neighbours northwards along two seemingly independent corridors: along the Atlantic coast through Rhode Island, eastern Massachusetts and into southeastern New Hampshire; and along the New York state border, with an apparent lag further inland (figure 1). By 2014, the model predicts most New England towns report Lyme disease cases, with the exception of a corridor of uninfected towns in west central Massachusetts and northwestern New Hampshire.

The occupancy model distinguishes the three processes contributing to the spread of reported disease and the contribution of spatial and environmental covariates (figures 2*a* and 3*a*, electronic supplementary material, table S3). We find a significant spatial effect for each disease process, evidence that disease risk propagates from neighbours (figure 2*a* and electronic supplementary material, table S3). The spatial neighbourhood which best predicts spread of disease is an inverse distance weighted sum of the disease status of *all* towns in the study area rather than only adjacent towns, evidence that both local spread (from adjacent towns) as well as spread at longer distances contribute to movement of Lyme disease (electronic supplementary material, table S3). As neighbourhood disease intensity increases, the probability of each invasion process increases (figure 2*a*). We estimated a mean dispersal velocity for Lyme disease of 11.4 km per year by fitting a connectivity function (though this model of spatial neighbourhood did not provide the best fit to the data, electronic supplementary material, table S5).

Lyme disease persistence and re-colonization but not initial colonization are positively associated with tick habitat suitability (figure 3*a*). Lyme disease spread is not associated with proximity to a water body. State reporting effects were high (greater than 0.95) across all states, which suggests a lack of variability in reporting between states (electronic supplementary material, figure S4).

### (b) Babesiosis

The spread of babesiosis exhibits strong spatial dependence in addition to dependence on Lyme disease endemicity, tick density and proximity to water, as evidenced by the best-fitting babesiosis model (electronic supplementary material, table S4). One-step-ahead predictions had a mean AUC of 0.92. The babesiosis model predicts the slow spread of babesiosis north through eastern coastal Connecticut and into a few isolated towns on Cape Cod and Nantucket Island (south of Cape Cod) in the 1990s (figure 1). During the 2000s, the model predicts continuous, slow spread of babesiosis northeast along the Atlantic coast. Much of Connecticut, eastern Massachusetts and southern coastal New Hampshire have a moderate probability of reporting cases by 2010.

Again, the occupancy model distinguishes the three processes underlying spread of babesiosis. We find a significant spatial effect for each disease process, evidence that babesiosis risk propagates from neighbours (figure 2*b* and electronic supplementary material, table S3). Similar to Lyme disease, the best-fitting definition of spatial neighbourhood indicates that spread is best predicted by an inverse distance weighted average of the reporting status of all towns in the study area in the previous year, suggesting the disease pressure comes from *all* towns in New England but particularly nearby towns (electronic supplementary material, table S3). As neighbourhood disease intensity increases, the probability of each invasion process increases (figure 2*b*). We estimated a mean dispersal velocity for babesiosis of 10.1 km year^{−1} by fitting a connectivity function (electronic supplementary material, table S5), although again this was not the best-fitting spatial model.

Babesiosis spread exhibits strong dependence on Lyme disease; initial colonization, persistence and re-colonization of babesiosis are each positively associated with a history of Lyme disease endemicity (figure 3*b*), and no towns report babesiosis without a prior history of Lyme disease reporting. A hypothetical scenario of babesiosis invasion in the absence of Lyme disease was modelled by setting Lyme history to 0 for all towns. Babesiosis spread is suppressed in such a scenario (figure 2*b*, dotted lines). Babesiosis persistence and initial colonization are negatively associated with distance to a major water body (figure 3*b*). Babesiosis persistence is additionally positively associated with tick habitat suitability (figure 3*a*). State reporting effect varied across states (electronic supplementary material, figure S4).

### (c) Comparison of Lyme and babesiosis trajectories

When holding all covariates at their mean values, Lyme disease has significantly higher rates of initial colonization and re-colonization compared with babesiosis across all neighbourhood disease intensities (figure 2). Independent of spatial neighbourhood, the probability of Lyme disease initial colonization is predicted to be 10 times that of babesiosis (figure 2*a,b* yellow line intercepts).

### (d) Model forecasting

To validate the parsimonious models, we re-fitted the model to case reports through 2011 in order to forecast disease in each of 3 years between 2012 and 2014 [53].

Out-of-fit Lyme disease model forecasts have a mean AUC of 0.93. We identify a probability threshold that minimizes the difference between sensitivity and specificity (electronic supplementary material, figure S5); forecasted probabilities of disease greater than 88.6% for Lyme disease can be interpreted as disease forecasts. The true positive rate or sensitivity (the probability the model correctly forecasts disease) is 86.1% (1596 out of 1854); the true negative rate or specificity (the probability the model correctly forecasts absence of disease) 86.0% (289 out of 336; electronic supplementary material, figure S6). The model performs well (mean AUC of 0.94) ahead of the ‘invasion front’, i.e. in towns with no infected adjacent neighbours in the previous year. Forecasts ahead of the invasion front have a sensitivity of 52.9% (9 out of 17) and specificity of 100% (60 out of 60).

Modelling the reporting (observation) process improved Lyme disease model fit as measured by ΔDIC compared with the model without the observation process (electronic supplementary material, table S4). However, Lyme disease forecasting AUC, sensitivity and specificity did not differ between the parsimonious models with and without the reporting process.

Out-of-fit babesiosis model forecasts have a mean AUC of 0.85. Probabilities of disease greater than 14.0% for babesiosis (electronic supplementary material, figure S5) can be interpreted as disease forecasts. The true positive rate or sensitivity (the probability the model correctly forecasts disease) is 74.7% (510 out of 682); the true negative rate or specificity (the probability the model correctly forecasts absence of disease) 74.7% (1126 out of 1508; electronic supplementary material, figure S6). The model performs well (mean AUC of 0.87) ahead of the ‘invasion front’. Forecasts ahead of the invasion front have a sensitivity of 75.6% (65 out of 86) and specificity of 84.4% (696 out of 824).

Modelling the reporting (observation) process similarly improved babesiosis model fit as measured by ΔDIC compared with the model without the observation process (electronic supplementary material, table S4). Similar to Lyme disease forecasting, overall babesiosis forecasting performance as measured by AUC, sensitivity and specificity did not differ between the parsimonious models with or without the reporting process. Forecasting sensitivity ahead of the invasion front is sacrificed if the reporting process is not modelled: sensitivity drops from 75.6% to 46.5% when reporting is not considered. Forecasting specificity changes from 84.4% to 92.1% when reporting is not modelled.

## 4. Discussion

Modelling the spatial spread of vector-borne zoonotic pathogens maintained in enzootic transmission cycles remains a major challenge [5]. Human surveillance data often constitute the best available spatio-temporal data on pathogen spread. Over the past 40 years, TBD have rapidly spread across the northeast and midwest United States and pose a significant and growing public health burden [2,3,20,21,32,54,55]. Although the increase in human cases of TBD is striking, the crawl of ticks and tick-borne pathogens across the landscape is largely unobserved. By applying classic ecological approaches of occupancy and metapopulation modelling to an epidemiological question of disease spread, we harnessed human surveillance data to model the invasion of tick-borne pathogens. We tested hypotheses about *how* pathogens move (i.e. the spatial dependence of invasion) and *why* pathogens are spreading (i.e. environmental and ecological drivers of spread).

### (a) Processes underlying spread

As hypothesized, the processes underlying TBD invasion—colonization, persistence and re-colonization—each exhibit strong spatial dependence. Defining neighbourhood disease intensity as the inverse-distance weighted sum of the disease status of all towns in the study area rather than by adjacency provides the best fit to the data, indicating that spatial spread of both diseases is a combination of both local spread from adjacent neighbours as well as spread at longer distances. In this definition of spatial neighbourhood, closer towns are given greater weight, indicating that local spread contributes greatly to the spread of TBD.

Local spread is probably attributable to mammalian hosts that often disperse less than 5 km in addition to avian hosts moving short distances [42–44,56,57]. Long-distance colonization plays a smaller, but nonetheless significant role in colonization of TBD. Long-distance introductions of disease may be attributable to dispersal on migratory passerines [56,58]. Avian dispersal of *I. scapularis* populations has been described in geographically isolated sites in Canada where mammalian introductions were unlikely [43,59]. Alternatively, observed long-distance introductions ahead of the ‘invasion front’ may reflect differences between enzootic infection prevalence, human exposures, human disease incidence or reporting. For babesiosis, there are several apparent long-distance ‘colonization’ events in which towns with no infected neighbours report disease and then disease appears to go extinct. This reflects the low prevalence of babesiosis: in many towns in which cases are reported, there is only a single case. The apparent colonization and extinction probably reflect the epidemiological process of human infection and disease reporting rather than reflect the ecological process of *B. microti* spread across the landscape (i.e. in these colonization/extinction events, *B. microti* does not go extinct locally, but rather no cases are reported owing to stochasticity in human infection per surveillance).

Field studies have demonstrated that tick populations need to be established prior to successful pathogen colonization and that density of infected nymphal ticks is associated with Lyme disease prevalence in humans [29,37,60–62]. In contrast to our hypotheses, we found that tick density was positively associated with Lyme disease persistence and re-colonization, but not initial colonization. This may be owing to the lack of data on spread of infected *I. scapularis* over the reporting period. Increased tick density is associated with persistence of babesiosis, but not initial or re-colonization, probably owing to collinearity with Lyme disease endemicity.

As predicted, initial colonization and persistence of babesiosis is associated with proximity to a major water body. River corridors and the Atlantic Coast provide suitable, humid microhabitats for ticks and are known corridors of ticks and reservoir hosts [30,58]. The lack of an association between Lyme disease spread and proximity to water may be an artefact of the reporting time series available. The Lyme disease bacteria, *B. burgdorferi*, had already spread up the Hudson River and much of the Atlantic Coast prior to the start of the reporting time series, making it difficult to capture any association between water corridors of dispersal and disease. By contrast, the reporting time series captures the early invasion of the babesiosis parasite, *B. microti*, along the Atlantic coastline and up the Hudson River Valley (figure 1).

### (b) Contrasting spread trajectories

Our model captures differences in invasion trajectories for *B. burgdorferi* and *B. microti*. While *B. burgdorferi* appears to have spread rapidly and became endemic across most of New England by the late 2010s, *B. microti* appears to have spread slowly and became endemic only in southern coastal New England (figure 1). Babesiosis spread is characterized by lower rates of initial colonization and re-colonization compared with Lyme disease and, importantly, strong dependence on Lyme disease endemicity (figure 2). However, the mean model-estimated rate of spread of Lyme disease and babesiosis did not differ substantially over the reporting period (11.36 and 10.10 km year^{−1}, respectively). This reveals that differences in current distributions of the two diseases may reflect a temporal lag in spread of babesiosis with respect to Lyme disease owing to differences in the ecological invasion processes, pathogenesis in humans and/or owing to differential surveillance measures for the two diseases, rather than a difference in the speed of invasion.

The association of babesiosis invasion processes with Lyme disease endemicity is consistent with laboratory studies which demonstrate that co-infection with *B. burgdorferi* enhances transmission of *B. microti* in reservoir hosts and thus lowers the ecological threshold for *B. microti* establishment in proportion to *B. burgdorferi* infection in the mouse population [31]. Lyme disease endemicity also probably serves as a proxy for a suite of other unmeasured environmental, ecological or health system variables associated with high incidence of TBD, including spatio-temporal variation in *I. scapularis* density. The strong dependence on Lyme disease suggests that spread of babesiosis is likely to continue until it approximates the range of Lyme disease. However, our results suggest that Lyme disease, once reported, is more likely to persist than babesiosis, consistent with previous studies that have identified a higher basic reproductive number, *R*_{0}, for *B. burgdorferi* compared with *B. microti* [31].

Consistent with hypotheses, the probability of long-distance introduction of Lyme disease to towns independent of spatial context is 10 times more likely than long-distance movement of babesiosis. This may reflect a smaller propagule pressure of *B. microti-*infected ticks owing to the lower prevalence of *B. microti* across the study area as well as the fact that birds are more competent hosts for *B. burgdorferi* than for *B. microti* [19,58].

Mean state reporting effects are higher for Lyme disease compared with babesiosis, probably reflecting the higher prevalence, greater population and physician awareness, and more distinct symptomatology of Lyme disease compared with babesiosis [32,34]. Variation in state reporting effects is higher for babesiosis than for Lyme disease, probably reflecting heterogeneities in awareness and surveillance efforts across states [29].

Modelling variation in imperfect reporting is critical for babesiosis, because disease reports ahead of the apparent invasion front probably reflect reporting variation rather than reflect parasite invasion (e.g. the parasite is already established in the enzootic cycle ahead of the invasion front observed by human cases). Our results suggest that an occupancy modelling framework may contribute most to epidemiological models when there is substantial spatial variation in the reporting process.

### (c) Study limitations

Here, we modelled the spatial spread of *reported* cases of TBD. Owing to widespread and differential [33] under- and over-reporting of Lyme disease and babesiosis, evolving case definitions [35], and changing population awareness, raw epidemiological data may not represent the true distribution of cases and subclinical infections. However, modelling variation in reporting across states allows us to address the issue of imperfect detection of human cases.

## 5. Conclusion

We harness epidemiological surveillance data to examine an unobserved stochastic process of pathogen spread in an enzootic cycle. The spatio-temporal framework can be readily modified and adapted for study of other emerging zoonotic pathogens [34].

## Data accessibility

Town-level Lyme disease and babesiosis surveillance data for Connecticut, Rhode Island and Massachusetts are available at Dryad: http://dx.doi.org/10.5061/dryad.fs348.

## Authors' contributions

M.D.W., P.J.K., V.E.P., K.M.P., C.T.W., H.D.G. and K.S.W. conceived the eco-epidemiological question, K.M.P. formalized the modelling approach. K.S.W. collected data, implemented model, and analysed output data. K.M.P. and V.E.P. contributed to model construction and fitting. K.S.W. wrote the first draft of the manuscript, and all authors contributed substantially to revisions.

## Competing interests

We have no competing interests.

## Funding

K.S.W. was supported by the NIH pre-doctoral training grant 5T32AI007404-23 and the NIH Ruth L. Kirschstein National Research Service Award F31 AI118233-01A1. K.M.P. was partly supported by the Research and Policy in Disease Dynamics (RAPIDD) programme, Fogarty International Center, NIH and Department of Homeland Security and partly by USDA-APHIS-WS. C.T.W. and V.E.P. were funded by RAPIDD. P.J.K. was supported by the Gordon and Laura Gund Foundation. M.D.W. was supported by the National Institutes of Health, Ecology and Evolution of Infectious Disease Programme (R01 GM105246).

## Acknowledgements

The authors thank Marina Antillon, Linda Capewell, Molly Rosenberg, Fu-Chi Hsieh and Kimberly Tsao for their contributions to data collection. Models were run on Yale University's High Performance Computing cluster, Louise (supported by NIH grant nos. RR19895 and RR029676-01). The authors thank Dr Robert Bjornson for bioinformatics assistance.

- Received April 13, 2016.
- Accepted May 9, 2016.

- © 2016 The Author(s)

Published by the Royal Society. All rights reserved.