Abstract
Recurrent episodes of tuberculosis (TB) can be due to relapse of latent infection or exogenous reinfection, and discriminating between the two is crucial for control planning. Molecular genotyping of Mycobacterium tuberculosis isolates offers concrete opportunities to measure the relative contribution of reinfection to recurrent disease. Here, a mathematical model of TB transmission is fitted to data from 14 molecular epidemiology studies, enabling the estimation of relevant epidemiological parameters. Meta-analysis reveals that rates of reinfection after successful treatment are higher than rates of new TB, raising an important question about the underlying mechanism. We formulate two alternative mechanisms within our model framework: (i) infection increases susceptibility to reinfection or (ii) infection affects individuals differentially, thereby recruiting high-risk individuals to the group at risk for reinfection. The second mechanism is better supported by the fits to the data, suggesting that reinfection rates are inflated through a population phenomenon that occurs in the presence of heterogeneity in individual risk of infection. As a result, rates of reinfection are higher when measured at the population level even though they might be lower at the individual level. Finally, differential host recruitment is modulated by transmission intensity, being less pronounced when incidence is high.
1. Introduction
Despite significant improvements in tuberculosis (TB) treatment over recent years, adequately treated patients are still at high risk of developing recurrent pulmonary disease (defined as an episode of TB following the cure of a previous episode). Recent estimates for the recurrence rate of TB across different regions point to an average of 2290 cases per 100 000 person-years at 12 months after treatment completion. In high-incidence regions, the average TB recurrence rate can reach 7850 per 100 000 person-years [1].
The contribution of exogenous reinfection with Mycobacterium tuberculosis (Mtb) versus that of endogenous reactivation (relapse) of latent Mtb to the overall rate of recurrence of pulmonary disease remains controversial because these two mechanisms cannot be easily disentangled. Determining the weight of each mechanism is of great importance for policy-making. Advances in DNA fingerprinting techniques have enabled genotyping of the Mtb strains causing different disease episodes [2]. These methods can reveal whether a new episode of disease is caused by infection with the same strain that caused a previous episode or by a different one, enabling a classification into relapse or reinfection, respectively.
Longitudinal data from molecular epidemiological studies on TB reinfection have shown a positive correlation between the proportion of reinfection in recurrent cases and local incidence [3]. A long-term study in an area of South Africa with particularly high incidence attributed the majority (77%) of recurrent TB cases to reinfection [4]. Moreover, the rate of TB reinfection was found to be four times higher than that of new TB, raising an important question about the underlying mechanism. Two possibilities have been proposed to explain these results: (i) infection increases susceptibility to reinfection or (ii) infection occurs at a higher rate in a high-risk subpopulation [5].
We address this question by constructing a general framework for TB transmission that encapsulates both mechanisms, and assess its capability to fit available data relating TB incidence and reinfection proportion among recurrent cases, in 14 regions throughout the world. The dataset was gathered by systematic literature review. The model postulates that some individuals are a priori more likely to be infected owing to enhanced susceptibility or exposure [6–8]. Infection is more likely to affect individuals at higher risk, who will naturally be overrepresented in the treated subpopulation to a degree that is modulated by transmission intensity (illustrated in figure 1). This process alone acts to inflate the rate of reinfection at the population level, even if infection confers partial protection at the individual level. The relative susceptibility of individuals who have been previously infected over those who are naive is represented by a parameter σ, the value of which will be estimated by the fitting procedure.
2. Methods
(a) Literature review
Through a systematic literature review, we aggregated data on recurrent TB and its relationship with TB incidence. Published epidemiological studies were located via PubMed through searches on the following terms: tuberculosis, recurrent, relapse, reinfection or re-infection. Studies were included in the analysis if they fulfilled the following criteria:

— study reports the number of recurrent TB cases, defined as a positive culture after bacteriologically confirmed cure or complete treatment following a first episode;

— study reports more than 10 recurrent TB cases;

— study discriminates between reinfection and relapse by comparing Mtb DNA fingerprinting profiles of the initial and recurrent episodes; and

— population-based study published up to March 2011.
We extracted the data for the local incidence from the study papers whenever possible. When no incidence was reported in the study itself, we used the estimates for the year 2000 in the respective country, provided by the World Health Organization [9]. The proportion of reinfection was defined as the ratio between the number of patients with reinfection and all recurrent TB cases. One study performed in Cape Town also provided the ratio between the rate of reinfection TB in successfully treated patients and the rate of new cases of TB [4].
(b) Mathematical formulation
A mathematical model [10,11] is extended by allowing the risk of infection to be heterogeneously distributed among the population. The transmission model is replicated in two subpopulations, indexed by i = 1,2, such that immunologically naive individuals in subpopulation 1 (low risk) are subject to a per capita rate of infection λ_{1} = α_{1}λ, whereas in subpopulation 2 (high risk) infection occurs at a higher rate, λ_{2} = α_{2}λ, where α_{1} < α_{2}, and γ_{1}, γ_{2} are the proportions of the population in each risk group. Within each group, individuals are classified, according to their infection history, into susceptible (S_{i}), primary infection (P_{i}), latent (L_{i}), active pulmonary tuberculosis (I_{i}) and recovered (R_{i}). Latent and recovered individuals can be reinfected at a rate that is proportional to the rate of first infection, with multiplicative factor σ. The model is written as a system of differential equations (2.1), where μ (=1/70 yr^{−1}) is the birth and death rate, ϕ (=0.05) is the proportion of primary infections progressing to active pulmonary disease, and τ (=2 yr^{−1}) is the rate at which infectious individuals are detected and treated. An auxiliary parameter δ (=12 yr^{−1}) is included to represent the rate of progression from primary infection, although this can be interpreted only in conjunction with other parameters. For example, the rate of progression from primary infection to disease is ϕδ (=0.6 yr^{−1}). The value for μ is consistent with human populations with life expectancy of 70 years, whereas ϕ and τ are consistent with the medical literature [12]. The rate of relapse (ω) is fixed at the higher end of published estimates (ω = 0.01 yr^{−1}) [13–17], with justification and sensitivity analysis provided in the electronic supplementary material. Parameters referring to the reinfection factor (σ) and risk heterogeneity are estimated in order to adjust the equilibrium model solutions to the available data.
By noting that γ_{1} and γ_{2} are proportions, and thus γ_{1} + γ_{2} = 1, and by normalizing the average risk factor such that γ_{1}α_{1} + γ_{2}α_{2} = 1, heterogeneity is fully parametrized by the low-risk parameters, γ_{1} (≡γ) and α_{1} (≡α). The parameters are listed in table 1.
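The verbal description above can be turned into a minimal numerical sketch. The Python code below is an illustration under stated assumptions, not a reproduction of equation (2.1): the force of infection is assumed to be λ = β(I_{1} + I_{2}) with a hypothetical transmission coefficient beta, relapse at rate ω is assumed to apply to both latent and recovered individuals, and a fraction 1 − ϕ of primary infections is assumed to enter latency at rate δ. The fixed parameter values are those quoted in the text; the function names and the forward Euler integrator are choices made for this sketch only.

```python
# Illustrative sketch: an ASSUMED reading of the two-group model described in
# the text. It is not the published equation (2.1).

MU = 1.0 / 70.0   # birth and death rate (per year)
PHI = 0.05        # proportion of primary infections progressing to disease
DELTA = 12.0      # rate of progression out of primary infection (per year)
TAU = 2.0         # rate of detection and treatment (per year)
OMEGA = 0.01      # relapse rate (per year)


def derivatives(state, beta, sigma, gamma, alpha):
    """Time derivatives for two risk groups.

    state: list of two tuples (S, P, L, I, R), low-risk group first.
    beta:  hypothetical transmission coefficient (assumption of this sketch).
    """
    gammas = (gamma, 1.0 - gamma)
    # High-risk factor from the normalization gamma_1*alpha_1 + gamma_2*alpha_2 = 1.
    alphas = (alpha, (1.0 - gamma * alpha) / (1.0 - gamma))
    # ASSUMPTION: force of infection proportional to the total infectious fraction.
    lam = beta * sum(group[3] for group in state)
    result = []
    for (S, P, L, I, R), g, a in zip(state, gammas, alphas):
        lam_i = a * lam
        dS = MU * g - (lam_i + MU) * S
        dP = lam_i * S + sigma * lam_i * (L + R) - (DELTA + MU) * P
        dL = (1.0 - PHI) * DELTA * P - (sigma * lam_i + OMEGA + MU) * L
        # ASSUMPTION: relapse at rate OMEGA from both latent and recovered classes.
        dI = PHI * DELTA * P + OMEGA * (L + R) - (TAU + MU) * I
        dR = TAU * I - (sigma * lam_i + OMEGA + MU) * R
        result.append((dS, dP, dL, dI, dR))
    return result


def simulate(beta, sigma, gamma, alpha, years=300.0, dt=0.01, seed=1e-4):
    """Forward Euler integration from a nearly susceptible population."""
    gammas = (gamma, 1.0 - gamma)
    state = [(g * (1.0 - seed), 0.0, 0.0, g * seed, 0.0) for g in gammas]
    for _ in range(int(years / dt)):
        rates = derivatives(state, beta, sigma, gamma, alpha)
        state = [
            tuple(x + dt * dx for x, dx in zip(group, group_rates))
            for group, group_rates in zip(state, rates)
        ]
    return state


if __name__ == "__main__":
    final = simulate(beta=20.0, sigma=0.51, gamma=0.98, alpha=0.15)
    print("infectious fraction:", final[0][3] + final[1][3])
```

In this sketch each group keeps its total size γ_{i}, because births into a group balance deaths and every other flow moves individuals between classes of the same group. A stiff-aware solver would be preferable to forward Euler for serious use.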
(c) Measures of incidence
TB incidence (cases per 100 000 person-years) is calculated from the equilibrium proportion of infectious individuals according to equation (2.2).
Following the criteria used in the data collection, we classify a recurrent TB case as an individual who enters the infectious class after having gone through the recovered class. This combines two pathways: relapse while in the recovered class, Y_{relapse}; and exogenous reinfection with progression to active disease, Y_{reinfection} (direct or following a latent period). We derive these quantities formally from equation (2.1). The instantaneous rate of relapse after successful treatment is the sum of the R_{i} → I_{i} transitions (equation (2.3)), while the instantaneous rate of reinfection after successful treatment is the sum of the transitions that lead from R_{i} back to I_{i} through reinfection, directly or following a latent period, for any number (n) of iterations of the intermediate cycle (equation (2.4)). Hence, the proportion of reinfection over all recurrent TB after successful treatment is p = Y_{reinfection}/(Y_{relapse} + Y_{reinfection}) (equation (2.5)).
The rate of new TB is given by equation (2.6).
Finally, for comparison with the Cape Town study [4], we define κ as the ratio of the rate of reinfection TB among successfully treated patients over the rate of new TB cases among those who have never had a TB episode, formally calculated as in equation (2.7).
(d) Meta-analysis
We perform a meta-analysis of the relationship between the proportion of reinfection in recurrent TB (p) and the incidence of TB (Y) by fitting the model described in equation (2.1) to the dataset collected by the systematic literature review. By taking the incidence rate as the independent variable of a nonlinear regression, we estimate the set of model parameters that best describe the observed trends in the proportion of TB reinfection. First, we considered a model where the host population is homogeneous with respect to risk of infection, and then assessed whether heterogeneity would significantly improve the ability of the model to fit the data. A Gauss–Newton algorithm was implemented to fit the model output to the data according to the least-squares criterion. We used an F-test and a log-likelihood test to assess whether the heterogeneous model provides a significantly better fit to the data.
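The two numerical ingredients named above, a Gauss–Newton least-squares fit and an F statistic for nested models, can be sketched as follows. This is a minimal illustration, not the fitting code used in the study: the Gauss–Newton routine handles a single parameter and is applied to a hypothetical saturating curve p = Y/(Y + k) with synthetic, noise-free points, both of which are placeholders for the transmission model and the collected data. The F statistic uses the standard nested-model formula; the parameter counts passed to it are left to the caller.

```python
def gauss_newton_1d(model, dmodel, xs, ys, theta0, iterations=50, tol=1e-12):
    """Least-squares fit of a one-parameter model by Gauss-Newton iteration."""
    theta = theta0
    for _ in range(iterations):
        residuals = [y - model(x, theta) for x, y in zip(xs, ys)]
        jacobian = [dmodel(x, theta) for x in xs]  # d(model)/d(theta) at each x
        # One-parameter Gauss-Newton step: (J^T J)^(-1) J^T r
        step = (sum(j * r for j, r in zip(jacobian, residuals))
                / sum(j * j for j in jacobian))
        theta += step
        if abs(step) < tol:
            break
    return theta


def f_statistic(rss_reduced, rss_full, k_reduced, k_full, n):
    """F statistic comparing a reduced model nested within a full model.

    rss_*: residual sums of squares; k_*: numbers of fitted parameters;
    n: number of data points.
    """
    numerator = (rss_reduced - rss_full) / (k_full - k_reduced)
    denominator = rss_full / (n - k_full)
    return numerator / denominator


# Hypothetical stand-in for the model curve (NOT the transmission model).
def curve(y_inc, k):
    return y_inc / (y_inc + k)


def dcurve(y_inc, k):
    return -y_inc / (y_inc + k) ** 2


if __name__ == "__main__":
    incidences = [50.0, 100.0, 300.0, 600.0, 1000.0]    # synthetic values
    proportions = [curve(y, 200.0) for y in incidences]  # generated with k = 200
    k_hat = gauss_newton_1d(curve, dcurve, incidences, proportions, theta0=100.0)
    print("fitted k:", k_hat)  # recovers a value close to 200
```

The resulting F statistic would then be compared against an F distribution with (k_full − k_reduced, n − k_full) degrees of freedom to obtain a significance level.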
3. Results
We conducted a systematic literature review and found 14 molecular epidemiological studies reporting the proportion of reinfection in recurrent TB across communities presenting a wide range of endemic levels [3,4,18–29]. Table 2 summarizes the data collected.
Figure 2 shows the output of the homogeneous and heterogeneous versions of equation (2.1) that best fit the dataset, whereas the estimated parameters are listed in table 3. The figure shows the proportion of reinfection TB in recurrent cases (p) versus TB incidence (Y) at equilibrium. The model shows a markedly nonlinear relationship between the proportion of reinfection and local incidence, which was not captured by previous studies [3]. The ratio (κ) of the rate of reinfection TB over the rate of new TB predicted by the model is plotted in figure 3, where it can be compared with the measure obtained for Cape Town [4]. Although the data published by the other 13 studies do not enable the calculation of this quantity, the homogeneous model predicts equally high values, whereas under the heterogeneous model, we expect higher ratios in all regions that report lower incidence than Cape Town.
Noting that the homogeneous model is nested within the heterogeneous, we have calculated the F-test statistic to show that heterogeneity enables a significantly better fit to the data (table 3). The best scenario provided by this analysis estimates the heterogeneity parameters as γ = 0.98 and α = 0.15, suggesting that the risk of infection is about 40 times higher than average in the 2 per cent subpopulation at highest risk. Although these are necessarily crude approximations that enabled the model to simultaneously fit a vast spectrum of epidemiological scenarios, they can serve as a basis for further resolution in a region-specific manner. The estimated reinfection factor (σ = 0.51) indicates that previous infection has a protective effect, contrary to a model proposed previously [30].
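The figure of about 40 times follows directly from the normalization stated in the Methods, γ_{1}α_{1} + γ_{2}α_{2} = 1 with γ_{2} = 1 − γ_{1}, which gives α_{2} = (1 − γα)/(1 − γ). The short Python check below works through this arithmetic with the estimated values; the function name is introduced here for illustration only.

```python
def high_risk_factor(gamma, alpha):
    """Relative risk alpha_2 of the high-risk group implied by the normalization
    gamma * alpha + (1 - gamma) * alpha_2 = 1."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    return (1.0 - gamma * alpha) / (1.0 - gamma)


if __name__ == "__main__":
    alpha_2 = high_risk_factor(gamma=0.98, alpha=0.15)
    print(round(alpha_2, 2))  # 42.65, i.e. roughly 40 times the average risk
```

Because the average risk factor is normalized to 1, α_{2} can be read directly as the risk of the high-risk group relative to the population average.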
4. Discussion
We propose a minimal model for TB transmission to describe the relative contributions of reinfection and relapse to recurrent TB across a range of transmission intensities. A nonlinear relation between the proportion of reinfection and the local incidence is derived by fitting this mechanistic model to the dataset resulting from a systematic literature review. By accounting for heterogeneity in the risk of infection, we obtain significantly better model fittings to epidemiological data. This trend is in agreement with current understanding of TB transmission, especially in regions of low to moderate transmission, where TB is confined to particular risk groups with sporadic small outbreaks in the general population [31,32], and has been previously noted in theoretical studies [33]. Infection acts upon this variation and predominantly recruits those individuals at higher risk to the recovered category, thus inflating the rate of reinfection (as illustrated in figure 1). As a result, population measures of reinfection rates in relation to first infection (κ) are higher than the reinfection factor at the individual level, σ.
The model predicts that, under heterogeneity, regions of low to moderate transmission support relatively higher κ than regions of high transmission. This is again owing to the way differential recruitment acts upon individuals at higher risk. For highly endemic regions, transmission intensity tends to homogenize the distributions of both susceptible and recovered individuals, making differential recruitment less pronounced [34]. Cape Town is the only study reporting the information required to estimate this ratio. Similar studies providing these data for other regions would be very valuable to confirm the validity and increase the accuracy of the results reported here.
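The dependence of differential recruitment on transmission intensity can be illustrated with a deliberately simplified calculation. The Python sketch below is not the transmission model: it assumes a closed two-group population exposed to a constant force of infection for a fixed period, ignores births, deaths and reinfection, and asks what share of those ever infected belongs to the high-risk group. The function name, the exposure period and the force values are hypothetical; γ and α are the estimates reported in the Results.

```python
import math


def high_risk_share_among_infected(force, gamma=0.98, alpha=0.15, years=10.0):
    """Share of ever-infected individuals who belong to the high-risk group.

    Simplifying assumptions (not part of the fitted model): constant force of
    infection, closed population, each individual infected at most once.
    """
    alpha_2 = (1.0 - gamma * alpha) / (1.0 - gamma)  # from the normalization
    infected_low = gamma * (1.0 - math.exp(-alpha * force * years))
    infected_high = (1.0 - gamma) * (1.0 - math.exp(-alpha_2 * force * years))
    return infected_high / (infected_low + infected_high)


if __name__ == "__main__":
    for force in (0.001, 0.1, 1.0):  # hypothetical forces of infection (per year)
        share = high_risk_share_among_infected(force)
        print(f"force {force}: high-risk share {share:.3f}")
```

Under these assumptions the high-risk group is strongly over-represented among infected individuals when the force of infection is low, and its share falls towards its population proportion (2 per cent) as the force of infection increases, in line with the homogenizing effect of high transmission described above.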
The parameter estimation procedure (provided in the electronic supplementary material) supports higher relapse rates than those previously stated for European countries [14,15], whereas recent studies in African [13] and Asian [16,17] regions suggest values that are compatible with those considered here. This may be due to higher prevalence of coinfection with HIV in those settings or simply reflect regional differences in nutrition, smoking patterns, environmental conditions, population structure or the natural history of TB [16,17,35,36]. Despite these acknowledged differences, we have opted for constancy across regions in model parameters, with the exception of the force of infection, because our objective is to make inferences about the global epidemiology of TB. These inferences were enabled by further assuming regional equilibrium conditions and selecting the set of model parameter values that best reproduce the trends of reinfection rates in relation to TB incidence.
Acknowledgements
We thank Antonio Coutinho, Jim Koopman and Graham Medley for valuable discussions. This research was supported by Fundação para a Ciência e a Tecnologia (FCT) and European Commission (grant nos. MEXTCT200414338 and ECICT231807). C.R. was supported by FCT, PEst OE/MAT/UI0209/2011. C.J.S. was partially funded by CNPq and FAPERJ.
 Received December 29, 2011.
 Accepted January 30, 2012.
 This journal is © 2012 The Royal Society