Estimating the basic reproduction number of a pathogen in a single host when only a single founder successfully infects

Authors: Vruj Patel ^aff001; John L. Spouge ^aff001
Authors place of work: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America ^aff001
Published in the journal: PLoS ONE 15(1)
Category: Research Article
doi: https://doi.org/10.1371/journal.pone.0227127

Summary

If viruses or other pathogens infect a single host, the outcome of infection may depend on the initial basic reproduction number R₀, the expected number of host cells infected by a single infected cell. This article shows that sometimes, phylogenetic models can estimate the initial R₀, using only sequences sampled from the pathogenic population during its exponential growth or shortly thereafter. When evaluated by simulations mimicking the bursting viral reproduction of HIV and simultaneous sampling of HIV gp120 sequences during early viremia, the estimated R₀ displayed useful accuracies in achievable experimental designs. Estimates of R₀ have several potential applications to investigators interested in the progress of infection in single hosts, including: (1) timing a pathogen’s movement through different microenvironments; (2) timing the change points in a pathogen’s mode of spread (e.g., timing the change from cell-free spread to cell-to-cell spread, or vice versa, in an HIV infection); (3) quantifying the impact different initial microenvironments have on pathogens (e.g., in mucosal challenge with HIV, quantifying the impact that the presence or absence of mucosal infection has on R₀); (4) quantifying subtle changes in infectability in therapeutic trials (either human or animal), even when therapies do not produce total sterilizing immunity; and (5) providing a variable predictive of the clinical efficacy of prophylactic therapies.

Keywords:

Sequence alignment – Viral pathogens – HIV – HIV infections – Mutation detection – Viremia – Approximation methods – Infectious disease epidemiology

Introduction

When viruses or other pathogens infect a single host, the basic reproduction number R₀ is the expected number of cells infected by a single infected cell [1]. The initial R₀ is a fundamental determinant of whether an infecting viral population will establish itself in the host. On one hand, if R₀ < 1, the viral invaders reproduce below replacement and will go extinct. On the other hand, if R₀ is slightly greater than 1, an initial virus has a small positive probability of amplifying into a systemic infection, and if R₀ is large, infection is all but inevitable.

The initial basic reproduction number R₀ is therefore a continuous variable with direct biological pertinence to infection. As such, it may have many underappreciated applications. As a general example, consider therapeutic trials. More specifically, consider as a motivating application pre-clinical tests of HIV therapies like vaccines in animal models. In this context, the macaque model is popular, because simian immunodeficiency virus can cause a disease progression resembling AIDS in humans [2]. Historically, macaque trials often used a single high-dose intravenous or mucosal inoculation to ensure almost certain infection of unprotected animals [3]. The consequent assessment of therapeutic efficacy depends primarily on binary categorical data, i.e., whether or not infection occurred in treated animals [4,5]. New experimental designs, notably repeated low-dose challenges, have improved statistical power in animal trials [3,6], but do not remove the intrinsic statistical limitations of binary data. The ability to estimate a continuous variable like the initial R₀ in HIV infection would in principle permit a statistically more powerful analysis of therapeutic efficacies.

The initial R₀ is likely a primary determinant of whether systemic infection occurs, and initial microenvironments with relatively few target cells probably impede systemic infection by HIV. Consider, e.g., that the number of per-act transmissions per 10,000 exposures varies considerably by route of infection [7]. For sexual exposure, the number ranges from less than 4 to 138; for needle-sharing, it is about 63. In contrast, for vertical transmission between mother and child, the number is 2260; for blood transfusion, it is 9250. Thus, the initial microenvironment may starkly limit the reproduction of HIV until the virus escapes into systemic circulation, where target cells are plentiful.

Unfortunately, practical thresholds for HIV detection make the initial R₀ inaccessible to direct measurement, because on average, viremia is delayed until about 10 days after exposure to HIV [8,9]. Modern techniques for measuring the abundance of HIV [10] have yielded estimates, e.g., R₀ ≈ 6 [11] or R₀ ≈ 8 [12]. These estimates of R₀ pertain to viremia, however, when there are at least 20 viruses/ml (the current lower limit of detectability) and thus about 10⁵ viruses in the total blood volume of 5L [13]. By the time HIV is detectable in blood samples and R₀ can be measured directly, infection has long since established itself.

The viral dosage may vary considerably between the modes of transmission above, and in sexual or mother-to-child transmission of HIV, e.g., the genetic diversity at early stages of infection usually reflects the number of viruses founding the infection [14–16]. In fact, however, about 80% of all HIV infections arise from a single founding viral sequence [17–19]. Moreover, the design of repeated low-dose challenge animal trials is likely to cause infections with a single founding virus. Thus, because of its practical importance, the case of a single founder virus is a convenient starting point for mathematical analysis, and the rest of this article assumes a single founder.

Although most of this article is self-contained, it continues a scientific program started in [20]. Direct measurement of R₀ early in infection may be impractical, but Fig 1 in [20] showed that the initial R₀ displays its footprint in HIV sequences sampled in early viremia, during exponential expansion of the viral population. To describe the essence of Fig 1 in [20], if only two daughters of the founder successfully contribute descendants to the viremia, and one daughter has a novel mutation away from the founder, about half the sequences sampled in viremia have the mutation. In contrast, if the founder has many daughters successfully contributing descendants to the viremia, far fewer than half the sequences sampled in viremia are likely to have any given mutation.

The structure of this article is as follows: the theory section applies standard statistical procedures to yield a robust method for estimating R₀ from sequence data. The Methods section then describes the simulation of a continuous-time branching process whose parameters are pertinent to sampling sequences of the HIV gp120 gene. The process is the “Gamma model” of [20], a special Bellman-Harris process [21] that idealizes HIV reproduction. The Results section displays the accuracy of our estimator in recovering R₀ in the idealized simulation and the Discussion section examines some consequences of the theory. Finally, the Supporting Information compares our estimator to other (inferior) statistical techniques that we applied to recover R₀ from the same idealized simulation.

All approximations in this article are uncontrolled, i.e., we cannot provide bounds on the error that the approximations cause. Usually without comment, therefore, we rely on simulations of the Gamma model to assess their accuracy. Parameter regimes not pertinent to HIV reproduction and the sampling of gp120 sequences require separate assessment and are beyond the immediate practical purview of this article.

Finally, as motivation, in the context of the Gamma model, our estimator compares favorably with state-of-the-art methods. It has an exceptionally simple analytic form, e.g., yielding a negligible computation time in comparison to exact Bayesian calculations. Moreover, under methods analogous to ours, the coalescent process yields the same estimator as continuous-time branching process models of population growth. Under the Gamma model for HIV reproduction, however, the estimator has a singularity, making it useless for quantitation if R₀ ≥ e ≈ 2.7.

Theory

The following set-up assumes that a single founder virus has infected a single host (e.g., a single HIV infects a human). The set-up is mostly self-contained, drawing only on a few critical approximations presented elsewhere with some mathematical foundations [20]. First, before modeling viral ancestry, we examine the sampling of the viral sequences.

With “≔” denoting a definition, define (m) ≔ {1, 2, …, m}. Fix an alphabet Λ, e.g., the unambiguous nucleotide alphabet Λ = {a, c, g, t}. In practice, sequence analysis must invoke a strategy for handling anomalous characters in an alignment (e.g., ambiguous nucleotides or gap characters). Sometimes, anomalous characters are infrequent, so that as an acceptable approximation, the analysis can treat them as ordinary characters by enlarging its alphabet. Sometimes, the analysis simply omits columns containing them. Without further comment, the following assumes that the practical analysis has adopted an unspecified strategy for handling anomalous characters.

Consider a set S_• ≔ (S_m: m ∈ (M)) consisting of M sequences. The sequences are sampled simultaneously from the descendants of a single founder sequence Φ. Align the sequences S_•, so that the sequences S_• form an alignment matrix S_•,• ≔ {S_m,n: m ∈ (M), n ∈ (N)} of N columns. Given an unspecified strategy for processing sequences S_• into S_•,•, the analysis here simply starts with the alignment matrix S_•,•. Implicitly, Φ aligns with S_•,•, so the letter Φ_n in Φ is ancestral to each letter S_m,n in the matrix column n ∈ (N) (where S_m,n is from the sequence S_m). In practice, Φ is often unknown, a complication we handle shortly.

The Iverson bracket for indicator random variates is a standard notation [22]: let [A] = 1 if the statement A is true, and [A] = 0 otherwise. Let Mn(L)≔∑m=1M[Sm,n=L] (n ∈ (N)) count the instances of letter L ∈ Λ in column n of S_•,•. Given the strategy for handling anomalous characters, M = Σ_(L∈Λ)M_n(L).

The difference D_n ≔ D_n(Φ) ≔ M −⁠ M_n(Φ_n) counts letters in column n that have mutated away from the founder Φ; let D ≔ D(Φ) ≔ (D_n(Φ): n ∈ (N)). Given D, let ηm≔ηm(Φ)≔∑n=1N[Dn(Φ)=m] count the alignment columns n ∈ (N) where m letters differ from the founder letter Φ_n, and define the site frequency spectrum (SFS) as η ≔ η(Φ) ≔(η_m(Φ): m ∈(M)). Typically, Φ is unknown, so η is not observable.

To develop a statistical model for η, let ε_n be the probability of a mutation per base per generation in column n of the alignment. As in the infinite-sites model [23], we neglect the extremely rare possibility that two or more mutations occur in the ancestry of a single letter S_m,n. Let μ=∑n=1Nεn be the expected number of novel mutations per generation in the sequences. Despite its biological importance, the effects of preferential selection on sequence data are practically imperceptible during the first six months of HIV infection (see the first paragraph in the Materials and Methods of [19] and Fig 1 in [24]). Assume therefore that (ε_n: n ∈ (N)) are all small, and that novel mutations are all independent. To a good approximation, in every daughter the counts of novel mutations are independent Poisson variates with fixed mean μ.

For linguistic convenience, let every virus be both her own ancestor and her own descendant. Let A_m count the non-founder viral ancestors with m descendants in the sample, and define as in [20] the ancestral sample frequency spectrum (AFS), A ≔ (A_m: m ∈(M)). (Each sampled sequence contributes to A₁, e.g., because by convention, each sampled sequence is its only descendant in the sample). Under an infinite-sites model, every novel mutation occurs in a different column of the alignment. Every alignment column with m mutations therefore corresponds to a novel mutation in an ancestor with m descendants in the sample [25]. Given A, the coordinates of η are independent Poisson variates, with η_m having mean μA_m (see, e.g., Theorem 1 in [20]). Accordingly, the relationship is written as η=_d Poission(μA), where “=_d” indicates equality of distributions.

Eq (17) in [20] used the law of total variance to write

Now, we restrict the discourse to the Gamma model for HIV gp120 (as detailed in the Methods section, which need not be read yet). Simulations of the Gamma model showed [20] that for μ = 0.0551 (the value for HIV gp120), the typical magnitude of the ratio μ2σ2(Am)/(μEAm) from Eq (1) was at most about 18%. For μ = 0.0551 in the Gamma model, therefore, the mutational variance μEAm makes the dominant contribution to σ²(η_m). The form of Eq (1) shows that the dominance remains robust to varying μ (particularly decreasing it), as long as the ratio μ2σ2(Am)/(μEAm) remains small (say, less than 50%, occurring about μ ≈ 0.0551 × (0.50/0.18) ≈ 0.153).

Let am≔EAm and a ≔ (a_m: m ∈ (M)). Given the distributional equality η=_d Poission(μA), our observations on Eq (1) therefore suggest that the distributional approximation η≈_d Poission(μa) pertains, as follows. In the present context (the distribution of η under μ = 0.0551 in the Gamma model), the variation of A contributes little to the variance of η_m: effectively, as noted by other authors [1,19,24], A behaves as though it were the constant am=EAm when contributing to random fluctuations in η_m. In effect, the approximation A_m ≈ a_m treats Eq (1) as an expansion in μ around μ = 0 and drops terms quadratic in μ to retain the approximation σ2(ηm)≈μEAm. As a linear approximation, it should improve as μ decreases and worsen as μ increases.

To avoid distracting subscripts in the following equations, let r ≔ R₀ denote the basic reproduction number R₀ from the Introduction. For some practical purposes, HIV reproduces almost in lockstep, with synchronous generations (see the Delta model of [20], a Galton-Watson branching process [26]). Let G count the generations of HIV after host infection. To summarize the previous paragraph,

In any ancestry with G synchronous generations, MG=∑m=1MmAm. (Proof: count the ancestors in each of the generations g = 1, 2, …, G, accounting for the multiplicity m of their sampled descendants. The total count is equivalent to counting each of the M samples G times). Take expectations to derive

Reference [20] showed that for the Gamma model,

accurately approximated am=EAm (m = 2, 3 …, M). The heuristic behind the approximation follows. To a good approximation, HIV has synchronous generations g = 1, 2, …, G. On average, generation g contains r^g individuals. Each viral sequence in the sample therefore has an approximate probability r^−g of descending from any particular individual g in generation g. Thus, the probability that g has m descendants in a sample of size M is approximately the binomial probability on the right of Eq (4). Sum the binomial probability over the individuals g in generation g (on average, r^g in number) and then over all generations g = 1, 2, …, G. Let G tend to infinity to derive Eq (4).

Eqs (3) and (4) therefore show that if in the Gamma model a1≈a1(Δ), then lim_G→∞(MG −⁠ a₁) is approximately the (finite) quantity

In statistical notations, “•” often suggests a sum, as in a•(Δ).

The variable η in Eq (2) depends on the unknown founder sequence Φ. To relate η to an observable, define M˜n≔maxL∈ΛMn(L), the maximum count of any single letter in column n. Loosely, D˜n≔M−M˜n then counts minority letters in column n. Unlike D_n, the observable D˜n has no dependency on the founder sequence Φ. Let ⌊x⌋≔max{i∈ℤ:i≤x} denote the floor function, with M˜≔⌊M/2⌋. Define the folded SFS η˜≔(η˜m:m∈(M˜)) [27], with

for m = 1, 2, …, ⌊(M −⁠ 1)/2⌋), where the second equality holds for an infinite-sites model (which we have assumed). If M is odd, ⌊(M−1)/2⌋=⌊M/2⌋=M˜, so the definition of η˜ is complete. If M is even, the pattern of pairs η_m + η_M−m displayed in Eq (6) fails for m=M˜, because ηM˜ cannot be paired with a distinct ηM−M˜. If M is even, therefore, define η˜M˜≔ηM˜ for m=M˜. Loosely, in all cases, η˜m counts columns n where the number of minority letters equals m(m∈(M˜)).

The folded AFS A˜≔(A˜m:m∈(M˜)) inherits the pattern for η˜ established in Eq (6): A˜m≔Am+AM−m for m = 1, 2, …, ⌊(M −⁠ 1)/2⌋; if M is even, A˜M˜≔AM˜. Henceforth and without comment, the same pattern generates folded quantities (denoted by over-tildes) from unfolded quantities, e.g., a˜≔EA˜. The folded SFS η˜ inherits an approximate Poisson distribution from the SFS η:

Because

the pattern suggests imposing the definition

Eq (7) applied to the observable η˜ yields a maximum likelihood estimate (MLE) (G^,r^). The asymptotic properties of an MLE make r^ a reasonable benchmark for other statistical estimates of r. Recall Eqs (8) and (9) relating a˜1 and MG, and take natural logarithms in Eq (7) to derive the (approximate) log-likelihood

where “≡” indicates an equality of functions, possibly ignoring an irrelevant additive term depending only on data (e.g., a term equaling a function of η˜).

As a useful approximation, the following treats G as a continuous variate. Now, for any value of r (and not just r=r^),

where the second equality holds if the maximum is an internal maximum, so the argument G^ is determined by setting the derivative with respect to G equal to 0 (i.e., if G^ satisfies μ(MG^−a˜•)=η˜1). An MLE r^ then maximizes the profile log-likelihood, defined as

Now, a˜m(1)=a˜m(∞)=0 for m=2,3,…,M˜. Because ln L(r) ↓ −∞ at the boundaries of the interval (1, ∞), a MLE r^>1 therefore exists, such that

has a root at r=r^. In the following, if an MLE lacked an explicit analytic expression, a golden-section search for maxima determined it numerically.

Unfortunately, the direct method of determining r^ by substituting am(r)≈am(Δ)(r) and then maximizing Eq (12) or solving Eq (13) entails many undependable numerical computations, because Eq (4) for am(Δ)(r) requires multiplying unreasonably large and small numbers, followed by adding the resulting products with great precision. For completeness, the Supporting Information develops an approximate MLE r^(Δ) by approximating a_m(r) with am(Δ)(r) and compares it with our best estimator, derived as follows. Approximate the sum in am(Δ) by an integral (the first term of an Euler-Maclaurin series [28]): because (1 −⁠ r⁰)^M−m = 0 for m = 2, 3, …, M −⁠ 2,

where the final equality derives from the evaluation of a beta integral. Comparison of Eq (4) and the two sides of Eq (14) shows that

approximates a_m for m = 2, 3, …, M −⁠ 2.

After substituting am(I)(r) for a_m(r) in Eq (13) and unfolding all folded quantities,

Telescoping cancellation yields

To estimate r, define η˜−1≔∑m=2M−2ηm=∑m=2M˜η˜m (an observable, the sum of all η˜m except η˜1), so the root r^(I) of Eq (16) satisfies

The sum of independent Poisson variates is Poisson distributed, so the variate η˜−1 is approximately Poisson distributed, with σ2(η˜−1)≈Eη˜−1. Define ln² x ≔ (ln x)². From a linear Taylor series approximation, the approximation varf(η˜−1)≈[f′(Eη˜−1)]2varη˜−1 with f(η) = η⁻¹ yields

where the final approximation probably has a small relative error if Eη˜−1 is large (i.e., η˜−1≈Eη˜−1).

For comparison of our results with state-of-the-art methods, both the coalescent process and continuous-time branching process models of population growth produce the same approximation, that

approximates a_m for m = 2, 3, …, M −⁠ 2. To relate Eqs (15) and (20), recall the Taylor expansion ln r = −ln[1−(1 −⁠ r⁻¹)] ≈ 1 −⁠ r⁻¹ near 1 −⁠ r⁻¹ = 0, i.e., near r = 1. The text near Eq (13) of [20] elaborates on the context of Eq (20). For comparison with Eq (18), which gives the estimator r^(I) in derived from Eq (15), consider the estimator r^(C) derived analogously from Eq (20),

Routine algebra shows that the estimators are related by the equation

Methods

Although the biological rationale given below in support of the Gamma model of HIV gp120 is brief, the interested reader can find a more detailed discussion elsewhere [20]. The Gamma model was simulated for each r = R₀, as follows. Each realization started with a single successfully infecting founder virus, which (for bookkeeping purposes) died at time t = 0, giving birth to a random number Z₁ of successfully infecting daughters. Biologically, each infected cell produces thousands of daughter virions, each with a small independent probability of infecting. The simulation therefore chose the number Z₁ from a Poisson distribution with mean r = R₀. Each daughter lived an independent random time after her birth. To approximate life-cycle times relevant to HIV [19,29], the random times had a gamma distribution with mean 2 days and standard deviation 0.24 days. The shape and rate parameters of the gamma distribution were therefore (n, λ) = (69.4, 34.7) (to make the mean nλ⁻¹ = 2 and variance nλ⁻² ≈ 0.24²). The life-cycle of all the founder’s descendants were similar, making the Gamma model a Bellman-Harris process [21] with parameters specific to HIV.

Each realization generated daughters until there were 6000 live viruses, to mimic a threshold of viral detection in blood. We also examined thresholds larger than 6000, but the exact threshold did not substantially alter scientific conclusions. If the viral population went extinct first, the realization restarted with a new founder. The 6000 live viruses were then sampled to produce six samples, of sizes M = 2^k (5 ≤ k ≤ 10). For each sample size M, tracing back the ancestry of the sample determined the ancestral sample frequency A, which yielded the folded ancestral sample frequency A˜.

In HIV, the gp120 gene is about 2550 nt long, and (with crossovers neglected) each HIV replication averages ε ≈ 2.16 × 10⁻⁵ point mutations/base/replication [1]. On average, therefore, each RNA replication entails μ ≈ 0.0551 mutations in gp120. Simulating from the Poisson distribution in Eq (7) with μ ≈ 0.0551 yields a folded site-frequency spectrum η˜ for the realization.

If η˜2=η˜3=…=η˜M˜=0, the realization fails to estimate ln r. For each r and each M, the simulation recorded the number F of failed realizations encountered before performing 1000 successful realizations.

For each of the 1000 successful realizations, η˜ yielded lnr^ for the estimate r^(I) in Eq (18). The simulation also estimated the corresponding standard deviations σ^(lnr^) given in Eqs (19). For each r and each M, the simulation calculated a sample mean E(lnr^) and sample standard deviation σ(lnr^) from the 1000 successfully sampled values of lnr^. It also calculated the sample mean of the estimated standard deviation Eσ^(lnr^) for comparison with the sample standard deviation.

We simulated ancestries as described above for a discrete grid of basic reproduction numbers r = (1 + ε)^k (k = 1, 2, …,⌊log_1+ε R⌋), so 1 < r ≤ R, with R = 10 and ε = 0.1 (as in [20]).

Results

Eq (18) suggests that ∑m=2M−2ηm tends to decrease as r increases (the Introduction refers the reader to Fig 1 in [20] for an intuitive explanation). If η−1=∑m=2M−2ηm=0 in Eq (18), r^(I)=∞ so one can only infer qualitatively that r is large. Thus, if every minority letter in an alignment column is a singleton, making η₋₁ = 0, a realization fails to estimate r quantitatively. Fig 1 displays the fraction of failed realizations (ones where η˜2=η˜3=…=η˜M˜=0).

Plot of the fraction of realizations failed against the true log<sub>10</sub> <i>r</i>. — **Fig. 1. Plot of the fraction of realizations failed against the true log₁₀ r.**

In Fig 1, the x-axis displays x = log₁₀ r (where r = R₀); the y-axis, the fraction of failed realizations. The x-axis runs from log₁₀ r = 0.0 to log₁₀ r = 1.0, i.e., r = 1 to r = 10; the probabilities on the y-axis, from 0.0 to 1.0. The solid line corresponds to M = 32; the dashed line, to M = 64; the dashed-dot line, to M = 128; the dotted line, to M = 256. The fraction of failed realizations was identically 0.0 in all simulations with M = 512 and M = 1024.

In each subfigure of Fig 2 for r^=r^(I) (and also S1 and S2 Figs in the Supporting Information), the solid curves indicate the sample mean y=Elog10r^; the two dashed curves above and below each solid curve indicate y=Elog10r^±σ(log10r^), i.e., they indicate bands above and below y=Elog10r^ of height equal to the sample standard deviation σ(log10r^); and the two dot-dashed curves above and below each solid curve indicate y=Elog10r^±Eσ^(log10r^), i.e., they indicate bands above and below y=Elog10r^ of height equal to the sample mean of the estimated standard deviation.

**Fig. 2. Plots for the maximum likelihood estimate (integral approximation r^(I)).**

In Fig 2, both x -⁠ and y-axes display log₁₀ r: the horizontal, the true value x = log₁₀ r; the vertical, the estimated value y=log10r^. The x-and y-axes run from log₁₀ r = 0.0 to log₁₀ r = 1.0, i.e., r = 1 to r = 10. The dotted diagonal line indicates perfect estimation, r^=r. In their upper left, each of the subfigures (a)-(f) indicates the corresponding sample size M = 2^k (5 ≤ k ≤ 10).

Fig 2 for r^(I) displays progressively better recovery of r as M increases. For M ≥ 128, the accompanying error estimate σ^2(lnr^(I)) also has practical accuracy. Near log₁₀ r = 0, Fig 2e and 2f show some systematic overestimation away from the perfect estimate r^=r (see also S1 and S2 Figs in the Supporting Information).

Fig 3 plots y=log10r^(C) from Eq (22) against x=log10r^(I). The dotted diagonal line indicates perfect agreement, log10r^(C)=log10r^(I). In Fig 2, y=log10r^(I) slightly overestimated the true x = log₁₀ r. In Fig 3, y=log10r^(C) is consistently larger than x=log10r^(I). The two estimators r^(C) and r^(I) agree well as r^(I) decreases to 1, in accord with the Taylor expansion near r = 1 following Eq (20). (The Appendix of [20] also shows that in the present context, the Delta model of [20], the coalescent model, and the continuous-time branching-process model produce the same limiting SFS as r decreases to 1). Fig 3 also shows, however, that for larger values of r^(I), the coalescent estimate r^(C) becomes a gross overestimate, and it even blows up to infinity at log10r^(I)=log10e≈0.434.

**Fig. 3. A plot of r^(C) against r^(I).**

Discussion

In infection of a single host, the basic reproduction number R₀ is the expected number of cells infected by a single infected cell. This article shows how mutational variations observed in a sample from a population descended from a single founder yield an estimator r^(I) of R₀. It verifies the accuracy of its uncontrolled approximations with simulations that show that r^(I) reproduces R₀ within accuracies practicable for some purposes.

Our plots and statistics took log R₀ as the natural scale for the basic reproductive number R₀. The population size at generation g is N_g = (R₀)^g, so the population effects of changes in R₀ are linear on a log scale: log N_g = g log R₀. On one hand, increasing R₀ by 1 has a greater effect on N_g when R₀ = 1 than when R₀ = 10. On the other, hand, doubling R₀ has the same additive effect on log N_g independent of the value of R₀.

Before considering the biological implications, we make a few technical statistical observations on the estimate itself. In simulations of HIV gp120 sequences, the absence of mutations common to sequence samples can cause estimation of R₀ to fail. For sample sizes M ≥ 128, the probability of failing to estimate R₀ was less than 0.051 for all 1 < R₀ < 10 (see Fig 1). To the authors’ knowledge, studies sampling HIV sequences from patients typically sample between M = 16 and M = 30 [30] sequences per patient, an insufficient depth to test the present theory.

The estimator r^(I) in by Eq (18) has a simple analytic form, a negligible computation, that should be compared to the complexity of competing Bayesian calculations. Moreover, it does not contain derivatives that may lose accuracy because of numerical differencing. As noted after Eq (18), the coalescent and continuous-time birth-and-death branching process models of population growth yield estimators analogous to r^(I). In fact, the estimators are the same single estimator r^(C). The models can be manipulated to yield other estimators, so the comparison of models is by no means exhaustive, but in the present context r^(C) was clearly inferior to r^(I). It even displayed a singularity with our simulated HIV data (see Fig 3), suggesting that gradual reproduction in continuous time may have its limits when modeling the lytic viral bursts of HIV.

Like r^(C), but to a lesser extent, the estimator r^(I) overestimated R₀ throughout the full range tested, 1 < R₀ < 10 (see Fig 2). Despite the overestimation, which became more severe as R₀ increased, estimates were accurate enough to be practicable for some purposes. As Fig 2 indicates, because of monotonicity of r^(I), experimental results with a large enough dataset can demonstrate a decrease in R₀. Moreover, near R₀ = 1, both the overestimation and the estimated error appeared relatively small, a useful property for detecting subtle therapeutic progress in reducing R₀. In summary, the integral estimator r^(I) is easy to compute throughout the full range 1 < R₀ < 10 tested; its bias is noticeable not excessive for all sample sizes M ≥ 32; and its error estimates are generally reliable for all sample sizes M ≥ 128 (see Fig 2 for details).

The estimator lnr^(I) in Eq (18) is linear in μ, suggesting that Eq (18) provides the first term of an expansion that our approximations have linearized around μ = 0. Other articles [1,19,24] introduced and justified the linearizing approximation of an unvarying phylogeny, given here after Eq (1). The theory following Eq (1) indicates that the approximation steadily loses accuracy as μ increases, but simulations show that it retains enough accuracy to remain practicable in parameters ranges pertinent to HIV gp120.

We now turn to biological considerations. The chief limitation of our study derives from its attempt to use a simple mathematical model capture the complex biology of HIV infection. In fact, R₀ is likely to change as infection reaches new microenvironments in the host. Naïve use of the estimates here can only produce a single effective R₀ for early infection. In future (but beyond the purview of this article), we plan to incorporate time-dependencies into the simulation of R₀, and to develop estimates that can recover some of the time-dependency.

Presently, our mathematical model assumes a single founder. The extension of mathematical modeling from a single founder to multiple founders is an important relaxation of assumptions [19]. Regardless, the single-founder assumption is often satisfied in HIV infection, because most HIV infections have a single founder. For simplicity, it also assumes a constant R₀. After initial infection, HIV traverses different host microenvironments, potentially undergoing genetic bottlenecks. On one hand, a bottleneck can bias estimates based on sequence samples, because they obscure whether a most recent common ancestor dominates early in infection (a founder) or only after the bottleneck. In the present context, therefore, bottlenecks may bias estimates away from an initial R₀ and towards R₀ for a later microenvironment. On the other hand, even multiple escape lineages do not seriously bias genetic estimates of time since infection [24], so estimates of the initial R₀ may share a similar robustness.

Estimates of R₀ may also clarify HIV biology as an infection progresses. If target cells are scarce in a microenvironment, HIV may proliferate predominantly by cell-free spread, budding from an infected target cell, entering the extracellular fluid, and infecting another target cell by chance encounter [31]. Conversely, if target cell are plentiful, e.g., in the microenvironment of a lymph node [32,33], direct cell-to-cell spread may be more efficient than cell-free spread. Cell-to-cell spread has two distinct mechanisms (and therefore can occur in qualitatively distinct environments): (1) transmission of HIV by virological synapses between adjacent target cells or (2) transmission by capture and transfer of virions between proximal macrophages and dendritic cells [31]. Viral replication may not occur during cell-to-cell transmission, so regardless of the exact mechanism, shifts between cell-free spread and cell-to-cell spread may manifest themselves as concomitant changes in R₀. However fundamental R₀ may be to describing the reproduction of pathogens, cell-to-cell spread exemplifies the difficulties in interpreting an estimate of R₀ biologically.

The route of infection determines the initial microenvironment of HIV. Most routes transmit HIV much less effectively than hematological routes [7], suggesting that their initial R₀ is typically low. Mucosal infection can promote transmission of HIV [34], however, because it can increase the local concentration of activated T cells, promoting cell-to-cell spread, and probably increasing the initial R₀. Thus, the initial R₀ may provide valuable information about initial infection.

The dominance of cell-to-cell spread over cell-free spread may vary during infection. After infecting in a cell-rich mucosal microenvironment, HIV may move through the mucosal lamina, before being transported through the lymphatic system to lymph nodes, where the target cell density in its microenvironment increases dramatically [32,33]. The values of R₀ probably vary accordingly.

The present theory may also have an important application in animal trials of viral prophylaxis, when progress towards a therapy is subtle. Indeed, the design of animal trials using high-dose challenges may have unintentionally impeded practical assessment of candidate HIV therapies, because some vaccines and prophylactics may mitigate low -⁠ but not high-dose challenges [6]. Repeated low-dose challenge studies represent an important step forward in the pre-clinical assessment, because they mimic typical HIV challenges in humans [3]. Repeated low-dose challenges probably yield infections with a single founder virus, satisfying the primary assumption of the present theory. An estimate of R₀ in this context provides a new variable for statistical analysis, beyond the binary infection status of an animal, one with direct biological relevance to the establishment of infection. Even if a vaccine or prophylactic fails to produce total sterilizing immunity, a reduction in the initial R₀ encourages further investigation of an intervention, where previously the entire line of research might have been discarded.

Trials using repeated low-dose challenges also pose some unanswered experimental questions, the most pressing being the possibility that unsuccessful challenges potentially perturb the challenged animals. Do unsuccessful challenges foster partial immunity to further challenge? Do they increase the probability of future infection? Subtle perturbations in R₀ may be much more sensitive than a binary infection status in providing the answers to such questions.

Next-generation deep sequencing can produce around 10⁵ reads of size comparable to gp120 [35,36]. Given the expectation that future experiments will likely be able to generate datasets with potentially even greater than 10⁵ sequences, any robust estimator of R₀ must be able to handle extremely large sample sizes efficiently. The consistent tightening of the error estimates and accuracy as M is increased in Eq (19) suggests that the estimator r^(I) is particularly well-suited to application in such experiments. In addition, although a complete likelihood for the entire phylogeny and mutations might be desirable in some circumstances, a maximum likelihood method for the SFS permits inference about R₀ even if the reads are short, if the reads can be placed against a reference HIV genome.

In clinical trials of HIV therapies, guidelines previously suggested starting treatment only when CD4⁺ T cell density declines below 350 cells/μl [37]. Recent studies (notably the SPARTAC trial) of temporally earlier interventions have shown increased efficacy compared to standard anti-retroviral protocols [38,39]. In some circumstances, the initial R₀ might measure the clinical efficacy of some such interventions, provide a variable predictive of their efficacy, or even help predict the clinical intervention with the greatest chance of success.

In conclusion, in the case of HIV and possibly other infectious agents, the integral approximation r^(I) provides a simple, easily computed estimate of the early basic reproduction number R₀ in a single host. The quantitative variable R₀ makes a well-characterized biological contribution to early HIV infection and should be useful assessing the efficacy of therapies in both human and animal trials.

Supporting information

S1 Fig [tif]
Plots for the maximum likelihood estimate (delta approximation) .

S2 Fig [tif]
Plots for the estimate from the method of moments.

S1 Data [docx]

Zdroje

1. Giorgi EE, Funkhouser B, Athreya G, Perelson AS, Korber BT, et al. (2010) Estimating time since infection in early homogeneous HIV-1 samples using a poisson model. BMC Bioinformatics 11.

2. Koff WC, Johnson PR, Watkins DI, Burton DR, Lifson JD, et al. (2006) HIV vaccine design: insights from live attenuated SIV vaccines. Nature Immunology 7 : 19–23. doi: 10.1038/ni1296 16357854

3. Nolen TL, Hudgens MG, Senb PK, Koch GG (2015) Analysis of repeated low-dose challenge studies. Statistics in Medicine 34 : 1981–1992. doi: 10.1002/sim.6462 25752266

4. Gordon SN, Liyanage NPM, Doster MN, Vaccari M, Vargas-Inchaustegui DA, et al. (2016) Boosting of ALVAC-SIV Vaccine-Primed Macaques with the CD4-SIVgp120 Fusion Protein Elicits Antibodies to V2 Associated with a Decreased Risk of SIVmac251 Acquisition. Journal of Immunology 197 : 2726–2737.

5. Strbo N, Vaccari M, Pahwa S, Kolber MA, Doster MN, et al. (2013) Cutting Edge: Novel Vaccination Modality Provides Significant Protection against Mucosal Infection by Highly Pathogenic Simian Immunodeficiency Virus. Journal of Immunology 190 : 2495–2499.

6. Regoes RR, Longini IM, Feinberg MB, Staprans SI (2005) Preclinical assessment of HIV vaccines and microbicides by repeated low-dose virus challenges. Plos Medicine 2 : 798–807.

7. Patel P, Borkowf CB, Brooks JT, Lasry A, Lansky A, et al. (2014) Estimating per-act HIV transmission risk: a systematic review. Aids 28 : 1509–1519. doi: 10.1097/QAD.0000000000000298 24809629

8. Fiebig EW, Wright DJ, Rawal BD, Garrett PE, Schumacher RT, et al. (2003) Dynamics of HIV viremia and antibody seroconversion in plasma donors: implications for diagnosis and staging of primary HIV infection. Aids 17 : 1871–1879. doi: 10.1097/00002030-200309050-00005 12960819

9. Kahn JO, Walker BD (1998) Acute human immunodeficiency virus type 1 infection. New England Journal of Medicine 339 : 33–39. doi: 10.1056/NEJM199807023390107 9647878

10. Salazar-Gonzalez JF, Bailes E, Pham KT, Salazar MG, Guffey MB, et al. (2008) Deciphering human immunodeficiency virus type 1 transmission and early envelope diversification by single-genome amplification and sequencing. Journal of Virology 82 : 3952–3970. doi: 10.1128/JVI.02660-07 18256145

11. Stafford MA, Corey L, Cao YZ, Daar ES, Ho DD, et al. (2000) Modeling plasma virus concentration during primary HIV infection. Journal of Theoretical Biology 203 : 285–301. doi: 10.1006/jtbi.2000.1076 10716909

12. Ribeiro RM, Qin L, Chavez LL, Li DF, Self SG, et al. (2010) Estimation of the Initial Viral Growth Rate and Basic Reproductive Number during Acute HIV-1 Infection. Journal of Virology 84 : 6096–6102. doi: 10.1128/JVI.00127-10 20357090

13. Kosaka PM, Pini V, Calleja M, Tamayo J (2017) Ultrasensitive detection of HIV-1 p24 antigen by a hybrid nanomechanical-optoplasmonic platform with potential for detecting HIV-1 at first week after infection. Plos One 12.

14. Wolinsky SM, Wike CM, Korber BTM, Hutto C, Parks WP, et al. (1992) Selective Transmission of Human-Immunodeficiency-Virus Type-1 Variants from Mothers to Infants. Science 255 : 1134–1137. doi: 10.1126/science.1546316 1546316

15. Delwart E, Magierowska M, Royz M, Foley B, Peddada L, et al. (2002) Homogeneous quasispecies in 16 out of 17 individuals during very early HIV-1 primary infection. Aids 16 : 189–195. doi: 10.1097/00002030-200201250-00007 11807302

16. Derdeyn CA, Decker JM, Bibollet-Ruche F, Mokili JL, Muldoon M, et al. (2004) Envelope-constrained neutralization-sensitive HIV-1 after heterosexual transmission. Science 303 : 2019–2022. doi: 10.1126/science.1093137 15044802

17. Keele BF, Giorgi EE, Salazar-Gonzalez JF, Decker JM, Pham KT, et al. (2008) Identification and characterisation of transmitted and early founder virus envelopes in primary HIV-1 infection. Proceedings of the National Academy of Sciences of the United States of America 105 : 7552–7557. doi: 10.1073/pnas.0802203105 18490657

18. Haaland RE, Hawkins PA, Salazar-Gonzalez J, Johnson A, Tichacek A, et al. (2009) Inflammatory Genital Infections Mitigate a Severe Genetic Bottleneck in Heterosexual Transmission of Subtype A and C HIV-1. Plos Pathogens 5.

19. Love TMT, Park SY, Giorgi EE, Mack WJ, Perelson AS, et al. (2016) SPMM: estimating infection duration of multivariant HIV-1 infections. Bioinformatics 32 : 1308–1315. doi: 10.1093/bioinformatics/btv749 26722117

20. Spouge JL (2019) An accurate approximation for the expected site frequency spectrum in a Galton-Watson process under an infinite sites mutation model. Theor Popul Biol 12 : 30151–30155.

21. Bellman R, Harris TE (1948) On the Theory of Age-Dependent Stochastic Branching Processes. Proceedings of the National Academy of Sciences of the United States of America 34 : 601–604. doi: 10.1073/pnas.34.12.601 16588841

22. Knuth DE (1992) 2 Notes on Notation. American Mathematical Monthly 99 : 403–422.

23. Kimura M (1969) Number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics 61 : 893–903. 5364968

24. Park SY, Love TMT, Perelson AS, Mack WJ, Lee HY (2016) Molecular clock of HIV-1 envelope genes under early immune selection. Retrovirology 13.

25. Fu YX (1995) Statistical properties of segregating sites. Theoretical Population Biology 48 : 172–197. doi: 10.1006/tpbi.1995.1025 7482370

26. Athreya KB, Ney PE (2004) Branching Processes. Mineola, New York: Dover.

27. Wakeley J (2009) Coalescent Theory. Greenwood Village CO: Roberts and Company.

28. Graham RL, Knuth DE, Ptashnik O (1994) Concrete mathematics: a foundation for computer science. New York: Addison-Wesley.

29. Markowitz M, Louie M, Hurley A, Sun E, Di Mascio M (2003) A novel antiviral intervention results in more accurate assessment of human immunodeficiency virus type 1 replication dynamics and T-Cell decay in vivo. Journal of Virology 77 : 5037–5038. doi: 10.1128/JVI.77.8.5037-5038.2003 12663814

30. Lee HY, Giorgi EE, Keele BF, Gaschen B, Athreya GS, et al. (2009) Modeling sequence evolution in acute HIV-1 infection. Journal of Theoretical Biology 261 : 341–360. doi: 10.1016/j.jtbi.2009.07.038 19660475

31. Zhang CW, Zhou S, Groppelli E, Pellegrino P, Williams I, et al. (2015) Hybrid Spreading Mechanisms and T Cell Activation Shape the Dynamics of HIV-1 Infection. Plos Computational Biology 11.

32. Layne SP, Merges MJ, Dembo M, Spouge JL, Nara PL (1990) {HIV} requires multiple gp120 molecules for {CD4}-mediated infection. Nature 346 : 277–279. doi: 10.1038/346277a0 2374593

33. Spouge JL (1994) Viral multiplicity of attachment and its implications for human-immunodeficiency-virus therapies. Journal of Virology 68 : 1782–1789. 8107240

34. Ward H, Rönn M (2010) The contribution of STIs to the sexual transmission of HIV. Current opinion in HIV and AIDS 5 : 305–310. doi: 10.1097/COH.0b013e32833a8844 20543605

35. Carneiro MO, Russ C, Ross MG, Gabriel SB, Nusbaum C, et al. (2012) Pacific biosciences sequencing technology for genotyping and variation discovery in human data. BMC Genomics 13 : 375. doi: 10.1186/1471-2164-13-375 22863213

36. McElroy K, Thomas T, Luciani F (2014) Deep sequencing of evolving pathogen populations: applications, errors, and bioinformatic solutions. Microbial Informatics and Experimentation 4 : 1. doi: 10.1186/2042-5783-4-1 24428920

37. Moore RD, Keruly JC (2007) CD4(+) cell count 6 years after commencement of highly active antiretroviral therapy in persons with sustained virologic suppression. Clinical Infectious Diseases 44 : 441–446. doi: 10.1086/510746 17205456

38. Fidler (2013) Short-Course Antiretroviral Therapy in Primary HIV Infection. The New England journal of medicine 368 : 207–217. doi: 10.1056/NEJMoa1110039 23323897

39. Sáez-Cirión A, Bacchus C, Hocqueloux L, Avettand-Fenoel V, Girault I, et al. (2013) Post-Treatment HIV-1 Controllers with a Long-Term Virological Remission after the Interruption of Early Initiated Antiretroviral Therapy ANRS VISCONTI Study. PLOS Pathogens 9: e1003211. doi: 10.1371/journal.ppat.1003211 23516360

Top novinky

Nové kurzy

Top články

Nové číslo

Top novinky

Nejčtenější

Nová videa

Nová videa

Nové podcasty

Doporučené pozice

Top novinky

Estimating the basic reproduction number of a pathogen in a single host when only a single founder successfully infects

Summary

Keywords:

Introduction

Theory

Methods

Results

Discussion

Supporting information

Zdroje

PLOS One

Svět praktické medicíny 4/2025 (znalostní test z časopisu)

Denzitometrie v praxi: od kvalitního snímku po správnou interpretaci

Eozinofilie – multioborová otázka?

Čelistně-ortodontické kazuistiky od A do Z

Cesta od prvních příznaků RS k optimální léčbě