Viral phylodynamics

Viral phylodynamics is defined as the study of how epidemiological, immunological, and evolutionary processes act and potentially interact to shape viral phylogenies.[1] Since the coining of the term in 2004, research on viral phylodynamics has focused on transmission dynamics in an effort to shed light on how these dynamics impact viral genetic variation. Transmission dynamics can be considered at the level of cells within an infected host, individual hosts within a population, or entire populations of hosts.

Many viruses, especially RNA viruses, rapidly accumulate genetic variation because of short generation times and high mutation rates. Patterns of viral genetic variation are therefore heavily influenced by how quickly transmission occurs and by which entities transmit to one another. Patterns of viral genetic variation will also be affected by selection acting on viral phenotypes. Although viruses can differ with respect to many phenotypes, phylodynamic studies have to date tended to focus on a limited number of viral phenotypes. These include virulence phenotypes, phenotypes associated with viral transmissibility, cell or tissue tropism phenotypes, and antigenic phenotypes that can facilitate escape from host immunity. Due to the impact that transmission dynamics and selection can have on viral genetic variation, viral phylogenies can therefore be used to investigate important epidemiological, immunological, and evolutionary processes, such as epidemic spread,[2] spatio-temporal dynamics including metapopulation dynamics,[3] zoonotic transmission, tissue tropism,[4] and antigenic drift.[5] The quantitative investigation of these processes through the consideration of viral phylogenies is the central aim of viral phylodynamics.

Sources of phylodynamic variation

Idealized caricatures of virus phylogenies that distinguish between virus population with (A) exponential growth (B) or constant size.
Idealized caricatures of virus phylogenies that show the effects of population subdivision, distinguishing between a structured host population (A) and an unstructured host population (B). Red and blue circles represent spatial locations from which viral samples were isolated.
Idealized caricatures of virus phylogenies that show the effects of immune escape where selection results in an unbalanced tree (A) and neutral dynamics results in a balance tree (B).

In coining the term phylodynamics, Grenfell and coauthors[1] postulated that viral phylogenies "... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics". Their study showcased three features of viral phylogenies, which may serve as rules of thumb for identifying important epidemiological, immunological, and evolutionary processes influencing patterns of viral genetic variation.

The relative lengths of internal versus external branches will be affected by changes in viral population size over time[1]
Rapid expansion of a virus in a population will be reflected by a "star-like" tree, in which external branches are long relative to internal branches. Star-like trees arise because viruses are more likely to share a recent common ancestor when the population is small, and a growing population has an increasingly smaller population size towards the past. Compared to a phylogeny of an expanding virus, a phylogeny of a viral population that stays constant in size will have external branches that are shorter relative to branches on the interior of the tree. The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (exponential growth). The phylogeny of hepatitis B virus instead reflects a viral population that has remained roughly consistent (constant size). Similarly, trees reconstructed from viral sequences isolated from chronically infected individuals can be used to gauge changes in viral population sizes within a host.
The clustering of taxa on a viral phylogeny will be affected by host population structure[1]
Viruses within similar hosts, such as hosts that reside in the same geographic region, are expected to be more closely related genetically if transmission occurs more commonly between them. The phylogenies of measles and rabies virus illustrate viruses with spatially structured host population. These phylogenies stand in contrast to the phylogeny of human influenza, which does not appear to exhibit strong spatial structure over extended periods of time. Clustering of taxa, when it occurs, is not necessarily observed at all scales, and a population that appears structured at some scale may appear panmictic at another scale, for example at a smaller spatial scale. While spatial structure is the most commonly observed population structure in phylodynamic analyses, viruses may also have nonrandom admixture by attributes such as the age, race, and risk behavior.[6] This is because viral transmission can preferentially occur between hosts sharing any of these attributes.
Tree balance will be affected by selection, most notably immune escape[1]
The effect of directional selection on the shape of a viral phylogeny is exemplified by contrasting the trees of influenza virus and HIV's surface proteins. The ladder-like phylogeny of influenza virus A/H3N2's hemagglutinin protein bears the hallmarks of strong directional selection, driven by immune escape (imbalanced tree). In contrast, a more balanced phylogeny may occur when a virus is not subject to strong immune selection or other source of directional selection. An example of this is the phylogeny of the HIV envelope protein inferred from sequences isolated from different individuals in a population (balanced tree). phylogenies of the HIVf envelope protein from chronically infected hosts resemble influenza's ladder-like tree. This highlights that the processes affecting viral genetic variation can differ across scales. Indeed, contrasting patterns of viral genetic variation within and between hosts has been an active topic in phylodynamic research since the field's inception.[1]

Although these three phylogenetic features are useful rules of thumb to identify epidemiological, immunological, and evolutionary processes that might be impacting viral genetic variation, there is growing recognition that the mapping between process and phylogenetic pattern can be many-to-one. For instance, although ladder-like trees could reflect the presence of directional selection, ladder-like trees could also reflect sequential genetic bottlenecks that might occur with rapid spatial spread, as in the case of rabies virus.[7] Because of this many-to-one mapping between process and phylogenetic pattern, research in the field of viral phylodynamics has sought to develop and apply quantitative methods to effectively infer process from reconstructed viral phylogenies (see Methods). The consideration of other data sources (e.g., incidence patterns) may aid in distinguishing between competing phylodynamic hypotheses. Combining disparate sources of data for phylodynamic analysis remains a major challenge in the field and is an active area of research.

Applications

Viral origins

Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows molecular clock models to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With the rate of evolution measured in real units of time, it is possible to infer the date of the most recent common ancestor (MRCA) for a set of viral sequences. The age of the MRCA of these isolates is a lower bound; the common ancestor of the entire virus population must have existed earlier than the MRCA of the virus sample. In April 2009, genetic analysis of 11 sequences of swine-origin H1N1 influenza suggested that the common ancestor existed at or before 12 January 2009.[8] This finding aided in making an early estimate of the basic reproduction number ${\displaystyle R_{0}}$ of the pandemic. Similarly, genetic analysis of sequences isolated from within an individual can be used to determine the individual's infection time.[9]

Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of ${\displaystyle R_{0}}$ from surveillance data requires careful control of the variation of the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of ${\displaystyle R_{0}}$.[2] Such approaches have been used to estimate ${\displaystyle R_{0}}$ in hepatitis C virus[10] and HIV.[2] Additionally, differential transmission between groups, be they geographic-, age-, or risk-related, is very difficult to assess from surveillance data alone. Phylogeographic models have the possibility of more directly revealing these otherwise hidden transmission patterns.[11] Phylodynamic approaches have mapped the geographic movement of the human influenza virus[3] and quantified the epidemic spread of rabies virus in North American raccoons.[12][13] However, nonrepresentative sampling may bias inferences of both ${\displaystyle R_{0}}$[14] and migration patterns.[3] Phylodynamic approaches have also been used to better understand viral transmission dynamics and spread within infected hosts. For example, phylodynamic studies have been used to infer the rate of viral growth within infected hosts and to argue for the occurrence of viral compartmentalization in hepatitis C infection.[4]

Viral control efforts

Phylodynamic approaches can also be useful in ascertaining the effectiveness of viral control efforts, particularly for diseases with low reporting rates. For example, the genetic diversity of the DNA-based hepatitis B virus declined in the Netherlands in the late 1990s, following the initiation of a vaccination program.[15] This correlation was used to argue that vaccination was effective at reducing the prevalence of infection, although alternative explanations are possible.[16]

Viral control efforts can also impact the rate at which virus populations evolve, thereby influencing phylogenetic patterns. Phylodynamic approaches that quantify how evolutionary rates change over time can therefore provide insight into the effectiveness of control strategies. For example, an application to HIV sequences within infected hosts showed that viral substitution rates dropped to effectively zero following the initiation of antiretroviral drug therapy.[17] This decrease in substitution rates was interpreted as an effective cessation of viral replication following the commencement of treatment, and would be expected to lead to lower viral loads. This finding is especially encouraging because lower substitution rates are associated with slower progression to AIDS in treatment-naive patients.[18]

Antiviral treatment also creates selective pressure for the evolution of drug resistance in virus populations, and can thereby affect patterns of genetic diversity. Commonly, there is a fitness trade-off between faster replication of susceptible strains in the absence of antiviral treatment and faster replication of resistant strains in the presence of antivirals.[19] Thus, ascertaining the level of antiviral pressure necessary to shift evolutionary outcomes is of public health importance. Phylodynamic approaches have been used to examine the spread of Oseltamivir resistance in influenza A/H1N1.[20]

Methods

Most often, the goal of phylodynamic analyses is to make inferences of epidemiological processes from viral phylogenies. Thus, most phylodynamic analyses begin with the reconstruction of a phylogenetic tree. Genetic sequences are often sampled at multiple time points, which allows the estimation of substitution rates and the time of the MRCA using a molecular clock model.[21] For viruses, Bayesian phylogenetic methods are popular because of the ability to fit complex demographic scenarios while integrating out phylogenetic uncertainty.[22][23]

Traditional evolutionary approaches directly utilize methods from computational phylogenetics and population genetics to assess hypotheses of selection and population structure without direct regard for epidemiological models. For example,

• the magnitude of selection can be measured by comparing the rate of nonsynonymous substitution to the rate of synonymous substitution (dN/dS);
• the population structure of the host population may be examined by calculation of F-statistics; and
• hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as Tajima's D.

However, such analyses were not designed with epidemiological inference in mind and it may be difficult to extrapolate from standard statistics to desired epidemiological quantities.

In an effort to bridge the gap between traditional evolutionary approaches and epidemiological models, several analytical methods have been developed to specifically address problems related to phylodynamics. These methods are based on coalescent theory, birth-death models,[24] and simulation, and are used to more directly relate epidemiological parameters to observed viral sequences.

Coalescent theory and phylodynamics

Effective population size

The coalescent is a mathematical model that describes the ancestry of a sample of nonrecombining gene copies. In modeling the coalescent process, time is usually considered to flow backwards from the present. In a selectively neutral population of constant size ${\displaystyle N}$ and nonoverlapping generations (the Wright Fisher model), the expected time for a sample of two gene copies to coalesce (i.e., find a common ancestor) is ${\displaystyle N}$ generations. More generally, the waiting time for two members of a sample of ${\displaystyle n}$ gene copies to share a common ancestor is exponentially distributed, with rate

${\displaystyle \lambda _{n}={n \choose 2}{\frac {1}{N}}}$.

This time interval is labeled ${\displaystyle T_{n}}$, and at its end there are ${\displaystyle n-1}$ extant lineages remaining. These remaining lineages will coalesce at the rate ${\displaystyle \lambda _{n-1}\cdots \lambda _{2}}$ after intervals ${\displaystyle T_{n-1}\cdots T_{2}}$. This process can be simulated by drawing exponential random variables with rates ${\displaystyle \{\lambda _{n-i}\}_{i=0,\cdots ,n-2}}$ until there is only a single lineage remaining (the MRCA of the sample). In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval ${\displaystyle T_{i}}$.

A gene genealogy illustrating internode intervals.

The expected waiting time to find the MRCA of the sample is the sum of the expected values of the internode intervals,

{\displaystyle {\begin{aligned}\mathrm {E} [\mathrm {TMRCA} ]&=\mathrm {E} [T_{n}]+\mathrm {E} [T_{n-1}]+\cdots +\mathrm {E} [T_{2}]\\&=1/\lambda _{n}+1/\lambda _{n-1}+\cdots +1/\lambda _{2}\\&=2N(1-{\frac {1}{n}}).\end{aligned}}}

Two corollaries are :

• The time to the MRCA (TMRCA) of a sample is not unbounded in the sample size. ${\displaystyle \lim _{n\rightarrow \infty }\mathrm {E} [\mathrm {TMRCA} ]=2N.}$
• Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is ${\displaystyle O(1/n)}$.

Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.

For example, Robbins et al.[25] estimated the TMRCA for 74 HIV-1 subtype-B genetic sequences collected in North America to be 1968. Assuming a constant population size, we expect the time back to 1968 to represent ${\displaystyle 1-1/74=99\%}$ of the TMRCA of the North American virus population.

If the population size ${\displaystyle N(t)}$ changes over time, the coalescent rate ${\displaystyle \lambda _{n}(t)}$ will also be a function of time. Donnelley and Tavaré[26] derived this rate for a time-varying population size under the assumption of constant birth rates:

${\displaystyle \lambda _{n}(t)={n \choose 2}{\frac {1}{N(t)}}}$.

Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: ${\displaystyle t\rightarrow \int _{\tau =0}^{t}{\frac {\mathrm {d} \tau }{N(\tau )}}}$.

Very early in an epidemic, the virus population may be growing exponentially at rate ${\displaystyle r}$, so that ${\displaystyle t}$ units of time in the past, the population will have size ${\displaystyle N(t)=N_{0}e^{-rt}}$. In this case, the rate of coalescence becomes

${\displaystyle \lambda _{n}(t)={n \choose 2}{\frac {1}{N_{0}e^{-rt}}}}$.

This rate is small close to when the sample was collected (${\displaystyle t=0}$), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree. This is why rapidly growing populations yield trees with long tip branches.

If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the serial interval ${\displaystyle D}$ for a particular pathogen to estimate the basic reproduction number, ${\displaystyle R_{0}}$. The two may be linked by the following equation:[27]

${\displaystyle r={\frac {R_{0}-1}{D}}}$.

For example, one of the first estimates of ${\displaystyle R_{0}}$ was for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 hemagglutinin sequences in combination with prior data about the infectious period for influenza.[8]

Compartmental models

Infectious disease epidemics are often characterized by highly nonlinear and rapid changes in the number of infected individuals and the effective population size of the virus. In such cases, birth rates are highly variable, which can diminish the correspondence between effective population size and the prevalence of infection.[28] Many mathematical models have been developed in the field of mathematical epidemiology to describe the nonlinear time series of prevalence of infection and the number of susceptible hosts. A well studied example is the Susceptible-Infected-Recovered (SIR) system of differential equations, which describes the fractions of the population ${\displaystyle S(t)}$ susceptible, ${\displaystyle I(t)}$ infected, and ${\displaystyle R(t)}$ recovered as a function of time:

${\displaystyle {\frac {dS}{dt}}=-\beta SI}$,
${\displaystyle {\frac {dI}{dt}}=\beta SI-\gamma I}$, and
${\displaystyle {\frac {dR}{dt}}=\gamma I}$.

Here, ${\displaystyle \beta }$ is the per capita rate of transmission to susceptible hosts, and ${\displaystyle \gamma }$ is the rate at which infected individuals recover, whereupon they are no longer infectious. In this case, the incidence of new infections per unit time is ${\displaystyle f(t)=\beta SI}$, which is analogous to the birth rate in classical population genetics models. The general formula for the rate of coalescence is:[2]

${\displaystyle \lambda _{n}(t)={n \choose 2}{\frac {2f(t)}{I(t)^{2}}}}$.

The ratio ${\displaystyle 2{n \choose 2}/{I(t)^{2}}}$ can be understood as arising from the probability that two lineages selected uniformly at random are both ancestral to the sample. This probability is the ratio of the number of ways to pick two lineages without replacement from the set of lineages and from the set of all infections: ${\displaystyle {n \choose 2}/{I(t) \choose 2}\approx 2{n \choose 2}/{I(t)^{2}}}$. Coalescent events will occur with this probability at the rate given by the incidence function ${\displaystyle f(t)}$.

For the simple SIR model, this yields

${\displaystyle \lambda _{n}(t)={n \choose 2}{\frac {2\beta S(t)}{I(t)}}}$.

This expression is similar to the Kingman coalescent rate, but is damped by the fraction susceptible ${\displaystyle S(t)}$.

Early in an epidemic, ${\displaystyle S(0)\approx 1}$, so for the SIR model

${\displaystyle \lambda _{n}(t)\approx {n \choose 2}{\frac {2\beta }{I(t)}}}$.

This has the same mathematical form as the rate in the Kingman coalescent, substituting ${\displaystyle N_{e}=I(t)/2\beta }$. Consequently, estimates of effective population size based on the Kingman coalescent will be proportional to prevalence of infection during the early period of exponential growth of the epidemic.[28]

When a disease is no longer exponentially growing but has become endemic, the rate of lineage coalescence can also be derived for the epidemiological model governing the disease's transmission dynamics. This can be done by extending the Wright Fisher model to allow for unequal offspring distributions. With a Wright Fisher generation taking ${\displaystyle \tau }$ units of time, the rate of coalescence is given by:

${\displaystyle \lambda _{n}={n \choose 2}{\frac {1}{N_{e}\tau }}}$,

where the effective population size ${\displaystyle N_{e}}$ is the population size ${\displaystyle N}$ divided by the variance of the offspring distribution ${\displaystyle \sigma ^{2}}$.[29] The generation time ${\displaystyle \tau }$ for an epidemiological model at equilibrium is given by the duration of infection and the population size ${\displaystyle N}$ is closely related to the equilibrium number of infected individuals. To derive the variance in the offspring distribution ${\displaystyle \sigma ^{2}}$ for a given epidemiological model, one can imagine that infected individuals can differ from one another in their infectivities, their contact rates, their durations of infection, or in other characteristics relating to their ability to transmit the virus with which they are infected. These differences can be acknowledged by assuming that the basic reproduction number is a random variable ${\displaystyle \nu }$ that varies across individuals in the population and that ${\displaystyle \nu }$ follows some continuous probability distribution.[30] The mean and variance of these individual basic reproduction numbers, ${\displaystyle \mathrm {E} [\nu ]}$ and ${\displaystyle \mathrm {Var} [\nu ]}$, respectively, can then be used to compute ${\displaystyle \sigma ^{2}}$. The expression relating these quantities is given by:[31]

${\displaystyle \sigma ^{2}={\frac {\mathrm {Var} [\nu ]}{\mathrm {E} [\nu ]^{2}}}+1}$.

For example, for the SIR model above, modified to include births into the population and deaths out of the population, the population size ${\displaystyle N}$ is given by the equilibrium number of infected individuals, ${\displaystyle I}$. The mean basic reproduction number, averaged across all infected individuals, is given by ${\displaystyle \beta /\gamma }$, under the assumption that the background mortality rate is negligible compared to the rate of recovery ${\displaystyle \gamma }$. The variance in individuals' basic reproduction rates is given by ${\displaystyle (\beta /\gamma )^{2}}$, because the duration of time individuals remain infected in the SIR model is exponentially distributed. The variance in the offspring distribution ${\displaystyle \sigma ^{2}}$ is therefore 2. ${\displaystyle N_{e}}$ therefore becomes ${\displaystyle {\frac {I}{2}}}$ and the rate of coalescence becomes:

${\displaystyle \lambda _{n}={n \choose 2}{\frac {2\gamma }{I}}}$.

This rate, derived for the SIR model at equilibrium, is equivalent to the rate of coalescence given by the more general formula.[2] Rates of coalescence can similarly be derived for epidemiological models with superspreaders or other transmission heterogeneities, for models with individuals who are exposed but not yet infectious, and for models with variable infectious periods, among others.[31] Given some epidemiological information (such as the duration of infection) and a specification of a mathematical model, viral phylogenies can therefore be used to estimate epidemiological parameters that might otherwise be difficult to quantify.

Phylogeography

At the most basic level, the presence of geographic population structure can be revealed by comparing the genetic relatedness of viral isolates to geographic relatedness. A basic question is whether geographic character labels are more clustered on a phylogeny than expected under a simple nonstructured model. This question can be answered by counting the number of geographic transitions on the phylogeny via parsimony, maximum likelihood or through Bayesian inference. If population structure exists, then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.[32] This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a null distribution can be constructed and a p-value computed by comparing the observed transition counts to this null distribution.[32]

Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the geographic locations of ancestral lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a substitution model. The same phylogenetic machinery that is used to infer models of DNA evolution can thus be used to infer geographic transition matrices.[33] The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.[33] These types of approaches can be extended by substituting other attributes for geographic locations. For example, in an application to rabies virus, Streicker and colleagues estimated rates of cross-species transmission by considering host species as the attribute.[7]

Simulation

As discussed above, it is possible to directly infer parameters of simple compartmental epidemiological models, such as SIR models, from sequence data by looking at genealogical patterns. Additionally, general patterns of geographic movement can be inferred from sequence data, but these inferences do not involve an explicit model of transmission dynamics between infected individuals. For more complicated epidemiological models, such as those involving cross-immunity, age structure of host contact rates, seasonality, or multiple host populations with different life history traits, it is often impossible to analytically predict genealogical patterns from epidemiological parameters. As such, the traditional statistical inference machinery will not work with these more complicated models, and in this case, it is common to instead use a forward simulation-based approach.

Simulation-based models require specification of a transmission model for the infection process between infected hosts and susceptible hosts and for the recovery process of infected hosts. Simulation-based models may be compartmental, tracking the numbers of hosts infected and recovered to different viral strains,[34] or may be individual-based, tracking the infection state and immune history of every host in the population.[5][35] Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios. A forward simulation model may account for geographic population structure or age structure by modulating transmission rates between host individuals of different geographic or age classes. Additionally, seasonality may be incorporated by allowing time of year to influence transmission rate in a stepwise or sinusoidal fashion.

To connect the epidemiological model to viral genealogies requires that multiple viral strains, with different nucleotide or amino acid sequences, exist in the simulation, often denoted ${\displaystyle I_{1}\cdots I_{n}}$ for different infected classes. In this case, mutation acts to convert a host in one infected class to another infected class. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.

For antigenically variable viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc... The level of protection against one strain of virus by a second strain is known as cross-immunity. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.[36] Often, the degree of cross-immunity between virus strains is assumed to be related to their sequence distance.

In general, in needing to run simulations rather than compute likelihoods, it may be difficult to make fine-scale inferences on epidemiological parameters, and instead, this work usually focuses on broader questions, testing whether overall genealogical patterns are consistent with one epidemiological model or another. Additionally, simulation-based methods are often used to validate inference results, providing test data where the correct answer is known ahead of time. Because computing likelihoods for genealogical data under complex simulation models has proven difficult, an alternative statistical approach called Approximate Bayesian Computation (ABC) is becoming popular in fitting these simulation models to patterns of genetic variation, following successful application of this approach to bacterial diseases.[37][38][39] This is because ABC makes use of easily computable summary statistics to approximate likelihoods, rather than the likelihoods themselves.

Examples

Phylodynamics of influenza

Phylogenetic tree of the HA1 region of the HA gene of influenza A (H3N2) from viruses sampled between 1968 and 2002.

Human influenza is an acute respiratory infection primarily caused by viruses influenza A and influenza B. Influenza A viruses can be further classified into subtypes, such as A/H1N1 and A/H3N2. Here, subtypes are denoted according to their hemagglutinin (H or HA) and neuraminidase (N or NA) genes, which as surface proteins, act as the primary targets for the humoral immune response. Influenza viruses circulate in other species as well, most notably as swine influenza and avian influenza. Through reassortment, genetic sequences from swine and avian influenza occasionally enter the human population. If a particular hemagglutinin or neuraminidase has been circulating outside the human population, then humans will lack immunity to this protein and an influenza pandemic may follow a host switch event, as seen in 1918, 1957, 1968 and 2009. After introduction into the human population, a lineage of influenza generally persists through antigenic drift, in which HA and NA continually accumulate mutations allowing viruses to infect hosts immune to earlier forms of the virus. These lineages of influenza show recurrent seasonal epidemics in temperate regions and less periodic transmission in the tropics. Generally, at each pandemic event, the new form of the virus outcompetes existing lineages.[35] The study of viral phylodynamics in influenza primarily focuses on the continual circulation and evolution of epidemic influenza, rather than on pandemic emergence. Of central interest to the study of viral phylodynamics is the distinctive phylogenetic tree of epidemic influenza A/H3N2, which shows a single predominant trunk lineage that persists through time and side branches that persist for only 1–5 years before going extinct.[40]

Selective pressures

Phylodynamic techniques have provided insight into the relative selective effects of mutations to different sites and different genes across the influenza virus genome. The exposed location of hemagglutinin (HA) suggests that there should exist strong selective pressure for evolution to the specific sites on HA that are recognized by antibodies in the human immune system. These sites are referred to as epitope sites. Phylogenetic analysis of H3N2 influenza has shown that putative epitope sites of the HA protein evolve approximately 3.5 times faster on the trunk of the phylogeny than on side branches.[41][42] This suggests that viruses possessing mutations to these exposed sites benefit from positive selection and are more likely than viruses lacking such mutations to take over the influenza population. Conversely, putative nonepitope sites of the HA protein evolve approximately twice as fast on side branches than on the trunk of the H3 phylogeny,[41][42] indicating that mutations to these sites are selected against and viruses possessing such mutations are less likely to take over the influenza population. Thus, analysis of phylogenetic patterns gives insight into underlying selective forces. A similar analysis combining sites across genes shows that while both HA and NA undergo substantial positive selection, internal genes show low rates of amino acid fixation relative to levels of polymorphism, suggesting an absence of positive selection.[43]

Further analysis of HA has shown it to have a very small effective population size relative to the census size of the virus population, as expected for a gene undergoing strong positive selection.[44] However, across the influenza genome, there is surprisingly little variation in effective population size; all genes are nearly equally low.[45] This finding suggests that reassortment between segments occurs slowly enough, relative to the actions of positive selection, that genetic hitchhiking causes beneficial mutations in HA and NA to reduce diversity in linked neutral variation in other segments of the genome.

Influenza A/H1N1 shows a larger effective population size and greater genetic diversity than influenza H3N2,[45] suggesting that H1N1 undergoes less adaptive evolution than H3N2. This hypothesis is supported by empirical patterns of antigenic evolution; there have been nine vaccine updates recommended by the WHO for H1N1 in the interpandemic period between 1978 and 2009, while there have been 20 vaccine updates recommended for H3N2 during this same time period.[46] Additionally, an analysis of patterns of sequence evolution on trunk and side branches suggests that H1N1 undergoes substantially less positive selection than H3N2.[42][43] However, the underlying evolutionary or epidemiological cause for this difference between H3N2 and H1N1 remains unclear.

Circulation patterns

The extremely rapid turnover of the influenza population means that the rate of geographic spread of influenza lineages must also, to some extent, be rapid. Surveillance data show a clear pattern of strong seasonal epidemics in temperate regions and less periodic epidemics in the tropics.[47] The geographic origin of seasonal epidemics in the Northern and Southern Hemispheres had been a major open question in the field. However, temperate epidemics usually emerge from a global reservoir rather than emerging from within the previous season's genetic diversity.[45][48] This and subsequent work, has suggested that the global persistence of the influenza population is driven by viruses being passed from epidemic to epidemic, with no individual region in the world showing continual persistence.[3][49] However, there is considerable debate regarding the particular configuration of the global network of influenza, with one hypothesis suggesting a metapopulation in East and Southeast Asia that continually seeds influenza in the rest of the world,[48] and another hypothesis advocating a more global metapopulation in which temperate lineages often return to the tropics at the end of a seasonal epidemic.[3][49]

All of these phylogeographic studies necessarily suffer from limitations in the worldwide sampling of influenza viruses. For example, the relative importance of tropical Africa and India has yet to be uncovered. Additionally, the phylogeographic methods used in these studies (see section on phylogeographic methods) make inferences of the ancestral locations and migration rates on only the samples at hand, rather than on the population in which these samples are embedded. Because of this, study-specific sampling procedures are a concern in extrapolating to population-level inferences. However, estimates of migration rates that are jointly based on epidemiological and evolutionary simulations appear robust to a large degree of undersampling or oversampling of a particular region.[3] Further methodological progress is required to more fully address these issues.

Simulation-based models

Forward simulation-based approaches for addressing how immune selection can shape the phylogeny of influenza A/H3N2's hemagglutinin protein have been actively developed by disease modelers since the early 2000s. These approaches include both compartmental models and agent-based models. One of the first compartmental models for influenza was developed by Gog and Grenfell,[34] who simulated the dynamics of many strains with partial cross-immunity to one another. Under a parameterization of long host lifespan and short infectious period, they found that strains would form self-organized sets that would emerge and replace one another. Although the authors did not reconstruct a phylogeny from their simulated results, the dynamics they found were consistent with a ladder-like viral phylogeny exhibiting low strain diversity and rapid lineage turnover.

Later work by Ferguson and colleagues[35] adopted an agent-based approach to better identify the immunological and ecological determinants of influenza evolution. The authors modeled influenza's hemagglutinin as four epitopes, each consisting of three amino acids. They showed that under strain-specific immunity alone (with partial cross-immunity between strains based on their amino acid similarity), the phylogeny of influenza A/H3N2's HA was expected to exhibit 'explosive genetic diversity', a pattern that is inconsistent with empirical data. This led the authors to postulate the existence of a temporary strain-transcending immunity: individuals were immune to reinfection with any other influenza strain for approximately six months following an infection. With this assumption, the agent-based model could reproduce the ladder-like phylogeny of influenza A/H3N2's HA protein.

Work by Koelle and colleagues[5] revisited the dynamics of influenza A/H3N2 evolution following the publication of a paper by Smith and colleagues[50] which showed that the antigenic evolution of the virus occurred in a punctuated manner. The phylodynamic model designed by Koelle and coauthors argued that this pattern reflected a many-to-one genotype-to-phenotype mapping, with the possibility of strains from antigenically distinct clusters of influenza sharing a high degree of genetic similarity. Through incorporating this mapping of viral genotype into viral phenotype (or antigenic cluster) into their model, the authors were able to reproduce the ladder-like phylogeny of influenza's HA protein without generalized strain-transcending immunity. The reproduction of the ladder-like phylogeny resulted from the viral population passing through repeated selective sweeps. These sweeps were driven by herd immunity and acted to constrain viral genetic diversity.

Instead of modeling the genotypes of viral strains, a compartmental simulation model by Gökaydin and colleagues[51] considered influenza evolution at the scale of antigenic clusters (or phenotypes). This model showed that antigenic emergence and replacement could result under certain epidemiological conditions. These antigenic dynamics would be consistent with a ladder-like phylogeny of influenza exhibiting low genetic diversity and continual strain turnover.

In recent work, Bedford and colleagues[52] used an agent-based model to show that evolution in a Euclidean antigenic space can account for the phylogenetic pattern of influenza A/H3N2's HA, as well as the virus's antigenic, epidemiological, and geographic patterns. The model showed the reproduction of influenza's ladder-like phylogeny depended critically on the mutation rate of the virus as well as the immunological distance yielded by each mutation.

The phylodynamic diversity of influenza

Although most research on the phylodynamics of influenza has focused on seasonal influenza A/H3N2 in humans, influenza viruses exhibit a wide variety of phylogenetic patterns. Qualitatively similar to the phylogeny of influenza A/H3N2's hemagglutinin protein, influenza A/H1N1 exhibits a ladder-like phylogeny with relatively low genetic diversity at any point in time and rapid lineage turnover.[35] However, the phylogeny of influenza B's hemagglutinin protein has two circulating lineages: the Yamagata and the Victoria lineage.[53] It is unclear how the population dynamics of influenza B contribute to this evolutionary pattern, although one simulation model has been able to reproduce this phylogenetic pattern with longer infectious periods of the host.[54]

Genetic and antigenic variation of influenza is also present across a diverse set of host species. The impact of host population structure can be seen in the evolution of equine influenza A/H3N8: instead of a single trunk with short side-branches, the hemagglutinin of influenza A/H3N8 splits into two geographically distinct lineages, representing American and European viruses.[55][56] The evolution of these two lineages is thought to have occurred as a consequence of quarantine measures.[55] Additionally, host immune responses are hypothesized to modulate virus evolutionary dynamics. Swine influenza A/H3N2 is known to evolve antigenically at a rate that is six times slower than that of the same virus circulating in humans, although these viruses' rates of genetic evolution are similar.[57] Influenza in aquatic birds is hypothesized to exhibit 'evolutionary stasis',[58] although recent phylogenetic work indicates that the rate of evolutionary change in these hosts is similar to those in other hosts, including humans.[59] In these cases, it is thought that short host lifespans prevent the build-up of host immunity necessary to effectively drive antigenic drift.

Phylodynamics of HIV

The global diversity of HIV-1 group M is shaped by its origins in Central Africa around the turn of the 20th century. The epidemic underwent explosive growth throughout the early 20th century with multiple radiations out of Central Africa. While traditional epidemiological surveillance data are almost nonexistent for the early period of epidemic expansion, phylodynamic analyses based on modern sequence data can be used to estimate when the epidemic began and to estimate the early growth rate. The rapid early growth of HIV-1 in Central Africa is reflected in the star-like phylogenies of the virus, with most coalescent events occurring in the distant past. Multiple founder events have given rise to distinct HIV-1 group M subtypes which predominate in different parts of the world. Subtype B is most prevalent in North America and Western Europe, while subtypes A and C, which account for more than half of infections worldwide, are common in Africa.[60] HIV subtypes differ slightly in their transmissibility, virulence, effectiveness of antiretroviral therapy, and pathogenesis.[61]

The rate of exponential growth of HIV in Central Africa in the early 20th century preceding the establishment of modern subtypes has been estimated using coalescent approaches. Several estimates based on parametric exponential growth models are shown in table 1, for different time periods, risk groups and subtypes. The early spread of HIV-1 has also been characterized using nonparametric ("skyline") estimates of ${\displaystyle N_{e}}$.[62]

Estimated annual growth rates of ${\displaystyle N_{e}}$ for early HIV sub-epidemics.
Growth rate Group Subtype Risk group
0.17[63] M NA Central Africa
0.27[64] M C Central Africa
0.48[65]-0.83[25] M B North America/Eur/Aust, MSM
0.068[66] O NA Cameroon

The early growth of subtype B in North America was quite high, however, the duration of exponential growth was relatively short, with saturation occurring in the mid- and late-1980s.[2] At the opposite extreme, HIV-1 group O, a relatively rare group that is geographically confined to Cameroon and that is mainly spread by heterosexual sex, has grown at a lower rate than either subtype B or C.

HIV-1 sequences sampled over a span of five decades have been used with relaxed molecular clock phylogenetic methods to estimate the time of cross-species viral spillover into humans around the early 20th century.[67] The estimated TMRCA for HIV-1 coincides with the appearance of the first densely populated large cities in Central Africa. Similar methods have been used to estimate the time that HIV originated in different parts of the world. The origin of subtype B in North America is estimated to be in the 1960s, where it went undetected until the AIDS epidemic in the 1980s.[25] There is evidence that progenitors of modern subtype B originally colonized the Caribbean before undergoing multiple radiations to North and South America.[68] Subtype C originated around the same time in Africa.[65]

Contemporary epidemiological dynamics

At shorter time scales and finer geographical scales, HIV phylogenies may reflect epidemiological dynamics related to risk behavior and sexual networks. Very dense sampling of viral sequences within cities over short periods of time has given a detailed picture of HIV transmission patterns in modern epidemics. Sequencing of virus from newly diagnosed patients is now routine in many countries for surveillance of drug resistance mutations, which has yielded large databases of sequence data in those areas. There is evidence that HIV transmission within heterogeneous sexual networks leaves a trace in HIV phylogenies, in particular making phylogenies more imbalanced and concentrating coalescent events on a minority of lineages.[69]

By analyzing phylogenies estimated from HIV sequences from men who have sex with men in London, United Kingdom, Lewis et al. found evidence that transmission is highly concentrated in the brief period of primary HIV infection (PHI), which consists of approximately the first 6 months of the infectious period.[70] In a separate analysis, Volz et al.[71] found that simple epidemiological dynamics explain phylogenetic clustering of viruses collected from patients with PHI. Patients who were recently infected were more likely to harbor virus that is phylogenetically close to samples from other recently infected patients. Such clustering is consistent with observations in simulated epidemiological dynamics featuring an early period of intensified transmission during PHI. These results therefore provided further support for Lewis et al.'s findings that HIV transmission occurs frequently from individuals early in their infection.

Purifying immune selection dominates evolution of HIV within hosts, but evolution between hosts is largely decoupled from within-host evolution.[72] Immune selection has relatively little influence on HIV phylogenies at the population level for three reasons. First, there is an extreme bottleneck in viral diversity at the time of sexual transmission.[73] Second, transmission tends to occur early in infection before immune selection has had a chance to operate.[74] Finally, the replicative fitness of a viral strain (measured in transmissions per host) is largely extrinsic to virological factors, depending more heavily on behaviors in the host population. These include heterogeneous sexual and drug-use behaviors.

Between-host and within-host HIV phylogenies. Sequences were downloaded from the LANL HIV sequence database. Neighbor-joining trees were estimated from Alignment1, and the within host tree is based on data from patient 2. Trees were re-rooted using Path-o-gen using known sample dates.

There is some evidence from comparative phylogenetic analysis and epidemic simulations that HIV adapts at the level of the population to maximize transmission potential between hosts.[75] This adaptation is towards intermediate virulence levels, which balances the productive lifetime of the host (time until AIDS) with the transmission probability per act. A useful proxy for virulence is the set-point viral load (SPVL), which is correlated with the time until AIDS.[76] SPVL is the quasi-equilibrium titer of viral particles in the blood during chronic infection. For adaptation towards intermediate virulence to be possible, SPVL needs to be heritable and a trade-off between viral transmissibility and the lifespan of the host needs to exist. SPVL has been shown to be correlated between HIV donor and recipients in transmission pairs,[77] thereby providing evidence that SPVL is at least partly heritable. The transmission probability of HIV per sexual act is positively correlated with viral load,[78][79] thereby providing evidence of the trade-off between transmissibility and virulence. It is therefore theoretically possible that HIV evolves to maximize its transmission potential. Epidemiological simulation and comparative phylogenetic studies have shown that adaptation of HIV towards optimum SPVL could be expected over 100–150 years.[80] These results depend on empirical estimates for the transmissibility of HIV and the lifespan of hosts as a function of SPVL.

Future directions

Up to this point, phylodynamic approaches have focused almost entirely on RNA viruses, which often have mutation rates on the order of 10−3 to 10−4 substitutions per site per year.[81] This allows a sample of around 1000 bases to have power to give a fair degree of confidence in estimating the underlying genealogy connecting sampled viruses. However, other pathogens may have significantly slower rates of evolution. DNA viruses, such as herpes simplex virus, evolve orders of magnitude more slowly.[82] These viruses have commensurately larger genomes. Bacterial pathogens such as pneumococcus and tuberculosis evolve slower still and have even larger genomes. In fact, there exists a very general negative correlation between genome size and mutation rate across observed systems.[83] Because of this, similar amounts of phylogenetic signal are likely to result from sequencing full genomes of RNA viruses, DNA viruses or bacteria. As sequencing technologies continue to improve, it is becoming increasingly feasible to conduct phylodynamic analyses on the full diversity of pathogenic organisms.

Additionally, improvements in sequencing technologies will allow detailed investigation of within-host evolution, as the full diversity of an infecting quasispecies may be uncovered given enough sequencing effort.

References

1. Grenfell BT, Pybus OG, Gog JR, Wood JL, Daly JM, Mumford JA, Holmes EC (January 2004). "Unifying the epidemiological and evolutionary dynamics of pathogens". Science. 303 (5656): 327–32. doi:10.1126/science.1090727. PMID 14726583.
2. Volz EM, Kosakovsky Pond SL, Ward MJ, Leigh Brown AJ, Frost SD (December 2009). "Phylodynamics of infectious disease epidemics". Genetics. 183 (4): 1421–30. doi:10.1534/genetics.109.106021. PMC 2787429. PMID 19797047.
3. Bedford T, Cobey S, Beerli P, Pascual M (May 2010). Ferguson NM (ed.). "Global migration dynamics underlie evolution and persistence of human influenza A (H3N2)". PLoS Pathogens. 6 (5): e1000918. doi:10.1371/journal.ppat.1000918. PMC 2877742. PMID 20523898.
4. ^ a b Gray RR, Salemi M, Klenerman P, Pybus OG (2012). Rall GF (ed.). "A new evolutionary model for hepatitis C virus chronic infection". PLoS Pathogens. 8 (5): e1002656. doi:10.1371/journal.ppat.1002656. PMC 3342959. PMID 22570609.
5. ^ a b c Koelle K, Cobey S, Grenfell B, Pascual M (December 2006). "Epochal evolution shapes the phylodynamics of interpandemic influenza A (H3N2) in humans". Science. 314 (5807): 1898–903. doi:10.1126/science.1132745. PMID 17185596.
6. ^ Kouyos RD, von Wyl V, Yerly S, Böni J, Taffé P, Shah C, Bürgisser P, Klimkait T, Weber R, Hirschel B, Cavassini M, Furrer H, Battegay M, Vernazza PL, Bernasconi E, Rickenbach M, Ledergerber B, Bonhoeffer S, Günthard HF (May 2010). "Molecular epidemiology reveals long-term changes in HIV type 1 subtype B transmission in Switzerland". The Journal of Infectious Diseases. 201 (10): 1488–97. doi:10.1086/651951. PMID 20384495.
7. ^ a b Streicker DG, Turmelle AS, Vonhof MJ, Kuzmin IV, McCracken GF, Rupprecht CE (August 2010). "Host phylogeny constrains cross-species emergence and establishment of rabies virus in bats". Science. 329 (5992): 676–9. doi:10.1126/science.1188836. PMID 20689015.
8. ^ a b Fraser C, Donnelly CA, Cauchemez S, Hanage WP, Van Kerkhove MD, Hollingsworth TD, Griffin J, Baggaley RF, Jenkins HE, Lyons EJ, Jombart T, Hinsley WR, Grassly NC, Balloux F, Ghani AC, Ferguson NM, Rambaut A, Pybus OG, Lopez-Gatell H, Alpuche-Aranda CM, Chapela IB, Zavala EP, Guevara DM, Checchi F, Garcia E, Hugonnet S, Roth C (June 2009). "Pandemic potential of a strain of influenza A (H1N1): early findings". Science. 324 (5934): 1557–61. doi:10.1126/science.1176062. PMC 3735127. PMID 19433588.
9. ^ Lemey P, Rambaut A, Pybus OG (2006). "HIV evolutionary dynamics within and among hosts". AIDS Reviews. 8 (3): 125–40. PMID 17078483.
10. ^ Pybus OG, Charleston MA, Gupta S, Rambaut A, Holmes EC, Harvey PH (June 2001). "The epidemic behavior of the hepatitis C virus". Science. 292 (5525): 2323–5. CiteSeerX 10.1.1.725.9444. doi:10.1126/science.1058321. PMID 11423661.
11. ^ Volz EM (January 2012). "Complex population dynamics and the coalescent under neutrality". Genetics. 190 (1): 187–201. doi:10.1534/genetics.111.134627. PMC 3249372. PMID 22042576.
12. ^ Biek R, Henderson JC, Waller LA, Rupprecht CE, Real LA (May 2007). "A high-resolution genetic signature of demographic and spatial expansion in epizootic rabies virus". Proceedings of the National Academy of Sciences of the United States of America. 104 (19): 7993–8. doi:10.1073/pnas.0700741104. PMC 1876560. PMID 17470818.
13. ^ Lemey P, Rambaut A, Welch JJ, Suchard MA (August 2010). "Phylogeography takes a relaxed random walk in continuous space and time". Molecular Biology and Evolution. 27 (8): 1877–85. doi:10.1093/molbev/msq067. PMC 2915639. PMID 20203288.
14. ^ Stack JC, Welch JD, Ferrari MJ, Shapiro BU, Grenfell BT (July 2010). "Protocols for sampling viral sequences to study epidemic dynamics". Journal of the Royal Society, Interface. 7 (48): 1119–27. doi:10.1098/rsif.2009.0530. PMC 2880085. PMID 20147314.
15. ^ van Ballegooijen WM, van Houdt R, Bruisten SM, Boot HJ, Coutinho RA, Wallinga J (December 2009). "Molecular sequence data of hepatitis B virus and genetic diversity after vaccination". American Journal of Epidemiology. 170 (12): 1455–63. doi:10.1093/aje/kwp375. PMID 19910379.
16. ^ Halloran ME, Holmes EC (December 2009). "Invited commentary: Evaluating vaccination programs using genetic sequence data". American Journal of Epidemiology. 170 (12): 1464–6, discussion 1467–8. doi:10.1093/aje/kwp366. PMC 2800275. PMID 19910381.
17. ^ Drummond A, Forsberg R, Rodrigo AG (July 2001). "The inference of stepwise changes in substitution rates using serial sequence samples". Molecular Biology and Evolution. 18 (7): 1365–71. doi:10.1093/oxfordjournals.molbev.a003920. PMID 11420374.
18. ^ Lemey P, Kosakovsky Pond SL, Drummond AJ, Pybus OG, Shapiro B, Barroso H, Taveira N, Rambaut A (February 2007). "Synonymous substitution rates predict HIV disease progression as a result of underlying replication dynamics". PLoS Computational Biology. 3 (2): e29. doi:10.1371/journal.pcbi.0030029. PMC 1797821. PMID 17305421.
19. ^ Bloom JD, Gong LI, Baltimore D (June 2010). "Permissive secondary mutations enable the evolution of influenza oseltamivir resistance". Science. 328 (5983): 1272–5. doi:10.1126/science.1187816. PMC 2913718. PMID 20522774.
20. ^ Chao DL, Bloom JD, Kochin BF, Antia R, Longini IM (April 2012). "The global spread of drug-resistant influenza". Journal of the Royal Society, Interface. 9 (69): 648–56. doi:10.1098/rsif.2011.0427. PMC 3284134. PMID 21865253.
21. ^ Drummond AJ, Nicholls GK, Rodrigo AG, Solomon W (July 2002). "Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data". Genetics. 161 (3): 1307–20. PMC 1462188. PMID 12136032.
22. ^ Drummond AJ, Rambaut A, Shapiro B, Pybus OG (May 2005). "Bayesian coalescent inference of past population dynamics from molecular sequences". Molecular Biology and Evolution. 22 (5): 1185–92. doi:10.1093/molbev/msi103. PMID 15703244.
23. ^ Kühnert D, Wu CH, Drummond AJ (December 2011). "Phylogenetic and epidemic modeling of rapidly evolving infectious diseases". Infection, Genetics and Evolution. 11 (8): 1825–41. doi:10.1016/j.meegid.2011.08.005. PMID 21906695.
24. ^ Stadler T (December 2010). "Sampling-through-time in birth-death trees". Journal of Theoretical Biology. 267 (3): 396–404. doi:10.1016/j.jtbi.2010.09.010. PMID 20851708.
25. ^ a b c Robbins KE, Lemey P, Pybus OG, Jaffe HW, Youngpairoj AS, Brown TM, Salemi M, Vandamme AM, Kalish ML (June 2003). "U.S. Human immunodeficiency virus type 1 epidemic: date of origin, population history, and characterization of early strains". Journal of Virology. 77 (11): 6359–66. doi:10.1128/JVI.77.11.6359-6366.2003. PMC 155028. PMID 12743293.
26. ^ Donnelly P, Tavaré S (1995). "Coalescents and genealogical structure under neutrality". Annual Review of Genetics. 29: 401–21. doi:10.1146/annurev.ge.29.120195.002153. PMID 8825481.
27. ^ R M Anderson, R M May (1992) Infectious Diseases of Humans: Dynamics and Control. Oxford: Oxford University Press. 768 p.
28. ^ a b Frost SD, Volz EM (June 2010). "Viral phylodynamics and the search for an 'effective number of infections'". Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences. 365 (1548): 1879–90. doi:10.1098/rstb.2010.0060. PMC 2880113. PMID 20478883.
29. '^ J Wakeley (2008) Coalescent Theory: an Introduction. USA: Roberts & Company
30. ^ Lloyd-Smith JO, Schreiber SJ, Kopp PE, Getz WM (November 2005). "Superspreading and the effect of individual variation on disease emergence". Nature. 438 (7066): 355–9. doi:10.1038/nature04153. PMID 16292310.
31. ^ a b Koelle K, Rasmussen DA (May 2012). "Rates of coalescence for common epidemiological models at equilibrium". Journal of the Royal Society, Interface. 9 (70): 997–1007. doi:10.1098/rsif.2011.0495. PMC 3306638. PMID 21920961.
32. ^ a b Chen R, Holmes EC (January 2009). "Frequent inter-species transmission and geographic subdivision in avian influenza viruses from wild birds". Virology. 383 (1): 156–61. doi:10.1016/j.virol.2008.10.015. PMC 2633721. PMID 19000628.
33. ^ a b Lemey P, Rambaut A, Drummond AJ, Suchard MA (September 2009). Fraser C (ed.). "Bayesian phylogeography finds its roots". PLoS Computational Biology. 5 (9): e1000520. doi:10.1371/journal.pcbi.1000520. PMC 2740835. PMID 19779555.
34. ^ a b Gog JR, Grenfell BT (December 2002). "Dynamics and selection of many-strain pathogens". Proceedings of the National Academy of Sciences of the United States of America. 99 (26): 17209–14. doi:10.1073/pnas.252512799. PMC 139294. PMID 12481034.
35. ^ a b c d Ferguson NM, Galvani AP, Bush RM (March 2003). "Ecological and immunological determinants of influenza evolution". Nature. 422 (6930): 428–33. doi:10.1038/nature01509. PMID 12660783.
36. ^ Park AW, Daly JM, Lewis NS, Smith DJ, Wood JL, Grenfell BT (October 2009). "Quantifying the impact of immune escape on transmission dynamics of influenza". Science. 326 (5953): 726–8. doi:10.1126/science.1175980. PMC 3800096. PMID 19900931.
37. ^ Sisson SA, Fan Y, Tanaka MM (February 2007). "Sequential Monte Carlo without likelihoods". Proceedings of the National Academy of Sciences of the United States of America. 104 (6): 1760–5. doi:10.1073/pnas.0607208104. PMC 1794282. PMID 17264216.
38. ^ Luciani F, Sisson SA, Jiang H, Francis AR, Tanaka MM (August 2009). "The epidemiological fitness cost of drug resistance in Mycobacterium tuberculosis". Proceedings of the National Academy of Sciences of the United States of America. 106 (34): 14711–5. doi:10.1073/pnas.0902437106. PMC 2732896. PMID 19706556.
39. ^ Aeschbacher S, Beaumont MA, Futschik A (November 2012). "A novel approach for choosing summary statistics in approximate Bayesian computation". Genetics. 192 (3): 1027–47. doi:10.1534/genetics.112.143164. PMC 3522150. PMID 22960215.
40. ^ Fitch WM, Bush RM, Bender CA, Cox NJ (July 1997). "Long term trends in the evolution of H(3) HA1 human influenza type A". Proceedings of the National Academy of Sciences of the United States of America. 94 (15): 7712–8. doi:10.1073/pnas.94.15.7712. PMC 33681. PMID 9223253.
41. ^ a b Bush RM, Fitch WM, Bender CA, Cox NJ (November 1999). "Positive selection on the H3 hemagglutinin gene of human influenza virus A". Molecular Biology and Evolution. 16 (11): 1457–65. doi:10.1093/oxfordjournals.molbev.a026057. PMID 10555276.
42. ^ a b c Wolf YI, Viboud C, Holmes EC, Koonin EV, Lipman DJ (October 2006). "Long intervals of stasis punctuated by bursts of positive selection in the seasonal evolution of influenza A virus". Biology Direct. 1: 34. doi:10.1186/1745-6150-1-34. PMC 1647279. PMID 17067369.
43. ^ a b Bhatt S, Holmes EC, Pybus OG (September 2011). "The genomic rate of molecular adaptation of the human influenza A virus". Molecular Biology and Evolution. 28 (9): 2443–51. doi:10.1093/molbev/msr044. PMC 3163432. PMID 21415025.
44. ^ Bedford T, Cobey S, Pascual M (July 2011). "Strength and tempo of selection revealed in viral gene genealogies". BMC Evolutionary Biology. 11: 220. doi:10.1186/1471-2148-11-220. PMC 3199772. PMID 21787390.
45. ^ a b c Rambaut A, Pybus OG, Nelson MI, Viboud C, Taubenberger JK, Holmes EC (May 2008). "The genomic and epidemiological dynamics of human influenza A virus". Nature. 453 (7195): 615–9. doi:10.1038/nature06945. PMC 2441973. PMID 18418375.
46. ^ Squires RB, Noronha J, Hunt V, García-Sastre A, Macken C, Baumgarth N, Suarez D, Pickett BE, Zhang Y, Larsen CN, Ramsey A, Zhou L, Zaremba S, Kumar S, Deitrich J, Klem E, Scheuermann RH (November 2012). "Influenza research database: an integrated bioinformatics resource for influenza research and surveillance". Influenza and Other Respiratory Viruses. 6 (6): 404–16. doi:10.1111/j.1750-2659.2011.00331.x. PMC 3345175. PMID 22260278.
47. ^ Finkelman BS, Viboud C, Koelle K, Ferrari MJ, Bharti N, Grenfell BT (December 2007). Myer L (ed.). "Global patterns in seasonal activity of influenza A/H3N2, A/H1N1, and B from 1997 to 2005: viral coexistence and latitudinal gradients". PLOS ONE. 2 (12): e1296. doi:10.1371/journal.pone.0001296. PMC 2117904. PMID 18074020.
48. ^ a b Russell CA, Jones TC, Barr IG, Cox NJ, Garten RJ, Gregory V, Gust ID, Hampson AW, Hay AJ, Hurt AC, de Jong JC, Kelso A, Klimov AI, Kageyama T, Komadina N, Lapedes AS, Lin YP, Mosterin A, Obuchi M, Odagiri T, Osterhaus AD, Rimmelzwaan GF, Shaw MW, Skepner E, Stohr K, Tashiro M, Fouchier RA, Smith DJ (April 2008). "The global circulation of seasonal influenza A (H3N2) viruses". Science. 320 (5874): 340–6. doi:10.1126/science.1154137. PMID 18420927.
49. ^ a b Bahl J, Nelson MI, Chan KH, Chen R, Vijaykrishna D, Halpin RA, Stockwell TB, Lin X, Wentworth DE, Ghedin E, Guan Y, Peiris JS, Riley S, Rambaut A, Holmes EC, Smith GJ (November 2011). "Temporally structured metapopulation dynamics and persistence of influenza A H3N2 virus in humans". Proceedings of the National Academy of Sciences of the United States of America. 108 (48): 19359–64. doi:10.1073/pnas.1109314108. PMC 3228450. PMID 22084096.
50. ^ Smith DJ, Lapedes AS, de Jong JC, Bestebroer TM, Rimmelzwaan GF, Osterhaus AD, Fouchier RA (July 2004). "Mapping the antigenic and genetic evolution of influenza virus". Science. 305 (5682): 371–6. doi:10.1126/science.1097211. PMID 15218094.
51. ^ Gökaydin D, Oliveira-Martins JB, Gordo I, Gomes MG (February 2007). "The reinfection threshold regulates pathogen diversity: the case of influenza". Journal of the Royal Society, Interface. 4 (12): 137–42. doi:10.1098/rsif.2006.0159. PMC 2358964. PMID 17015285.
52. ^ Bedford T, Rambaut A, Pascual M (April 2012). "Canalization of the evolutionary trajectory of the human influenza virus". BMC Biology. 10: 38. doi:10.1186/1741-7007-10-38. PMC 3373370. PMID 22546494.
53. ^ Rota PA, Hemphill ML, Whistler T, Regnery HL, Kendal AP (October 1992). "Antigenic and genetic characterization of the haemagglutinins of recent cocirculating strains of influenza B virus". The Journal of General Virology. 73 ( Pt 10) (10): 2737–42. doi:10.1099/0022-1317-73-10-2737. PMID 1402807.
54. ^ Koelle K, Khatri P, Kamradt M, Kepler TB (September 2010). "A two-tiered model for simulating the ecological and evolutionary dynamics of rapidly evolving viruses, with an application to influenza". Journal of the Royal Society, Interface. 7 (50): 1257–74. doi:10.1098/rsif.2010.0007. PMC 2894885. PMID 20335193.
55. ^ a b Daly JM, Lai AC, Binns MM, Chambers TM, Barrandeguy M, Mumford JA (April 1996). "Antigenic and genetic evolution of equine H3N8 influenza A viruses". The Journal of General Virology. 77 ( Pt 4) (4): 661–71. doi:10.1099/0022-1317-77-4-661. PMID 8627254.
56. ^ Oxburgh L, Klingeborn B (September 1999). "Cocirculation of two distinct lineages of equine influenza virus subtype H3N8". Journal of Clinical Microbiology. 37 (9): 3005–9. PMC 85435. PMID 10449491.
57. ^ de Jong JC, Smith DJ, Lapedes AS, Donatelli I, Campitelli L, Barigazzi G, Van Reeth K, Jones TC, Rimmelzwaan GF, Osterhaus AD, Fouchier RA (April 2007). "Antigenic and genetic evolution of swine influenza A (H3N2) viruses in Europe". Journal of Virology. 81 (8): 4315–22. doi:10.1128/JVI.02458-06. PMC 1866135. PMID 17287258.
58. ^ Webster RG, Bean WJ, Gorman OT, Chambers TM, Kawaoka Y (March 1992). "Evolution and ecology of influenza A viruses". Microbiological Reviews. 56 (1): 152–79. PMC 372859. PMID 1579108.
59. ^ Chen R, Holmes EC (December 2006). "Avian influenza virus exhibits rapid evolutionary dynamics". Molecular Biology and Evolution. 23 (12): 2336–41. doi:10.1093/molbev/msl102. PMID 16945980.
60. ^ Osmanov S, Pattou C, Walker N, Schwardländer B, Esparza J (February 2002). "Estimated global distribution and regional spread of HIV-1 genetic subtypes in the year 2000". Journal of Acquired Immune Deficiency Syndromes. 29 (2): 184–90. doi:10.1097/00042560-200202010-00013. PMID 11832690.
61. ^ Taylor BS, Hammer SM (October 2008). "The challenge of HIV-1 subtype diversity". The New England Journal of Medicine. 359 (18): 1965–6. doi:10.1056/NEJMc086373. PMID 18971501.
62. ^ Strimmer K, Pybus OG (December 2001). "Exploring the demographic history of DNA sequences using the generalized skyline plot". Molecular Biology and Evolution. 18 (12): 2298–305. doi:10.1093/oxfordjournals.molbev.a003776. PMID 11719579.
63. ^ Yusim K, Peeters M, Pybus OG, Bhattacharya T, Delaporte E, Mulanga C, Muldoon M, Theiler J, Korber B (June 2001). "Using human immunodeficiency virus type 1 sequences to infer historical features of the acquired immune deficiency syndrome epidemic and human immunodeficiency virus evolution". Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences. 356 (1410): 855–66. doi:10.1098/rstb.2001.0859. PMC 1088479. PMID 11405933.
64. ^ Grassly NC, Harvey PH, Holmes EC (February 1999). "Population dynamics of HIV-1 inferred from gene sequences". Genetics. 151 (2): 427–38. PMC 1460489. PMID 9927440.
65. ^ a b Walker PR, Pybus OG, Rambaut A, Holmes EC (April 2005). "Comparative population dynamics of HIV-1 subtypes B and C: subtype-specific differences in patterns of epidemic growth". Infection, Genetics and Evolution. 5 (3): 199–208. doi:10.1016/j.meegid.2004.06.011. PMID 15737910.
66. ^ Lemey P, Pybus OG, Rambaut A, Drummond AJ, Robertson DL, Roques P, Worobey M, Vandamme AM (July 2004). "The molecular population genetics of HIV-1 group O". Genetics. 167 (3): 1059–68. doi:10.1534/genetics.104.026666. PMC 1470933. PMID 15280223.
67. ^ Worobey M, Gemmel M, Teuwen DE, Haselkorn T, Kunstman K, Bunce M, Muyembe JJ, Kabongo JM, Kalengayi RM, Van Marck E, Gilbert MT, Wolinsky SM (October 2008). "Direct evidence of extensive diversity of HIV-1 in Kinshasa by 1960". Nature. 455 (7213): 661–4. doi:10.1038/nature07390. PMC 3682493. PMID 18833279.
68. ^ Junqueira DM, de Medeiros RM, Matte MC, Araújo LA, Chies JA, Ashton-Prolla P, Almeida SE (2011). Martin DP (ed.). "Reviewing the history of HIV-1: spread of subtype B in the Americas". PLOS ONE. 6 (11): e27489. doi:10.1371/journal.pone.0027489. PMC 3223166. PMID 22132104.
69. ^ Leventhal GE, Kouyos R, Stadler T, von Wyl V, Yerly S, Böni J, Cellerai C, Klimkait T, Günthard HF, Bonhoeffer S (2012). Tanaka MM (ed.). "Inferring epidemic contact structure from phylogenetic trees". PLoS Computational Biology. 8 (3): e1002413. doi:10.1371/journal.pcbi.1002413. PMC 3297558. PMID 22412361.
70. ^ Lewis F, Hughes GJ, Rambaut A, Pozniak A, Leigh Brown AJ (March 2008). "Episodic sexual transmission of HIV revealed by molecular phylodynamics". PLoS Medicine. 5 (3): e50. doi:10.1371/journal.pmed.0050050. PMC 2267814. PMID 18351795.
71. ^ Volz EM, Koopman JS, Ward MJ, Brown AL, Frost SD (2012). Fraser C (ed.). "Simple epidemiological dynamics explain phylogenetic clustering of HIV from patients with recent infection". PLoS Computational Biology. 8 (6): e1002552. doi:10.1371/journal.pcbi.1002552. PMC 3386305. PMID 22761556.
72. ^ Rambaut A, Posada D, Crandall KA, Holmes EC (January 2004). "The causes and consequences of HIV evolution". Nature Reviews. Genetics. 5 (1): 52–61. doi:10.1038/nrg1246. PMID 14708016.
73. ^ Keele BF (July 2010). "Identifying and characterizing recently transmitted viruses". Current Opinion in HIV and AIDS. 5 (4): 327–34. doi:10.1097/COH.0b013e32833a0b9b. PMC 2914479. PMID 20543609.
74. ^ Cohen MS, Shaw GM, McMichael AJ, Haynes BF (May 2011). "Acute HIV-1 Infection". The New England Journal of Medicine. 364 (20): 1943–54. doi:10.1056/NEJMra1011874. PMC 3771113. PMID 21591946.
75. ^ Fraser C, Hollingsworth TD, Chapman R, de Wolf F, Hanage WP (October 2007). "Variation in HIV-1 set-point viral load: epidemiological analysis and an evolutionary hypothesis". Proceedings of the National Academy of Sciences of the United States of America. 104 (44): 17441–6. doi:10.1073/pnas.0708559104. PMC 2077275. PMID 17954909.
76. ^ Korenromp EL, Williams BG, Schmid GP, Dye C (June 2009). Pai NP (ed.). "Clinical prognostic value of RNA viral load and CD4 cell counts during untreated HIV-1 infection--a quantitative review". PLOS ONE. 4 (6): e5950. doi:10.1371/journal.pone.0005950. PMC 2694276. PMID 19536329.
77. ^ Hollingsworth TD, Laeyendecker O, Shirreff G, Donnelly CA, Serwadda D, Wawer MJ, Kiwanuka N, Nalugoda F, Collinson-Streng A, Ssempijja V, Hanage WP, Quinn TC, Gray RH, Fraser C (May 2010). Holmes EC (ed.). "HIV-1 transmitting couples have similar viral load set-points in Rakai, Uganda". PLoS Pathogens. 6 (5): e1000876. doi:10.1371/journal.ppat.1000876. PMC 2865511. PMID 20463808.
78. ^ Baeten JM, Overbaugh J (January 2003). "Measuring the infectiousness of persons with HIV-1: opportunities for preventing sexual HIV-1 transmission". Current HIV Research. 1 (1): 69–86. doi:10.2174/1570162033352110. PMID 15043213.
79. ^ Fiore JR, Zhang YJ, Björndal A, Di Stefano M, Angarano G, Pastore G, Fenyö EM (July 1997). "Biological correlates of HIV-1 heterosexual transmission". AIDS. 11 (9): 1089–94. doi:10.1097/00002030-199709000-00002. PMID 9233454.
80. ^ Shirreff G, Pellis L, Laeyendecker O, Fraser C (October 2011). Müller V (ed.). "Transmission selects for HIV-1 strains of intermediate virulence: a modelling approach". PLoS Computational Biology. 7 (10): e1002185. doi:10.1371/journal.pcbi.1002185. PMC 3192807. PMID 22022243.
81. ^ Drake JW (May 1993). "Rates of spontaneous mutation among RNA viruses". Proceedings of the National Academy of Sciences of the United States of America. 90 (9): 4171–5. doi:10.1073/pnas.90.9.4171. PMC 46468. PMID 8387212.
82. ^ Sakaoka H, Kurita K, Iida Y, Takada S, Umene K, Kim YT, Ren CS, Nahmias AJ (March 1994). "Quantitative analysis of genomic polymorphism of herpes simplex virus type 1 strains from six countries: studies of molecular evolution and molecular epidemiology of the virus". The Journal of General Virology. 75 ( Pt 3) (3): 513–27. doi:10.1099/0022-1317-75-3-513. PMID 8126449.
83. ^ Drake JW (August 1991). "A constant rate of spontaneous mutation in DNA-based microbes". Proceedings of the National Academy of Sciences of the United States of America. 88 (16): 7160–4. doi:10.1073/pnas.88.16.7160. PMC 52253. PMID 1831267.