Bacterial phylodynamics

Bacterial phylodynamics is the study of the immunology, epidemiology, and phylogenetics of bacterial pathogens to better understand the evolutionary role of these pathogens.^[1]^[2]^[3] Phylodynamic analysis includes analyzing the genetic diversity, natural selection, and population dynamics of infectious disease pathogen phylogenies during pandemics and studying the intra-host evolution of viruses.^[4] Phylodynamics combines phylogenetic analysis with the study of ecological and evolutionary processes to better understand the mechanisms that drive the spatiotemporal incidence and phylogenetic patterns of bacterial pathogens.^[2]^[4] Bacterial phylodynamics uses genome-wide single-nucleotide polymorphisms (SNPs) to better understand the evolutionary mechanisms of bacterial pathogens.^[5] Many phylodynamic studies have been performed on viruses, specifically RNA viruses (see Viral phylodynamics), which have high mutation rates. The field of bacterial phylodynamics has grown substantially owing to advances in next-generation sequencing and the growing amount of available data.

Methods

Novel hypothesis (study design)

Studies can be designed to observe intra-host or inter-host interactions. Bacterial phylodynamic studies usually focus on inter-host interactions, with samples from many different hosts in one specific geographical location or in several different locations.^[4] The most important part of a study design is the sampling strategy.^[4] For example, the number of sampled time points, the sampling interval, and the number of sequences per time point are crucial to phylodynamic analysis.^[4] Sampling bias causes problems when analyzing taxonomically diverse samples.^[3] For example, sampling from a limited geographical area may affect estimates of effective population size.^[6]

Generating data

Experimental settings

The choice of which genome or genomic regions to sequence, and which sequencing technique to use, is an important experimental setting in phylodynamic analysis. Whole genome sequencing is often performed on bacterial genomes, although, depending on the design of the study, many different methods can be used. Bacterial genomes are much larger and evolve more slowly than those of RNA viruses, which has limited bacterial phylodynamic studies. Advances in sequencing technology have made bacterial phylodynamics possible, but proper preparation of whole bacterial genomes is essential.

Alignment

When a new data set of samples for phylodynamic analysis is obtained, its sequences are aligned.^[4] A BLAST search is frequently run to find similar strains of the pathogen of interest. Sequences collected from BLAST for an alignment need the proper information added to the data set, such as the sample collection date and the geographical location of the sample. Multiple sequence alignment algorithms (e.g., MUSCLE,^[7] MAFFT,^[8] and Clustal W^[9]) align all selected sequences in the data set. After running a multiple sequence alignment algorithm, manually editing the alignment is highly recommended.^[4] These algorithms can introduce a large number of indels into the alignment that do not actually exist.^[4] Manually editing the indels in the data set allows a more accurate phylogenetic tree to be inferred.^[4]
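As a rough aid to the manual editing step, the following Python sketch flags alignment columns that consist mostly of gaps, which are natural candidates for visual inspection. The function names, the `-` gap symbol, and the 0.5 threshold are illustrative assumptions, not part of any cited tool.

```python
def gap_fraction_per_column(alignment):
    """Return the fraction of gap characters ('-') in each alignment column."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n = len(alignment)
    return [sum(seq[i] == "-" for seq in alignment) / n for i in range(length)]


def flag_gappy_columns(alignment, threshold=0.5):
    """Return 0-based indices of columns whose gap fraction exceeds the threshold.

    The threshold is an arbitrary illustrative choice, not a recommended value.
    """
    fractions = gap_fraction_per_column(alignment)
    return [i for i, f in enumerate(fractions) if f > threshold]


# Toy alignment (hypothetical data): column 3 is a gap in three of four sequences.
toy_alignment = ["ATG-CA", "ATG-CA", "ATGTCA", "A-G-CA"]
print(flag_gappy_columns(toy_alignment))
```

Flagged columns are only candidates for review; whether an indel is real still has to be judged by looking at the alignment.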

Quality control

Accurate phylodynamic analysis requires quality control. This includes checking the samples in the data set for possible contamination, measuring the phylogenetic signal of the sequences, and checking the sequences for signs of recombination.^[4] Contamination of samples can be ruled out by various laboratory methods and by proper DNA/RNA extraction methods. There are several ways to check for phylogenetic signal in an alignment, such as likelihood mapping, transition/transversion versus divergence plots, and the Xia test for saturation.^[4] If the phylogenetic signal of an alignment is too low, a longer alignment or an alignment of another gene in the organism may be necessary to perform phylogenetic analysis.^[4] Typically, substitution saturation is an issue only in data sets with viral sequences. Most algorithms used for phylogenetic analysis do not take recombination into account, which can alter the molecular clock and coalescent estimates of a multiple sequence alignment.^[4] Strains that show signs of recombination should either be excluded from the data set or analyzed on their own.^[4]
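The data behind a transition/transversion versus divergence plot can be sketched in a few lines of Python. For each pair of aligned sequences, the code below counts transitions (purine to purine or pyrimidine to pyrimidine), transversions, and the uncorrected proportion of differing sites. The function names and the toy sequences are assumptions made for illustration, and plotting is left out.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv_divergence(seq_a, seq_b):
    """Return (transitions, transversions, p_distance) for two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # A<->G or C<->T
        else:
            tv += 1  # purine <-> pyrimidine
    p_distance = (ts + tv) / compared if compared else 0.0
    return ts, tv, p_distance


def pairwise_ts_tv(alignment):
    """Compute (ts, tv, p_distance) for every pair of sequences in the alignment."""
    return [ts_tv_divergence(a, b) for a, b in combinations(alignment, 2)]


# Hypothetical example: two transitions (A/G, C/T) and one transversion (T/A).
print(ts_tv_divergence("ACGTACGT", "GCGTATGA"))
```

In such a plot, transition counts that level off while transversions keep rising with divergence are usually read as a hint of saturation.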

Data analysis

Evolutionary model

Selecting the best-fitting nucleotide or amino acid substitution model for a multiple sequence alignment is the first step in phylodynamic analysis. This can be done with several different programs (e.g., IQ-TREE,^[10] MEGA^[11]).
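Model selection of this kind is commonly based on information criteria such as AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of free parameters, ln L the maximized log-likelihood, and n the number of alignment sites. The Python sketch below ranks candidate models by BIC. The log-likelihood values are invented for illustration, and the parameter counts cover only the substitution model (branch lengths, which are shared across models on a fixed tree, are left out by assumption).

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: lower is better."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n_sites):
    """Bayesian information criterion: lower is better."""
    return k * math.log(n_sites) - 2 * log_likelihood


def rank_models(fits, n_sites):
    """Rank models by BIC.

    fits: dict mapping model name -> (log_likelihood, free_parameters)
    Returns a list of (name, AIC, BIC) tuples, best model first.
    """
    rows = [(name, aic(lnl, k), bic(lnl, k, n_sites))
            for name, (lnl, k) in fits.items()]
    return sorted(rows, key=lambda row: row[2])


# Hypothetical log-likelihoods for three nucleotide models on a 1000-site alignment.
hypothetical_fits = {
    "JC69": (-5230.4, 0),   # equal rates, equal base frequencies
    "HKY85": (-5105.7, 4),  # ts/tv ratio + 3 free base frequencies
    "GTR": (-5101.9, 8),    # 5 free rates + 3 free base frequencies
}
for name, model_aic, model_bic in rank_models(hypothetical_fits, n_sites=1000):
    print(f"{name}: AIC={model_aic:.1f} BIC={model_bic:.1f}")
```

With these made-up numbers the extra parameters of the richest model do not buy enough likelihood to offset the penalty, which is the trade-off the criteria are meant to capture.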

Phylogeny inference

There are several different methods to infer phylogenies. These include tree-building algorithms such as UPGMA, neighbor joining, maximum parsimony, maximum likelihood, and Bayesian analysis.^[4]
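To illustrate the simplest of these methods, here is a minimal Python sketch of UPGMA: it repeatedly joins the two closest clusters and updates distances as size-weighted averages. It returns only the topology as a Newick string (no branch lengths), the distance matrix is a made-up example, and UPGMA itself assumes clock-like evolution, so this is a teaching sketch rather than a substitute for the tools cited above.

```python
def upgma(labels, dist):
    """Cluster taxa with UPGMA and return a Newick string (topology only).

    labels: list of taxon names
    dist:   symmetric matrix (list of lists) of pairwise distances
    """
    if not labels:
        raise ValueError("at least one taxon is required")
    # Active clusters: id -> (newick text, number of taxa in the cluster)
    clusters = {i: (name, 1) for i, name in enumerate(labels)}
    # Distances between active clusters, keyed by a frozenset of two ids
    d = {frozenset((i, j)): dist[i][j]
         for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        # Join the closest pair of clusters
        pair = min(d, key=d.get)
        a, b = sorted(pair)
        (text_a, size_a), (text_b, size_b) = clusters.pop(a), clusters.pop(b)
        del d[pair]
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances
        for c in clusters:
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            d[frozenset((next_id, c))] = (size_a * dac + size_b * dbc) / (size_a + size_b)
        clusters[next_id] = (f"({text_a},{text_b})", size_a + size_b)
        next_id += 1
    (newick, _), = clusters.values()
    return newick + ";"


# Hypothetical distances: A and B are closest, then C, then D.
taxa = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(taxa, matrix))
```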

Hypothesis testing

Assessing phylogenetic support

Testing the reliability of the tree after inferring its phylogeny is a crucial step in the phylodynamic pipeline.^[4] Methods to test the reliability of a tree include bootstrapping, maximum likelihood estimation, and posterior probabilities in Bayesian analysis.^[4]
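The resampling step of nonparametric bootstrapping can be sketched as follows in Python: alignment columns are drawn with replacement to build pseudo-replicate alignments, a tree is inferred from each (not shown here), and the support for a clade is the fraction of replicate trees that contain it. The function names and the representation of a tree as a set of clades are assumptions made for this example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudo-replicate by resampling alignment columns with replacement."""
    length = len(alignment[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in columns) for seq in alignment]


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Return a list of pseudo-replicate alignments (reproducible for a given seed)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


def clade_support(replicate_clades, clade):
    """Fraction of replicate trees that contain the given clade.

    replicate_clades: one set of frozensets (clades) per replicate tree,
    produced by whatever tree-building method is applied to each replicate.
    """
    clade = frozenset(clade)
    hits = sum(clade in clades for clades in replicate_clades)
    return hits / len(replicate_clades)


# Toy alignment (hypothetical data); every replicate keeps the same dimensions.
toy = ["ACGTAC", "ACGTAT", "ATGTAC"]
replicates = bootstrap_replicates(toy, n_replicates=3, seed=42)
print(len(replicates), len(replicates[0]), len(replicates[0][0]))
```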

Phylodynamics inference

Several methods are used to assess the phylodynamic reliability of a data set. These methods include estimating the data set's molecular clock, demographic history, population structure, and gene flow, as well as selection analysis.^[4] The phylodynamic results of a data set can also inform better study designs for future experiments.
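One simple exploratory check of molecular clock signal is a root-to-tip regression: each sample's genetic distance from the tree root is regressed on its sampling date, the slope approximates the substitution rate, and the x-intercept approximates the date of the root. The Python sketch below uses ordinary least squares. The function name and the numbers are hypothetical, and a real analysis would use a dedicated clock model rather than this shortcut.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (slope, intercept, root_date), where slope approximates the
    substitution rate per site per unit time and root_date is the
    x-intercept (None if the slope is zero).
    """
    n = len(dates)
    if n != len(distances) or n < 2:
        raise ValueError("need at least two (date, distance) pairs of equal length")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    if sxx == 0:
        raise ValueError("all sampling dates are identical; no temporal signal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    root_date = -intercept / slope if slope != 0 else None
    return slope, intercept, root_date


# Hypothetical sampling years and root-to-tip distances (substitutions per site).
years = [2010, 2011, 2012, 2013]
root_to_tip = [0.002, 0.004, 0.006, 0.008]
rate, _, root_year = root_to_tip_regression(years, root_to_tip)
print(f"rate ~ {rate:.4f} per site per year, root ~ {root_year:.1f}")
```

A weak or negative slope suggests that the data set carries too little temporal signal for clock-based estimates, which ties back to the sampling design.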

Examples

Phylodynamics of cholera

Cholera is a diarrheal disease caused by the bacterium Vibrio cholerae. V. cholerae has been a popular subject of phylodynamic analysis since the 2010 cholera outbreak in Haiti. The outbreak began shortly after the 2010 earthquake in Haiti, which caused critical infrastructure damage, leading to the initial conclusion that the outbreak was most likely due to V. cholerae being introduced naturally into Haitian waters by the earthquake. Soon after the earthquake, the UN sent MINUSTAH troops from Nepal to Haiti. Rumors began to circulate about terrible conditions at the MINUSTAH camp, along with claims that the MINUSTAH troops were disposing of their waste in the Artibonite River, the major water source in the surrounding area. Soon after the troops' arrival, the first cholera case was reported near the MINUSTAH camp.^[12]

Phylodynamic analysis was used to investigate the source of the Haiti cholera outbreak. Whole genome sequencing of V. cholerae revealed that the outbreak had a single point source and that the strain was similar to O1 strains circulating in South Asia.^[12]^[13] A cholera outbreak had occurred in Nepal just before the MINUSTAH troops were sent to Haiti. In the original research tracing the origin of the outbreak, the Nepalese strains were not available.^[12] When they became available, phylodynamic analyses of the Haitian and Nepalese strains affirmed that the Haitian cholera strain was most similar to the Nepalese cholera strain.^[14]

The outbreak strain in Haiti showed signs of being an altered or hybrid strain of V. cholerae associated with high virulence.^[5] Typically, high-quality single-nucleotide polymorphisms (hqSNPs) from whole genome V. cholerae sequences are used for phylodynamic analysis.^[5] Using phylodynamic analysis to study cholera helps in predicting and understanding V. cholerae evolution during bacterial epidemics.^[5]
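The idea of reducing whole genome sequences to SNP sites can be illustrated with a small Python sketch that keeps only the alignment columns where unambiguous bases differ between samples. It does not apply any of the quality filters that make a SNP "high quality", and the function names and toy sequences are assumptions made for illustration.

```python
def variable_sites(alignment):
    """Return 0-based indices of columns where unambiguous bases differ."""
    sites = []
    for i, column in enumerate(zip(*alignment)):
        # Ignore gaps and ambiguity codes when deciding whether a site varies
        bases = {base.upper() for base in column} & set("ACGT")
        if len(bases) > 1:
            sites.append(i)
    return sites


def snp_alignment(alignment):
    """Reduce an alignment to its variable (SNP) columns only."""
    sites = variable_sites(alignment)
    return ["".join(seq[i] for i in sites) for seq in alignment]


# Toy genomes (hypothetical data): columns 1 and 5 vary, column 3 only has a gap.
genomes = ["ACGTAC", "ACGTAT", "ATGTAC", "ACG-AC"]
print(variable_sites(genomes))
print(snp_alignment(genomes))
```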

References

[1] Volz, Erik M.; Koelle, Katia; Bedford, Trevor (2013-03-21). "Viral Phylodynamics". PLOS Computational Biology. 9 (3): e1002947. Bibcode:2013PLSCB...9E2947V. doi:10.1371/journal.pcbi.1002947. ISSN 1553-7358. PMC 3605911. PMID 23555203.
[2] Grenfell, Bryan T.; Pybus, Oliver G.; Gog, Julia R.; Wood, James L. N.; Daly, Janet M.; Mumford, Jenny A.; Holmes, Edward C. (2004-01-16). "Unifying the epidemiological and evolutionary dynamics of pathogens". Science. 303 (5656): 327–332. Bibcode:2004Sci...303..327G. doi:10.1126/science.1090727. ISSN 1095-9203. PMID 14726583. S2CID 4017704.
[3] Frost, Simon D.W.; Pybus, Oliver G.; Gog, Julia R.; Viboud, Cecile; Bonhoeffer, Sebastian; Bedford, Trevor (2015). "Eight challenges in phylodynamic inference". Epidemics. 10: 88–92. doi:10.1016/j.epidem.2014.09.001. PMC 4383806. PMID 25843391.
[4] Norström, Melissa M.; Karlsson, Annika C.; Salemi, Marco (2012-04-01). "Towards a new paradigm linking virus molecular evolution and pathogenesis: experimental design and phylodynamic inference". The New Microbiologica. 35 (2): 101–111. ISSN 1121-7138. PMID 22707126.
[5] Azarian, Taj; Ali, Afsar; Johnson, Judith A.; Mohr, David; Prosperi, Mattia; Veras, Nazle M.; Jubair, Mohammed; Strickland, Samantha L.; Rashid, Mohammad H. (2014-12-31). "Phylodynamic Analysis of Clinical and Environmental Vibrio cholerae Isolates from Haiti Reveals Diversification Driven by Positive Selection". mBio. 5 (6): e01824–14. doi:10.1128/mBio.01824-14. ISSN 2150-7511. PMC 4278535. PMID 25538191.
[6] Biek, Roman; Pybus, Oliver G.; Lloyd-Smith, James O.; Didelot, Xavier (2015). "Measurably evolving pathogens in the genomic era". Trends in Ecology & Evolution. 30 (6): 306–313. doi:10.1016/j.tree.2015.03.009. PMC 4457702. PMID 25887947.
[7] Edgar, Robert C. (2004-01-01). "MUSCLE: multiple sequence alignment with high accuracy and high throughput". Nucleic Acids Research. 32 (5): 1792–1797. doi:10.1093/nar/gkh340. ISSN 1362-4962. PMC 390337. PMID 15034147.
[8] Katoh, Kazutaka; Misawa, Kazuharu; Kuma, Kei-ichi; Miyata, Takashi (2002-07-15). "MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform". Nucleic Acids Research. 30 (14): 3059–3066. doi:10.1093/nar/gkf436. ISSN 0305-1048. PMC 135756. PMID 12136088.
[9] Larkin, M. A.; Blackshields, G.; Brown, N. P.; Chenna, R.; McGettigan, P. A.; McWilliam, H.; Valentin, F.; Wallace, I. M.; Wilm, A. (2007-11-01). "Clustal W and Clustal X version 2.0". Bioinformatics. 23 (21): 2947–2948. doi:10.1093/bioinformatics/btm404. ISSN 1367-4811. PMID 17846036.
[10] Nguyen, Lam-Tung; Schmidt, Heiko A.; von Haeseler, Arndt; Minh, Bui Quang (2015-01-01). "IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies". Molecular Biology and Evolution. 32 (1): 268–274. doi:10.1093/molbev/msu300. ISSN 0737-4038. PMC 4271533. PMID 25371430.
[11] Kumar, Sudhir; Stecher, Glen; Tamura, Koichiro (2016-07-01). "MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets". Molecular Biology and Evolution. 33 (7): 1870–1874. doi:10.1093/molbev/msw054. ISSN 1537-1719. PMC 8210823. PMID 27004904.
[12] Piarroux, Renaud (2011). "Understanding the Cholera Epidemic, Haiti". Emerging Infectious Diseases. 17 (7): 1161–1168. doi:10.3201/eid1707.110059. PMC 3381400. PMID 21762567.
[13] Orata, Fabini D.; Keim, Paul S.; Boucher, Yan (2014-04-03). "The 2010 Cholera Outbreak in Haiti: How Science Solved a Controversy". PLOS Pathogens. 10 (4): e1003967. doi:10.1371/journal.ppat.1003967. ISSN 1553-7374. PMC 3974815. PMID 24699938.
[14] Katz, Lee S.; Petkau, Aaron; Beaulaurier, John; Tyler, Shaun; Antonova, Elena S.; Turnsek, Maryann A.; Guo, Yan; Wang, Susana; Paxinos, Ellen E. (2013-08-30). "Evolutionary Dynamics of Vibrio cholerae O1 following a Single-Source Introduction to Haiti". mBio. 4 (4): e00398–13. doi:10.1128/mBio.00398-13. ISSN 2150-7511. PMC 3705451. PMID 23820394.
