Genomic infectious disease epidemiology in partially sampled and ongoing outbreaks
Open Access
- 18 January 2017
- journal article
- research article
- Published by Oxford University Press (OUP) in Molecular Biology and Evolution
- Vol. 34 (4), msw075-1007
- https://doi.org/10.1093/molbev/msw275
Abstract
Genomic data are increasingly being used to understand infectious disease epidemiology. Isolates from a given outbreak are sequenced, and the patterns of shared variation are used to infer which isolates within the outbreak are most closely related to each other. Unfortunately, the phylogenetic trees typically used to represent this variation are not directly informative about who infected whom: a phylogenetic tree is not a transmission tree. However, a transmission tree can be inferred from a phylogeny while accounting for within-host genetic diversity by coloring the branches of a phylogeny according to which host those branches were in. Here we extend this approach and show that it can be applied to partially sampled and ongoing outbreaks. This requires computing the correct probability of an observed transmission tree and we herein demonstrate how to do this for a large class of epidemiological models. We also demonstrate how the branch coloring approach can incorporate a variable number of unique colors to represent unsampled intermediates in transmission chains. The resulting algorithm is a reversible jump Monte–Carlo Markov Chain, which we apply to both simulated data and real data from an outbreak of tuberculosis. By accounting for unsampled cases and an outbreak which may not have reached its end, our method is uniquely suited to use in a public health environment during real-time outbreak investigations. We implemented this transmission tree inference methodology in an R package called TransPhylo, which is freely available from https://github.com/xavierdidelot/TransPhylo.
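The branch coloring idea in the abstract can be illustrated with a minimal sketch. This is not TransPhylo code and not the authors' inference algorithm: it only shows how a phylogeny whose nodes have already been assigned to hosts ("colors") implies a transmission tree, with one color belonging to an unsampled intermediate. The tree, the host labels, and the function names are all hypothetical, and the sketch assumes color changes happen only between a node and its parent.

```python
from typing import Dict, Optional, Set

# Hypothetical toy phylogeny, stored as child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "n1": "root",
    "A": "n1",
    "n2": "n1",
    "B": "n2",
    "n3": "root",
    "C": "n3",
}

# Hypothetical coloring: the host each node sits in.
# Host "U" has no leaf (no sequenced isolate), so it is an unsampled case.
HOST: Dict[str, str] = {
    "root": "A", "n1": "A", "A": "A", "n3": "A",
    "n2": "U",
    "B": "B",
    "C": "C",
}


def transmission_tree(parent: Dict[str, Optional[str]],
                      host: Dict[str, str]) -> Dict[str, str]:
    """Return infectee -> infector, read off from color changes on branches."""
    infector: Dict[str, str] = {}
    for node, par in parent.items():
        if par is None:
            continue  # the root's host is the index case in this toy example
        if host[node] != host[par]:
            infectee = host[node]
            # A valid coloring lets each host be entered only once.
            if infectee in infector:
                raise ValueError(f"host {infectee!r} is infected more than once")
            infector[infectee] = host[par]
    return infector


def unsampled_hosts(parent: Dict[str, Optional[str]],
                    host: Dict[str, str]) -> Set[str]:
    """Hosts that appear in the coloring but own no leaf of the phylogeny."""
    internal = {p for p in parent.values() if p is not None}
    leaves = [n for n in parent if n not in internal]
    return set(host.values()) - {host[leaf] for leaf in leaves}


if __name__ == "__main__":
    print(transmission_tree(PARENT, HOST))
    print(unsampled_hosts(PARENT, HOST))
```

In this toy coloring, host A infects both the unsampled host U and host C, and U in turn infects B. In the method described in the abstract, the coloring is unknown and is sampled along with the number of unsampled colors; the sketch covers only the final step of reading a transmission tree from one given coloring.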