Inadequacies of Minimum Spanning Trees in Molecular Epidemiology
- 1 October 2011
- journal article
- Published by American Society for Microbiology in Journal of Clinical Microbiology
- Vol. 49 (10), 3568-3575
- https://doi.org/10.1128/jcm.00919-11
Abstract
Minimum spanning trees (MSTs) are frequently used in molecular epidemiology research to estimate relationships among individual strains or isolates. Nevertheless, there are significant caveats to MST algorithms that have been largely ignored in molecular epidemiology studies and that have the potential to confound or alter the interpretation of the results of those analyses. Specifically, (i) presenting a single, arbitrarily selected MST illustrates only one of potentially many equally optimal solutions, and (ii) statistical metrics are not used to assess the credibility of MST estimations. Here, we survey published MSTs previously used to infer microbial population structure in order to determine the effect of these factors. We propose a technique to estimate the number of alternative MSTs for a data set and find that multiple MSTs exist for each case in our survey. By implementing a bootstrapping metric to evaluate the reliability of alternative MST solutions, we discover that they encompass a wide range of credibility values. On the basis of these observations, we conclude that current approaches to studying population structure using MSTs are inadequate. We instead propose a systematic approach to MST estimation that bases analyses on the optimal computation of an input distance matrix, provides information about the number and configurations of alternative MSTs, and allows identification of the most credible MST or MSTs by using a bootstrapping metric. It is our hope this algorithm will become the new “gold standard” approach for analyzing MSTs for molecular epidemiology so that this generally useful computational approach can be used informatively and to its full potential.
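The two caveats named in the abstract can be illustrated on a toy data set. The sketch below is not the authors' algorithm: it assumes isolates are described by hypothetical allelic profiles, uses a plain Hamming distance, enumerates every equally optimal MST by brute force (feasible only for a handful of isolates), and scores edges by resampling loci with replacement. The function names, the toy profiles, and the choice of mean edge support as a tree-level score are all assumptions made for illustration.

```python
import itertools
import random


def distance_matrix(profiles, loci=None):
    """Pairwise Hamming distances over the given loci (all loci by default)."""
    loci = list(range(len(profiles[0]))) if loci is None else list(loci)
    n = len(profiles)
    return [[sum(profiles[i][k] != profiles[j][k] for k in loci)
             for j in range(n)] for i in range(n)]


def is_spanning_tree(n, edges):
    """True if the n - 1 edges connect n nodes without forming a cycle."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:  # edge would close a cycle
            return False
        parent[ru] = rv
    return len(edges) == n - 1


def all_msts(dist):
    """Brute-force enumeration of every minimum-weight spanning tree."""
    n = len(dist)
    all_edges = list(itertools.combinations(range(n), 2))
    best, trees = None, []
    for cand in itertools.combinations(all_edges, n - 1):
        if not is_spanning_tree(n, cand):
            continue
        weight = sum(dist[u][v] for u, v in cand)
        if best is None or weight < best:
            best, trees = weight, [cand]
        elif weight == best:
            trees.append(cand)
    return best, trees


def edge_support(profiles, replicates=200, seed=0):
    """Fraction of bootstrap replicates in which an edge occurs in some MST.

    Loci are resampled with replacement (an assumed bootstrap scheme).
    """
    rng = random.Random(seed)
    n_loci = len(profiles[0])
    counts = {}
    for _ in range(replicates):
        sample = [rng.randrange(n_loci) for _ in range(n_loci)]
        _, trees = all_msts(distance_matrix(profiles, sample))
        for edge in set(itertools.chain.from_iterable(trees)):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / replicates for edge, c in counts.items()}


def tree_score(tree, support):
    """Mean bootstrap support of a tree's edges (an assumed summary score)."""
    return sum(support.get(edge, 0.0) for edge in tree) / len(tree)


# Hypothetical allelic profiles for four isolates typed at four loci.
profiles = [(1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 2, 1), (1, 1, 2, 2)]

weight, trees = all_msts(distance_matrix(profiles))
print(weight, len(trees))  # 3 4: four equally optimal MSTs of total weight 3

support = edge_support(profiles)
for tree in trees:
    print(tree, round(tree_score(tree, support), 3))
```

Even this four-isolate example yields four equally optimal trees, so reporting any single one hides the alternatives. For realistic data sets the brute-force enumeration would have to be replaced by a dedicated counting or enumeration method.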