Inadequacies of Minimum Spanning Trees in Molecular Epidemiology

1 October 2011

journal article
Published by American Society for Microbiology in Journal of Clinical Microbiology

Vol. 49 (10), 3568-3575
https://doi.org/10.1128/jcm.00919-11

Abstract

Minimum spanning trees (MSTs) are frequently used in molecular epidemiology research to estimate relationships among individual strains or isolates. Nevertheless, there are significant caveats to MST algorithms that have been largely ignored in molecular epidemiology studies and that have the potential to confound or alter the interpretation of the results of those analyses. Specifically, (i) presenting a single, arbitrarily selected MST illustrates only one of potentially many equally optimal solutions, and (ii) statistical metrics are not used to assess the credibility of MST estimations. Here, we survey published MSTs previously used to infer microbial population structure in order to determine the effect of these factors. We propose a technique to estimate the number of alternative MSTs for a data set and find that multiple MSTs exist for each case in our survey. By implementing a bootstrapping metric to evaluate the reliability of alternative MST solutions, we discover that they encompass a wide range of credibility values. On the basis of these observations, we conclude that current approaches to studying population structure using MSTs are inadequate. We instead propose a systematic approach to MST estimation that bases analyses on the optimal computation of an input distance matrix, provides information about the number and configurations of alternative MSTs, and allows identification of the most credible MST or MSTs by using a bootstrapping metric. It is our hope this algorithm will become the new “gold standard” approach for analyzing MSTs for molecular epidemiology so that this generally useful computational approach can be used informatively and to its full potential.
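The two caveats named in the abstract can be illustrated on a toy data set. The sketch below is not the authors' algorithm: it is a brute-force illustration that assumes a Hamming distance between allelic profiles, enumerates every spanning tree to find all equally optimal MSTs, and resamples loci with replacement to score edges. The function names, the four hypothetical isolates, and the definition of edge support (the fraction of bootstrap replicates in which an edge appears in at least one MST) are all assumptions made for this example. Exhaustive enumeration is feasible only for a handful of isolates.

```python
from itertools import combinations
import random


def hamming_matrix(profiles, loci=None):
    """Pairwise number of differing loci (assumed distance measure)."""
    loci = list(range(len(profiles[0]))) if loci is None else loci
    n = len(profiles)
    return [[sum(profiles[i][k] != profiles[j][k] for k in loci)
             for j in range(n)] for i in range(n)]


def is_spanning_tree(n, edges):
    """True if the n-1 given edges connect n nodes without a cycle."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:          # edge would close a cycle
            return False
        parent[ra] = rb
    return True               # n-1 acyclic edges on n nodes span the graph


def all_msts(dist):
    """Brute force: return (minimum weight, list of all trees with that weight)."""
    n = len(dist)
    pairs = list(combinations(range(n), 2))
    best, trees = None, []
    for edges in combinations(pairs, n - 1):
        if not is_spanning_tree(n, edges):
            continue
        w = sum(dist[a][b] for a, b in edges)
        if best is None or w < best:
            best, trees = w, [edges]
        elif w == best:
            trees.append(edges)
    return best, trees


def bootstrap_edge_support(profiles, replicates=200, seed=0):
    """Fraction of locus-resampled replicates in which each edge is in some MST."""
    rng = random.Random(seed)
    n_loci = len(profiles[0])
    counts = {}
    for _ in range(replicates):
        loci = [rng.randrange(n_loci) for _ in range(n_loci)]  # with replacement
        _, trees = all_msts(hamming_matrix(profiles, loci))
        for edge in {e for t in trees for e in t}:
            counts[edge] = counts.get(edge, 0) + 1
    return {e: c / replicates for e, c in counts.items()}


# Four hypothetical isolates typed at four loci.
profiles = [
    (1, 1, 1, 1),  # isolate 0
    (1, 1, 1, 2),  # isolate 1
    (1, 1, 2, 1),  # isolate 2
    (1, 1, 2, 2),  # isolate 3
]
weight, trees = all_msts(hamming_matrix(profiles))
support = bootstrap_edge_support(profiles)
print(weight, len(trees))  # 3 4
for edge, value in sorted(support.items()):
    print(edge, round(value, 2))
```

In this toy case the four single-locus differences form a cycle, so dropping any one of them gives a tree of total weight 3. That yields four equally optimal MSTs, any of which a standard algorithm might report as "the" tree. The bootstrap values give one possible way to compare the edges that distinguish those alternatives.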
