Using affinity propagation for identifying subspecies among clonal organisms: lessons from M. tuberculosis
Open Access
- 2 June 2011
- journal article
- Published by Springer Science and Business Media LLC in BMC Bioinformatics
- Vol. 12 (1), 224
- https://doi.org/10.1186/1471-2105-12-224
Abstract
Classification and naming are key steps in the analysis, understanding and adequate management of living organisms. However, where to set the limits between groups can be puzzling, especially in clonal organisms. Within the Mycobacterium tuberculosis complex (MTC), the etiological agent of tuberculosis (TB), experts first identified several groups according to their patterns at repetitive sequences, especially at the CRISPR locus (spoligotyping), and to their epidemiological relevance. Most groups, such as "Beijing", found good support when tested with other loci. However, other groups, such as the T family and the T1 subfamily (belonging to the "Euro-American" lineage), correspond to non-monophyletic groups and still need to be refined. Here, we propose to use a method called Affinity Propagation, which has been successfully used in image categorization, to identify relevant patterns at the CRISPR locus in MTC. To adequately infer the relative divergence time between strains, we used a distance method inspired by the recent evolutionary model by Reyes et al. We first confirm that this method performs better than the Jaccard index commonly used to compare spoligotype patterns. Second, we document the support of each spoligotype family in the previous classification using affinity propagation on the international spoligotyping database SpolDB4. This allowed us to propose a consensus assignment for all SpolDB4 spoligotypes. Third, we propose new signatures to subclassify the T family. Altogether, this study shows how the new clustering algorithm Affinity Propagation can help build or refine classifications of clonal organisms. It also describes well-supported families and subfamilies within the M. tuberculosis complex, especially inside the modern "Euro-American" lineage.
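To make the Jaccard baseline mentioned in the abstract concrete, here is a minimal Python sketch of the index applied to spoligotype patterns. It assumes the standard 43-spacer binary encoding ("1" for a spacer that is present, "0" for one that is absent). The function names and the three toy patterns are hypothetical and chosen for illustration only. This is not the distance method used in the paper, which is only described as inspired by the model of Reyes et al.

```python
from typing import FrozenSet

N_SPACERS = 43  # assumption: standard 43-spacer spoligotyping format


def parse_spoligotype(pattern: str) -> FrozenSet[int]:
    """Return the 1-based positions of the spacers present in a binary pattern."""
    if len(pattern) != N_SPACERS or set(pattern) - {"0", "1"}:
        raise ValueError("expected a 43-character string of 0s and 1s")
    return frozenset(i + 1 for i, c in enumerate(pattern) if c == "1")


def jaccard_index(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Jaccard index: shared spacers divided by spacers present in either pattern."""
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty patterns are identical
    return len(a & b) / len(union)


# Toy patterns (hypothetical, for illustration only).
full = parse_spoligotype("1" * 43)                       # all spacers present
beijing_like = parse_spoligotype("0" * 34 + "1" * 9)     # spacers 1-34 absent
variant = parse_spoligotype("0" * 34 + "1" * 8 + "0")    # one more spacer lost

print(round(jaccard_index(full, beijing_like), 3))     # 9 shared / 43 in union
print(round(jaccard_index(beijing_like, variant), 3))  # 8 shared / 9 in union
print(round(jaccard_index(full, variant), 3))          # 8 shared / 43 in union
```

Because the index counts every spacer independently, the loss of 34 adjacent spacers weighs far more than the loss of a single spacer. A similarity matrix built this way (or with another distance) is the kind of input a clustering method such as affinity propagation works from.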
This publication has 40 references indexed in Scilit:
- Novel Virulence Gene and Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) Multilocus Sequence Typing Scheme for Subtyping of the Major Serovars of Salmonella enterica subsp. enterica. Applied and Environmental Microbiology, 2011
- Resolving lineage assignation on Mycobacterium tuberculosis clinical isolates classified by spoligotyping with a new high-throughput 3R SNPs based method. Infection, Genetics and Evolution, 2010
- A conformal Bayesian network for classification of Mycobacterium tuberculosis complex lineages. BMC Bioinformatics, 2010
- Tuberculosis in Dr Granville's mummy: a molecular re-examination of the earliest known Egyptian mummy to be scientifically examined and given a medical diagnosis. Proceedings. Biological sciences, 2009
- First Insight into Genetic Diversity of the Mycobacterium tuberculosis Complex in Albania Obtained by Multilocus Variable-Number Tandem-Repeat Analysis and Spoligotyping Reveals the Presence of Beijing Multidrug-Resistant Isolates. Journal of Clinical Microbiology, 2009
- High Functional Diversity in Mycobacterium tuberculosis Driven by Genetic Drift and Human Demography. PLoS Biology, 2008
- Models of deletion for visualizing bacterial variation: an application to tuberculosis spoligotypes. BMC Bioinformatics, 2008
- Evolution and Diversity of Clonal Bacteria: The Paradigm of Mycobacterium tuberculosis. PLOS ONE, 2008
- Assessment of Mycobacterial Interspersed Repetitive Unit-QUB Markers To Further Discriminate the Beijing Genotype in a Population-Based Study of the Genetic Diversity of Mycobacterium tuberculosis Clinical Isolates from Okinawa, Ryukyu Islands, Japan. Journal of Clinical Microbiology, 2007
- Microsatellites: simple sequences with complex evolution. Nature Reviews Genetics, 2004