On the (Im)possibility to Reconstruct Plasmids from Whole Genome Short-Read Sequencing Data
Open Access
- 14 November 2016
- preprint content
- research article
- Published by Cold Spring Harbor Laboratory
- p. 086744
- https://doi.org/10.1101/086744
Abstract
Plasmids are autonomous extra-chromosomal elements in bacterial cells that can carry genes important for bacterial survival. To benchmark algorithms for automated plasmid sequence reconstruction from short-read sequencing data, we selected 42 publicly available complete bacterial genome sequences that were assembled from a combination of long- and short-read data. The selected bacterial genome sequence projects span 12 genera and contain 148 plasmids. We predicted plasmids from short-read data with four different programs (PlasmidSPAdes, Recycler, cBar and PlasmidFinder) and compared the outcome to the reference sequences. PlasmidSPAdes reconstructs plasmids based on coverage differences in the assembly graph. It reconstructed most of the reference plasmids (recall = 0.82), but approximately a quarter of the predicted plasmid contigs were false positives (precision = 0.76). PlasmidSPAdes merged 83 % of the predictions from genomes with multiple plasmids into a single bin. Recycler searches the assembly graph for sub-graphs corresponding to circular sequences; it correctly predicted small plasmids but failed with long plasmids (recall = 0.12, precision = 0.30). cBar, which applies pentamer frequency composition analysis to detect plasmid-derived contigs, showed an overall recall and precision of 0.78 and 0.64, respectively. However, cBar only categorizes contigs as plasmid-derived and does not bin the different plasmids correctly within a bacterial isolate. PlasmidFinder, which searches for matches in a replicon database, had the highest precision (1.0) but was restricted by the contents of its database and the contig length obtained from de novo assembly (recall = 0.36). Surprisingly, PlasmidSPAdes and Recycler detected single isolated components corresponding to putative novel small plasmids ([…] 50 kbp) containing repeated sequences remains challenging and limits the high-throughput analysis of WGS data.
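To make the idea of pentamer frequency composition concrete, the following is a minimal Python sketch of how a contig can be turned into a normalized 5-mer frequency profile. It is not cBar's implementation: the function name `pentamer_profile`, the choice to skip windows containing non-ACGT characters, and the normalization to relative frequencies are all assumptions made for illustration, and the classifier that cBar applies to such profiles is not reproduced here.

```python
from collections import Counter
from itertools import product
from typing import Dict

K = 5  # pentamers
ALPHABET = "ACGT"
# All 4**5 = 1024 possible pentamers, in a fixed order.
ALL_KMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]


def pentamer_profile(sequence: str) -> Dict[str, float]:
    """Return relative frequencies of all 1024 pentamers in a contig.

    Hypothetical helper for illustration only. Windows containing
    characters other than A, C, G, T (for example N) are ignored.
    """
    seq = sequence.upper()
    # Count every overlapping window of length K.
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    # Only windows made of A, C, G, T contribute to the total.
    total = sum(counts[kmer] for kmer in ALL_KMERS)
    if total == 0:
        return {kmer: 0.0 for kmer in ALL_KMERS}
    return {kmer: counts[kmer] / total for kmer in ALL_KMERS}


if __name__ == "__main__":
    profile = pentamer_profile("ACGTACGTAC")
    # Show only the pentamers that actually occur in the toy contig.
    for kmer, freq in profile.items():
        if freq > 0:
            print(kmer, round(freq, 3))
```

A profile like this yields one fixed-length vector per contig, which is the kind of input a composition-based classifier can use to separate plasmid-derived from chromosome-derived contigs. As the abstract notes, such a classification labels contigs but does not by itself group them into individual plasmids.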
This publication has 33 references indexed in Scilit:
- Recycler: an algorithm for detecting plasmids from de novo assembly graphs. Bioinformatics, 2016
- Multilevel population genetic analysis of vanA and vanB Enterococcus faecium causing nosocomial outbreaks in 27 countries (1986–2012). Journal of Antimicrobial Chemotherapy, 2016
- Plasmid and Host Strain Characteristics of Escherichia coli Resistant to Extended-Spectrum Cephalosporins in the Norwegian Broiler Production. PLOS ONE, 2016
- plasmidSPAdes: Assembling Plasmids from Whole Genome Sequencing Data. Cold Spring Harbor Laboratory, 2016
- Small-Plasmid-Mediated Antibiotic Resistance Is Enhanced by Increases in Plasmid Copy Number and Bacterial Fitness. Antimicrobial Agents and Chemotherapy, 2015
- Plasmid Detection, Characterization, and Ecology. Microbiology Spectrum, 2015
- Plasmid Diversity and Adaptation Analyzed by Massive Sequencing of Escherichia coli Plasmids. Microbiology Spectrum, 2014
- Single-molecule sequencing to track plasmid diversity of hospital-associated carbapenemase-producing Enterobacteriaceae. Science Translational Medicine, 2014
- In Silico Detection and Typing of Plasmids using PlasmidFinder and Plasmid Multilocus Sequence Typing. Antimicrobial Agents and Chemotherapy, 2014
- cBar: a computer program to distinguish plasmid-derived from chromosome-derived sequence fragments in metagenomics data. Bioinformatics, 2010