On the (Im)possibility to Reconstruct Plasmids from Whole Genome Short-Read Sequencing Data

Open Access

14 November 2016

preprint content
research article
Published by Cold Spring Harbor Laboratory

p. 086744
https://doi.org/10.1101/086744

Abstract

Plasmids are autonomous extra-chromosomal elements in bacterial cells that can carry genes important for bacterial survival. To benchmark algorithms for automated plasmid sequence reconstruction from short-read sequencing data, we selected 42 publicly available complete bacterial genome sequences that were assembled from a combination of long- and short-read data. The selected genome sequencing projects span 12 genera and contain 148 plasmids. We predicted plasmids from short-read data with four different programs (PlasmidSPAdes, Recycler, cBar and PlasmidFinder) and compared the outcome to the reference sequences. PlasmidSPAdes reconstructs plasmids based on coverage differences in the assembly graph. It reconstructed most of the reference plasmids (recall = 0.82), but approximately a quarter of the predicted plasmid contigs were false positives (precision = 0.76). PlasmidSPAdes merged 83% of the predictions from genomes with multiple plasmids into a single bin. Recycler searches the assembly graph for sub-graphs corresponding to circular sequences; it correctly predicted small plasmids but failed with long plasmids (recall = 0.12, precision = 0.30). cBar, which applies pentamer frequency composition analysis to detect plasmid-derived contigs, showed an overall recall and precision of 0.78 and 0.64, respectively. However, cBar only categorizes contigs as plasmid-derived and does not bin the different plasmids within a bacterial isolate. PlasmidFinder, which searches for matches in a replicon database, had the highest precision (1.0) but was restricted by the contents of its database and by the contig length obtained from de novo assembly (recall = 0.36). Surprisingly, PlasmidSPAdes and Recycler detected single isolated components corresponding to putative novel small plasmids ([…] 50 kbp) containing repeated sequences remains challenging and limits the high-throughput analysis of WGS data.
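The recall and precision values quoted in the abstract can be read with the help of a small sketch. The abstract does not state the exact unit over which the authors counted true and false positives, so the version below assumes a simple contig-level comparison; the function name `precision_recall` and the contig IDs are hypothetical and serve only as illustration.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for predicted plasmid contigs.

    Assumption: each contig is labelled as a whole, and `reference`
    holds the contigs that truly derive from plasmids.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Precision: fraction of predictions that are real plasmid contigs.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of real plasmid contigs that were predicted.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical contig IDs, invented for illustration only.
reference_plasmid_contigs = {"c1", "c2", "c3", "c4", "c5"}
predicted_plasmid_contigs = {"c1", "c2", "c3", "c4", "c9"}

precision, recall = precision_recall(
    predicted_plasmid_contigs, reference_plasmid_contigs
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this reading, a precision of 0.76 means that 1 - 0.76 = 0.24 of the predicted plasmid contigs are false positives, which matches the abstract's "approximately a quarter".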

This publication has 33 references indexed in Scilit:

Recycler: an algorithm for detecting plasmids from de novo assembly graphs
Bioinformatics, 2016
Multilevel population genetic analysis of vanA and vanB Enterococcus faecium causing nosocomial outbreaks in 27 countries (1986–2012)
Journal of Antimicrobial Chemotherapy, 2016
Plasmid and Host Strain Characteristics of Escherichia coli Resistant to Extended-Spectrum Cephalosporins in the Norwegian Broiler Production
PLOS ONE, 2016
plasmidSPAdes: Assembling Plasmids from Whole Genome Sequencing Data
Cold Spring Harbor Laboratory, 2016
Small-Plasmid-Mediated Antibiotic Resistance Is Enhanced by Increases in Plasmid Copy Number and Bacterial Fitness
Antimicrobial Agents and Chemotherapy, 2015
Plasmid Detection, Characterization, and Ecology
Microbiology Spectrum, 2015
Plasmid Diversity and Adaptation Analyzed by Massive Sequencing of Escherichia coli Plasmids
Microbiology Spectrum, 2014
Single-molecule sequencing to track plasmid diversity of hospital-associated carbapenemase-producing Enterobacteriaceae
Science Translational Medicine, 2014
In Silico Detection and Typing of Plasmids using PlasmidFinder and Plasmid Multilocus Sequence Typing
Antimicrobial Agents and Chemotherapy, 2014
cBar: a computer program to distinguish plasmid-derived from chromosome-derived sequence fragments in metagenomics data
Bioinformatics, 2010