Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph
Open Access
- 25 May 2021
- journal article
- research article
- Published by Springer Science and Business Media LLC in Algorithms for Molecular Biology
- Vol. 16 (1), 1-13
- https://doi.org/10.1186/s13015-021-00182-9
Abstract
Genome-wide optical maps are high-resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single-molecule optical maps, which are called Rmaps. Unfortunately, there are very few choices for assembling Rmap data: there exists only one publicly available, non-proprietary assembly method and one proprietary software package that is distributed as an executable. Furthermore, the publicly available method, by Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006), follows the overlap-layout-consensus (OLC) paradigm and therefore does not scale to relatively large genomes. The algorithm behind the proprietary method, Bionano Genomics’ Solve, is largely unknown. In this paper, we extend the definition of bi-labels in the paired de Bruijn graph to the context of optical mapping data and present the first de Bruijn graph based method for Rmap assembly. We implement our approach, which we refer to as rmapper, and compare its performance against the assembler of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) and Solve by Bionano Genomics on data from three genomes: E. coli, human, and climbing perch fish (Anabas testudineus). Our method ran successfully on all three genomes, whereas the method of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) ran successfully only on E. coli. Moreover, on the human genome rmapper was at least 130 times faster than Bionano Solve, used one fifth as much memory, and produced the highest genome fraction with zero mis-assemblies. Our software, rmapper, is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/Rmapper.
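To make the de Bruijn graph idea concrete for optical data: an Rmap is an ordered list of fragment sizes between consecutive restriction sites, so a k-mer here is a run of k consecutive fragment sizes rather than k nucleotides. The sketch below is a minimal, hypothetical illustration and not rmapper's bi-labelled construction. It assumes fragment sizes in kb and a fixed-width binning parameter `bin_kb` to absorb sizing error; the function names `quantize`, `build_dbg`, and `walk` are our own.

```python
from collections import defaultdict

def quantize(sizes_kb, bin_kb=1.0):
    # Hypothetical binning: round each fragment size to the nearest multiple
    # of bin_kb so that sizes differing by small sizing error share a label.
    return tuple(int(round(s / bin_kb)) for s in sizes_kb)

def build_dbg(rmaps, k, bin_kb=1.0):
    # Nodes are (k-1)-mers of quantized fragment sizes; each observed k-mer
    # adds a directed edge from its prefix (k-1)-mer to its suffix (k-1)-mer.
    # The value stored per edge counts how often that k-mer was observed.
    edges = defaultdict(int)
    for rmap in rmaps:
        q = quantize(rmap, bin_kb)
        for i in range(len(q) - k + 1):
            kmer = q[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return dict(edges)

def walk(edges, start):
    # Greedily extend from `start` while the current node has exactly one
    # successor (a simplification: in-degree is not checked), spelling out
    # the quantized fragment sizes along the path.
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    path, node, seen = list(start), start, {start}
    while len(succ[node]) == 1:
        node = succ[node][0]
        if node in seen:  # stop if the walk revisits a node (cycle)
            break
        seen.add(node)
        path.append(node[-1])
    return tuple(path)

# Two toy Rmaps (fragment sizes in kb) that overlap after binning.
rmaps = [[10.2, 5.1, 7.9, 3.0], [5.0, 8.1, 3.2, 12.0]]
g = build_dbg(rmaps, k=3)
print(g)               # {((10, 5), (5, 8)): 1, ((5, 8), (8, 3)): 2, ((8, 3), (3, 12)): 1}
print(walk(g, (10, 5)))  # (10, 5, 8, 3, 12)
```

The shared k-mer (5, 8, 3) is what joins the two toy Rmaps into one path. Binning only addresses sizing error; real Rmap data also contains missing and spurious cut sites, which a plain k-mer graph like this one does not handle.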
Funding Information
- National Science Foundation (1618814)
- Academy of Finland (308030, 314170)
- Academy of Finland (323233)
This publication has 39 references indexed in Scilit:
- SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. Journal of Computational Biology, 2012
- IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics, 2012
- Paired de Bruijn Graphs: A Novel Approach for Incorporating Mate Pair Information into Genome Assemblers. Journal of Computational Biology, 2011
- High-resolution human genome structure by single-molecule analysis. Proceedings of the National Academy of Sciences of the United States of America, 2010
- Lineage-Specific Biology Revealed by a Finished Genome Assembly of the Mouse. PLoS Biology, 2009
- ABySS: A parallel assembler for short read sequence data. Genome Research, 2009
- Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Research, 2008
- Validation of rice genome sequence by optical mapping. BMC Genomics, 2007
- An algorithm for assembly of ordered restriction maps from single DNA molecules. Proceedings of the National Academy of Sciences of the United States of America, 2006
- A New Algorithm for DNA Sequence Assembly. Journal of Computational Biology, 1995