A parallel graph decomposition algorithm for DNA sequencing with nanopores

Open Access

11 November 2004

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 21 (7), 889-896
https://doi.org/10.1093/bioinformatics/bti129

Abstract

Motivation: With the potential availability of nanopore devices that can sense the bases of translocating single-stranded DNA (ssDNA), it is likely that ‘reads’ of length ∼10⁵ will be available in large numbers and at high speed. We address the problem of complete DNA sequencing using such reads. We assume that ∼10² copies of a DNA sequence are split into single strands that break into randomly sized pieces as they translocate the nanopore in arbitrary orientations. The nanopore senses and reports each individual base that passes through, but all information about orientation and complementarity of the ssDNA subsequences is lost. Random errors (both biological and transduction) in the reads create further complications.

Results: We have developed an algorithm that addresses these issues. It can be considered an extreme variation of the well-known Eulerian path approach. It searches over a space of de Bruijn graphs until it finds one in which (a) the impact of errors is eliminated and (b) both possible orientations of the two ssDNA sequences can be identified separately and unambiguously. Our algorithm is able to correctly reconstruct real DNA sequences of the order of 10⁶ bases (e.g. the bacterium Mycoplasma pneumoniae) from simulated erroneous reads on a modest workstation in about 1 h. We describe, and give measured timings of, a parallel implementation of this algorithm on the Cray Multithreaded Architecture (MTA-2) supercomputer, whose architecture is ideally suited to this ‘unstructured’ problem. Our parallel implementation is crucial to the problem of rapidly sequencing long DNA sequences and also to the situation where multiple nanopores are used to obtain a high-bandwidth stream of reads.

Contact: shb@acm.org
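For readers unfamiliar with the Eulerian path approach that the abstract builds on, the following is a minimal sketch of the textbook technique, not the authors' algorithm. It counts k-mers across reads, discards k-mers seen fewer than a threshold number of times (a common way to suppress random read errors), builds a de Bruijn graph, and spells the sequence along an Eulerian path. The function names, the value of k, the threshold and the toy reads are all assumptions chosen for illustration. The sketch assumes every read is in the same orientation, so it does not handle the loss of orientation and complementarity, the search over a space of graphs, or the parallelism described in the abstract.

```python
from collections import Counter, defaultdict


def kmer_counts(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def de_bruijn_graph(counts, min_count):
    """Build a de Bruijn graph from k-mers seen at least min_count times.

    Each retained k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix.
    Dropping rare k-mers is an illustrative error filter (an assumption here).
    """
    graph = defaultdict(list)
    for kmer, n in counts.items():
        if n >= min_count:
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes the graph has an Eulerian path."""
    in_deg = Counter(v for targets in graph.values() for v in targets)
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(
        (u for u, targets in graph.items() if len(targets) - in_deg[u] == 1),
        next(iter(graph)),
    )
    remaining = {u: list(targets) for u, targets in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path


def spell(path):
    """Reconstruct the sequence spelled by a path of overlapping (k-1)-mers."""
    return path[0] + "".join(node[-1] for node in path[1:])


def assemble(reads, k, min_count):
    """Assemble reads that are all assumed to share one orientation."""
    return spell(eulerian_path(de_bruijn_graph(kmer_counts(reads, k), min_count)))


# Toy data: three correct reads of ATGGCGTGCA and one read with a substitution.
reads = ["ATGGCGT", "GGCGTGCA", "ATGGCGTGCA", "ATGGAGT"]
print(assemble(reads, k=4, min_count=2))  # prints ATGGCGTGCA
```

In this toy case the erroneous read contributes three k-mers that each occur only once, so the threshold removes them and the remaining graph is a single path. Real data with repeats, unknown strand orientation and much higher error counts would need the more elaborate treatment the abstract describes.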

