Optimal structure for automatic processing of DNA sequences

1 January 1999

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Biomedical Engineering

Vol. 46 (9), 1044-1056
https://doi.org/10.1109/10.784135

Abstract

The faithful recovery of the base sequence in automatic deoxyribonucleic acid (DNA) sequencing fundamentally depends on the underlying statistics of the DNA electrophoresis time series. Current DNA sequencing algorithms are heuristic in nature and make only modest use of statistical information. In this paper, a formal statistical model of the DNA time series is presented and then used to construct the optimal maximum-likelihood (ML) processor. The resulting DNA-ML algorithm features Kalman prediction of peak locations, peak parameter estimation, whitened waveform comparison, and multiple hypothesis processing using the M-algorithm. Properties of the algorithm are examined using both simulated and real data. Model parameters of critical importance and their impact on different types of error mechanisms, such as insertions and deletions, are pointed out. The statistical model of the DNA time series and the structure of the DNA-ML algorithm provide a basis for future investigation and refinement of DNA sequencing techniques.
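The multiple hypothesis step named in the abstract can be illustrated with a minimal sketch of a generic M-algorithm: a breadth-first search that extends every surviving hypothesis by each possible symbol and keeps only the M lowest-cost candidates. This is not the paper's DNA-ML processor. The function names, the four-channel toy observations, and the squared-error cost are all assumptions made for illustration; the paper's actual metric (based on Kalman-predicted peak locations and whitened waveform comparison) is not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple

Path = Tuple[str, ...]
BASES = "ACGT"


def m_algorithm(
    observations: Sequence[Sequence[float]],
    symbols: Sequence[str],
    branch_cost: Callable[[Path, str, Sequence[float]], float],
    m: int,
) -> Tuple[float, Path]:
    """Generic M-algorithm: keep the m lowest-cost hypotheses at each step."""
    survivors: List[Tuple[float, Path]] = [(0.0, ())]
    for obs in observations:
        # Extend every surviving hypothesis by every candidate symbol.
        candidates = [
            (cost + branch_cost(path, sym, obs), path + (sym,))
            for cost, path in survivors
            for sym in symbols
        ]
        # Prune: retain only the m best cumulative costs.
        candidates.sort(key=lambda c: c[0])
        survivors = candidates[:m]
    return survivors[0]


def toy_cost(path: Path, sym: str, obs: Sequence[float]) -> float:
    """Hypothetical cost: squared error between the observed channel
    intensities and an ideal one-hot response for the hypothesized base.
    The path argument is unused here; a real metric could depend on it."""
    target = [1.0 if b == sym else 0.0 for b in BASES]
    return sum((o - t) ** 2 for o, t in zip(obs, target))


# Made-up intensities for the A, C, G, T channels at three peak positions.
toy_obs = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.2, 0.8, 0.0),
    (0.0, 0.1, 0.1, 0.7),
]

best_cost, best_path = m_algorithm(toy_obs, BASES, toy_cost, m=2)
print("".join(best_path))  # AGT
```

With a memoryless cost like this one, pruning never changes the answer. Keeping several hypotheses matters when the cost of a symbol depends on earlier decisions, as it would when insertions and deletions shift the expected peak positions.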
