Optimal structure for automatic processing of DNA sequences
- 1 January 1999
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Biomedical Engineering
- Vol. 46 (9), 1044-1056
- https://doi.org/10.1109/10.784135
Abstract
The faithful recovery of the base sequence in automatic deoxyribonucleic acid (DNA) sequencing fundamentally depends on the underlying statistics of the DNA electrophoresis time series. Current DNA sequencing algorithms are heuristic in nature and modest in their use of statistical information. In this paper, a formal statistical model of the DNA time series is presented and then used to construct the optimal maximum-likelihood (ML) processor. The DNA-ML algorithm that is derived in this paper features Kalman prediction of peak locations, peak parameter estimation, whitened waveform comparison, and multiple hypothesis processing using the M-algorithm. Properties of the algorithm are examined using both simulated and real data. Model parameters of critical importance and their impact on different types of error mechanisms, such as insertions and deletions, are pointed out. The statistical model of the DNA time series and the structure of the DNA-ML algorithm provide a basis for future investigation and refinement of DNA sequencing techniques.
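To make the multiple-hypothesis step more concrete, here is a minimal Python sketch of the generic M-algorithm: a breadth-limited tree search that extends every surviving base-sequence hypothesis by one base and keeps only the `m` with the lowest accumulated squared error. It is not the paper's DNA-ML processor. It omits Kalman peak prediction, peak parameter estimation, and whitening, and the four-channel peak model with a `leak` term from the previous peak is a hypothetical toy assumption, as are all function and parameter names.

```python
import numpy as np

BASES = "ACGT"

def one_hot(base):
    """Unit response in the channel of the given base."""
    v = np.zeros(4)
    v[BASES.index(base)] = 1.0
    return v

def predict(prev_base, base, leak):
    """Toy peak model (assumption): current base plus leakage from the previous peak."""
    v = one_hot(base)
    if prev_base is not None:
        v = v + leak * one_hot(prev_base)
    return v

def m_algorithm(observations, m=4, leak=0.4):
    """Keep the m best partial sequences after each observed peak."""
    survivors = [(0.0, "")]  # (accumulated squared error, hypothesized sequence)
    for y in observations:
        extended = []
        for cost, seq in survivors:
            prev = seq[-1] if seq else None
            for b in BASES:
                err = y - predict(prev, b, leak)
                extended.append((cost + float(err @ err), seq + b))
        extended.sort(key=lambda h: h[0])
        survivors = extended[:m]  # prune to the m lowest-cost hypotheses
    return survivors[0][1]

def simulate(seq, leak=0.4, noise=0.1, seed=0):
    """Generate toy 4-channel peak observations for a known sequence."""
    rng = np.random.default_rng(seed)
    out, prev = [], None
    for b in seq:
        out.append(predict(prev, b, leak) + noise * rng.standard_normal(4))
        prev = b
    return out

if __name__ == "__main__":
    print(m_algorithm(simulate("GATTACA", noise=0.0)))
```

Under squared-error costs with white Gaussian noise, the lowest-cost survivor is the ML sequence among the hypotheses retained; a larger `m` trades computation for a lower chance of pruning the true path. This toy assumes exactly one observation per base, so it cannot represent the insertion and deletion errors the abstract mentions.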
This publication has 15 references indexed in Scilit:
- A graph theoretic approach to the analysis of DNA sequencing data. Genome Research, 1996
- Innovations-based MLSE for Rayleigh fading channels. IEEE Transactions on Communications, 1995
- An automated film reader for DNA sequencing based on homomorphic deconvolution. IEEE Transactions on Biomedical Engineering, 1994
- Neural Networks for Automated Base-calling of Gel-based DNA Sequencing Ladders. Published by Elsevier BV, 1994
- Digital Communication. Published by Springer Science and Business Media LLC, 1994
- An adaptive, object oriented strategy for base calling in DNA sequence analysis. Nucleic Acids Research, 1993
- Optimum delay and sequence estimation from incomplete data. IEEE Transactions on Information Theory, 1990
- Sequential Coding Algorithms: A Survey and Cost Analysis. IEEE Transactions on Communications, 1984
- Synchronization Problems in PAM Systems. IEEE Transactions on Communications, 1980
- Optimal reception of digital data over the Gaussian channel with unknown delay and phase jitter. IEEE Transactions on Information Theory, 1977