Modeling and Base-Calling for DNA Sequencing-by-Synthesis

Abstract
The process of DNA sequencing-by-synthesis and its non-idealities are modeled as a noisy switched linear system parameterized by the unknown DNA sequence. The base-calling problem is then formulated as a parameter detection problem. Because this system can have long memory, exact maximum-likelihood (ML) decoding is computationally prohibitive. An approximate ML method applied to experimental Pyrosequencing data demonstrates reliable read lengths exceeding 200 bases, significantly longer than those achieved by current methods.