Finding Genes in DNA with a Hidden Markov Model

Abstract
This study describes a new Hidden Markov Model (HMM) system for segmenting uncharacterized genomic DNA sequences into exons, introns, and intergenic regions. Separate HMM modules were designed and trained for specific regions of DNA: exons, introns, intergenic regions, and splice sites. The models were then tied together to form a biologically feasible topology. The integrated HMM was trained further on a set of eukaryotic DNA sequences and tested by using it to segment a separate set of sequences. The resulting HMM system which is called VEIL (Viterbi Exon-Intron Locator), obtains an overall accuracy on test data of 92% of total bases correctly labelled, with a correlation coefficient of 0.73. Using the more stringent test of exact exon prediction, VEIL correctly located both ends of 53% of the coding exons, and 49% of the exons it predicts are exactly correct. These results compare favorably to the best previous results for gene structure prediction and demonstrate the benefits of using HMMs for this problem.

This publication has 15 references indexed in Scilit: