An Improved Deep Learning Model for Predicting DNA Sequence Function

Abstract
Because a complete DNA chain contains a large amount of data (usually billions of nucleotides), it is challenging to determine the function of each sequence segment. Several powerful models for predicting the function of DNA sequences have been proposed, including the CNN (convolutional neural network), RNN (recurrent neural network), and LSTM [1] (long short-term memory). However, each of them has flaws. For example, a plain RNN struggles to retain long-term dependencies. Here, we build on one of these models, DanQ, which combines a CNN with an LSTM. We extend DanQ by developing an improved DanQ model and applying it to predict the function of DNA sequences more efficiently. In the original DanQ model, the convolution layer captures regulatory motifs and the recurrent layer captures long-term dependencies between those motifs; together they learn a regulatory grammar that increases prediction accuracy. In tests against several models, DanQ shows large improvements on some metrics. For the regulatory markers, DanQ achieves improvements of more than 50% in the area under the precision-recall curve.
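To make the role of the convolution layer concrete, the following is a minimal sketch of how a single convolutional filter can act as a motif scanner over a one-hot encoded DNA sequence. It is an illustration under stated assumptions, not the implementation used in this work: the function names (`one_hot`, `conv1d_relu`, `max_pool`), the hand-set "TATA" filter, the example sequence, and the pooling size are all hypothetical, and the recurrent (LSTM) stage that would consume the pooled activations is omitted.

```python
# Illustrative sketch only: not the authors' implementation.
# It covers the convolution + pooling stage; the LSTM stage is omitted.

NUCLEOTIDES = "ACGT"


def one_hot(sequence):
    """Encode a DNA string as a list of 4-element rows ordered A, C, G, T."""
    rows = []
    for base in sequence.upper():
        # Assumption: any base outside A/C/G/T (e.g. N) becomes an all-zero row.
        rows.append([1.0 if base == n else 0.0 for n in NUCLEOTIDES])
    return rows


def conv1d_relu(encoded, kernel):
    """Slide one filter (width x 4 weights) along the sequence, then apply ReLU."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            for channel in range(4):
                total += encoded[start + offset][channel] * kernel[offset][channel]
        scores.append(max(0.0, total))
    return scores


def max_pool(scores, size):
    """Non-overlapping max pooling over windows of `size` positions."""
    return [max(scores[i:i + size]) for i in range(0, len(scores), size)]


# Hypothetical filter whose weights are set by hand to reward the motif "TATA".
# In a trained network these weights would be learned from data.
tata_filter = one_hot("TATA")

sequence = "GGCTATAAGC"  # made-up example sequence
activations = conv1d_relu(one_hot(sequence), tata_filter)
pooled = max_pool(activations, 3)

print(activations)  # one score per window; the peak marks the motif position
print(pooled)       # shorter summary that a recurrent layer would read in order
```

With a one-hot filter, each activation simply counts how many bases in the window match the motif, so the highest score falls where "TATA" begins. In a CNN plus LSTM design of the kind described above, many such learned filters would run in parallel, and the pooled outputs would be passed as a sequence to the recurrent layer, which models dependencies between motif occurrences.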
