IIMLP: integrated information-entropy-based method for LncRNA prediction
Open Access
- 13 May 2021
- journal article
- research article
- Published by Springer Science and Business Media LLC in BMC Bioinformatics
- Vol. 22 (S3), 1-12
- https://doi.org/10.1186/s12859-020-03884-w
Abstract
The prediction of long non-coding RNAs (lncRNAs) has attracted great attention from researchers, as more and more evidence indicates that various complex human diseases are closely related to lncRNAs. In the era of biomedical big data, in addition to the prediction of lncRNAs by biological experimental methods, many computational methods based on machine learning have been proposed to make better use of the sequence resources of lncRNAs. We developed a lncRNA prediction method by integrating information-entropy-based features with machine learning algorithms. We calculate generalized topological entropy and generate 6 novel features for lncRNA sequences. Using these 6 features together with other features such as the open reading frame, we apply support vector machine, XGBoost and random forest algorithms to distinguish human lncRNAs. We compare our method with one that uses more k-mer features, and the results show that our method achieves a higher area under the curve, up to 99.7905%. We have developed an accurate and efficient method that uses novel information-entropy features to analyze and classify lncRNAs. Our method can also be extended to research on other functional elements in DNA sequences.
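As a rough illustration of the kind of sequence features described in the abstract, the sketch below computes an entropy value over k-mer frequencies and the length of the longest open reading frame. It is an assumption-laden stand-in, not the authors' method: the abstract does not define generalized topological entropy or the 6 derived features, so plain Shannon entropy of the k-mer distribution is substituted here, and the function names (`kmer_shannon_entropy`, `longest_orf_length`, `feature_vector`) are hypothetical.

```python
import math
from collections import Counter

# Standard stop codons on the DNA alphabet.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def kmer_shannon_entropy(seq: str, k: int) -> float:
    """Shannon entropy (in bits) of the overlapping k-mer distribution of seq.

    NOTE: a simple stand-in for the paper's generalized topological entropy,
    which is not defined in the abstract.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def longest_orf_length(seq: str) -> int:
    """Length (nucleotides, stop codon included) of the longest ATG...stop
    open reading frame across the three forward-strand frames."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # look for the next ORF in this frame
    return best


def feature_vector(seq: str, ks=(1, 2, 3)) -> list:
    """Entropy for each k in ks, followed by the longest ORF length."""
    return [kmer_shannon_entropy(seq, k) for k in ks] + [float(longest_orf_length(seq))]
```

A vector such as `feature_vector(sequence)` could then be passed, one row per transcript, to a classifier of the kind named in the abstract (support vector machine, XGBoost or random forest); the choice of `ks` here is arbitrary.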
Funding Information
- startup grant of Harbin Institute of Technology Shenzhen
- the National “863” Key Basic Research Development Program (2014AA021505)
- National Natural Science Foundation of China (61702134)
- the Shenzhen stable support program