Combined Use of k-Mer Numerical Features and Position-Specific Categorical Features in Fixed-Length DNA Sequence Classification
Open Access
- 1 January 2017
- journal article
- research article
- Published by Scientific Research Publishing, Inc. in Journal of Biomedical Science and Engineering
- Vol. 10 (08), 390-401
- https://doi.org/10.4236/jbise.2017.108030
Abstract
To classify DNA sequences, k-mer frequencies are widely used because they convert variable-length sequences into fixed-length numerical feature vectors. However, in the case of fixed-length DNA sequence classification, the subsequences starting at specific positions of a given sequence can also be used as categorical features. In a performance evaluation on six datasets of fixed-length DNA sequences, our algorithm, which is based on this idea, achieved performance comparable to or better than that of other state-of-the-art algorithms.
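The two feature types described above can be illustrated with a short sketch. This is not the authors' algorithm. The helper names (`kmer_frequency`, `positional_kmers`, `one_hot_positional`, `combined_vector`), the restriction to the ACGT alphabet, and the decision to one-hot encode the positional k-mers so that both parts fit into one numeric vector are all assumptions made for illustration. The abstract does not state how the two feature sets are combined or which classifier consumes them.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: ambiguous bases (e.g. N) are simply skipped


def all_kmers(k: int) -> list[str]:
    """Every possible k-mer over ACGT, in lexicographic order (4**k of them)."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def kmer_frequency(seq: str, k: int) -> list[float]:
    """Numerical features: relative frequency of each k-mer.
    The vector length is 4**k no matter how long seq is."""
    index = {km: i for i, km in enumerate(all_kmers(k))}
    counts = [0] * len(index)
    total = 0
    for i in range(len(seq) - k + 1):
        km = seq[i:i + k]
        if km in index:  # windows containing non-ACGT characters are ignored
            counts[index[km]] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def positional_kmers(seq: str, k: int) -> list[str]:
    """Categorical features: the k-mer starting at each position 0..L-k.
    Feature j means the same thing across sequences only if all share length L."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def one_hot_positional(seq: str, k: int) -> list[float]:
    """Hypothetical encoding: one 4**k-wide one-hot block per position."""
    index = {km: i for i, km in enumerate(all_kmers(k))}
    vec: list[float] = []
    for km in positional_kmers(seq, k):
        block = [0.0] * len(index)
        if km in index:
            block[index[km]] = 1.0
        vec.extend(block)
    return vec


def combined_vector(seq: str, k_num: int, k_cat: int) -> list[float]:
    """Concatenate k-mer frequencies with one-hot positional k-mers."""
    return kmer_frequency(seq, k_num) + one_hot_positional(seq, k_cat)


if __name__ == "__main__":
    seq = "ACGTAC"
    print(positional_kmers(seq, 3))         # ['ACG', 'CGT', 'GTA', 'TAC']
    print(len(combined_vector(seq, 2, 3)))  # 16 + 4 * 64 = 272
```

With `k_num = 2` and `k_cat = 3`, a sequence of length L yields 16 frequency values plus 64 × (L − 2) one-hot entries. The frequency part is independent of sequence length, while the positional part only aligns across sequences when they all have the same length, which is why it suits the fixed-length setting. A classifier that handles categorical attributes directly could take the output of `positional_kmers` without one-hot encoding.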