Combined Use of k-Mer Numerical Features and Position-Specific Categorical Features in Fixed-Length DNA Sequence Classification

Open Access

1 January 2017

journal article
research article
Published by Scientific Research Publishing, Inc. in Journal of Biomedical Science and Engineering

Vol. 10 (08), 390-401
https://doi.org/10.4236/jbise.2017.108030

Abstract

To classify DNA sequences, k-mer frequency is widely used because it converts variable-length sequences into fixed-length numerical feature vectors. For fixed-length DNA sequences, however, the subsequences starting at each specific position of a sequence can also be used as categorical features. In a performance evaluation on six datasets of fixed-length DNA sequences, our algorithm, which combines both kinds of features, achieved performance comparable to or better than that of other state-of-the-art algorithms.
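
The two kinds of features described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`kmer_frequencies`, `positional_features`, `combined_features`), the default values of k, the normalization by window count, and the handling of ambiguous bases are all assumptions made for illustration, and the abstract does not specify which classifier consumes the features.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed alphabet; ambiguous bases such as N are skipped


def kmer_frequencies(seq, k):
    """Numerical features: relative frequency of every possible k-mer.

    The vector always has 4**k entries (lexicographic k-mer order),
    regardless of the length of seq.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = {kmer: 0 for kmer in kmers}
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # ignore windows containing non-ACGT symbols
            counts[window] += 1
    total = sum(counts.values())
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


def positional_features(seq, k):
    """Categorical features: the k-mer that starts at each position.

    Only meaningful when all sequences have the same length, so that
    position i refers to the same column in every sequence.
    """
    return {f"pos{i}": seq[i:i + k] for i in range(len(seq) - k + 1)}


def combined_features(seq, k_freq=2, k_pos=1):
    """Return both feature sets for one fixed-length sequence (hypothetical helper)."""
    return kmer_frequencies(seq, k_freq), positional_features(seq, k_pos)


if __name__ == "__main__":
    numeric, categorical = combined_features("ACGTAC")
    print(len(numeric))   # number of possible 2-mers
    print(categorical)    # one categorical value per position
```

The numerical vector has the same length for any input, which is why k-mer frequency suits variable-length sequences, whereas the positional features require aligned, fixed-length input. A downstream classifier would typically need the categorical values encoded, for example one-hot, before combining them with the numerical vector.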

