Protein sequences classification by means of feature extraction with substitution matrices

Open Access

8 April 2010

journal article
research article
Published by Springer Science and Business Media LLC in BMC Bioinformatics

Vol. 11 (1), 175
https://doi.org/10.1186/1471-2105-11-175

Abstract

This paper deals with the preprocessing of protein sequences for supervised classification. Motif extraction is one way to address this task. It has been widely used to encode biological sequences as feature vectors, the format required by well-known machine-learning classifiers. However, designing a suitable feature space for a set of proteins is not trivial. To this end, we propose a novel encoding method that uses amino-acid substitution matrices to define similarity between motifs during the extraction step.
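The general idea of the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, only one plausible reading under stated assumptions: motifs are taken to be fixed-length k-mers, two motifs count as similar when their summed substitution score reaches a fraction of the first motif's self-score, and each sequence is encoded as a binary vector over the extracted motifs. The names `TOY_SCORES`, `encode`, `k`, and `threshold` are hypothetical, and the score table is a toy over four residues, not a real matrix such as BLOSUM62.

```python
# Toy substitution scores over a 4-letter alphabet. These values are
# hypothetical and for illustration only; a real run would load a full
# 20x20 amino-acid substitution matrix.
TOY_SCORES = {
    ("A", "A"): 4, ("A", "G"): 0, ("A", "L"): -1, ("A", "I"): -1,
    ("G", "G"): 6, ("G", "L"): -4, ("G", "I"): -4,
    ("L", "L"): 4, ("L", "I"): 2,
    ("I", "I"): 4,
}


def score(a, b):
    """Symmetric lookup of the substitution score for residues a and b."""
    if (a, b) in TOY_SCORES:
        return TOY_SCORES[(a, b)]
    return TOY_SCORES[(b, a)]


def kmers(seq, k):
    """All overlapping substrings of length k (the assumed motif shape)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def motif_score(m1, m2):
    """Position-wise sum of substitution scores for two equal-length motifs."""
    return sum(score(a, b) for a, b in zip(m1, m2))


def similar(m1, m2, threshold):
    """Assumed rule: m2 matches m1 if it reaches a fraction of m1's self-score."""
    return motif_score(m1, m2) >= threshold * motif_score(m1, m1)


def encode(sequences, k=3, threshold=0.8):
    """Return (motifs, vectors): one binary feature per extracted motif."""
    motifs = sorted({m for s in sequences for m in kmers(s, k)})
    vectors = [
        [int(any(similar(m, w, threshold) for w in kmers(s, k))) for m in motifs]
        for s in sequences
    ]
    return motifs, vectors


if __name__ == "__main__":
    motifs, vectors = encode(["ALG", "AIG"], k=3, threshold=0.8)
    print(motifs)
    print(vectors)
```

With `threshold=1.0` only exact self-matches reach the cutoff in this toy table, so the sketch reduces to plain exact-match motif encoding. Lowering the threshold lets motifs that differ by conservative substitutions (here L and I) share a feature, which is the effect the substitution matrix is meant to provide.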

Cited by 56 articles