Protein sequences classification by means of feature extraction with substitution matrices
Open Access
- 8 April 2010
- journal article
- research article
- Published by Springer Science and Business Media LLC in BMC Bioinformatics
- Vol. 11 (1), 175
- https://doi.org/10.1186/1471-2105-11-175
Abstract
This paper deals with the preprocessing of protein sequences for supervised classification. Motif extraction is one way to address that task. It has been widely used to encode biological sequences into feature vectors, the format required by well-known machine-learning classifiers. However, designing a suitable feature space for a set of proteins is not a trivial task. For this purpose, we propose a novel encoding method that uses amino-acid substitution matrices to define similarity between motifs during the extraction step.
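The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea: two motifs are treated as similar when their position-wise substitution scores add up to at least some threshold, and a sequence is encoded as a binary vector over a set of feature motifs. The score table `TOY_SCORES`, the function names, the fixed motif length `k`, and the `threshold` parameter are all assumptions made for illustration; the scores are invented toy values, not entries from a real BLOSUM or PAM matrix, and this is not presented as the authors' method.

```python
# Toy, hypothetical substitution scores for four amino acids.
# These are illustrative values only, NOT taken from BLOSUM or PAM.
TOY_SCORES = {
    ("A", "A"): 4, ("L", "L"): 4, ("I", "I"): 4, ("K", "K"): 5,
    ("L", "I"): 2, ("A", "L"): -1, ("A", "I"): -1,
    ("A", "K"): -1, ("L", "K"): -2, ("I", "K"): -3,
}


def score(a, b):
    """Look up a pair score, treating the table as symmetric."""
    if (a, b) in TOY_SCORES:
        return TOY_SCORES[(a, b)]
    return TOY_SCORES[(b, a)]


def motif_similarity(m1, m2):
    """Sum the per-position substitution scores of two equal-length motifs."""
    if len(m1) != len(m2):
        raise ValueError("motifs must have the same length")
    return sum(score(a, b) for a, b in zip(m1, m2))


def extract_motifs(seq, k):
    """Return every contiguous substring of length k (assumed motif definition)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def encode(seq, feature_motifs, k, threshold):
    """Binary feature vector: entry j is 1 if some k-mer of seq is
    similar enough (score >= threshold) to feature_motifs[j]."""
    kmers = extract_motifs(seq, k)
    return [
        int(any(motif_similarity(m, f) >= threshold for m in kmers))
        for f in feature_motifs
    ]


if __name__ == "__main__":
    # "ALK" does not contain "AIK" exactly, but L and I score as similar here.
    print(encode("ALK", ["AIK", "KKK"], k=3, threshold=8))
```

Under these toy scores, an exact-match encoding would miss the motif "AIK" in the sequence "ALK", whereas the similarity-based encoding counts it as present because the L/I substitution is scored favorably.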