Self‐organizing hierarchic networks for pattern recognition in protein sequence
Open Access
- 1 January 1996
- journal article
- research article
- Published by Wiley in Protein Science
- Vol. 5 (1), 72-82
- https://doi.org/10.1002/pro.5560050109
Abstract
We present a method based on hierarchical self-organizing maps (SOMs) for recognizing patterns in protein sequences. The method is fully automatic, does not require prealigned sequences, is insensitive to redundancy in the training set, and works surprisingly well even with small learning sets. Because it uses unsupervised neural networks, it is able to extract patterns that are not present in all of the unaligned sequences of the learning set. The identification of these patterns in sequence databases is sensitive and efficient. The procedure comprises three main training stages. In the first stage, one SOM is trained to extract common features from the set of unaligned learning sequences. A feature is a number of ungapped sequence segments (usually 4–16 residues long) that are similar to segments in most of the sequences of the learning set according to an initial similarity matrix. In the second training stage, the recognition of each individual feature is refined by selecting an optimal weighting matrix out of a variety of existing amino acid similarity matrices. In a third stage of the SOM procedure, the position of the features in the individual sequences is learned. This allows for variants with feature repeats and feature shuffling. The procedure has been successfully applied to a number of notoriously difficult cases with distinct recognition problems: helix-turn-helix motifs in DNA-binding proteins, the CUB domain of developmentally regulated proteins, and the superfamily of ribokinases. A comparison with the established database search procedure PROFILE (and with several others) led to the conclusion that the new automatic method performs satisfactorily.
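The first training stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the one-hot encoding (standing in for the initial similarity matrix), the one-dimensional map, the fixed window width, the toy sequences, and every function name below are assumptions made for illustration only.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def windows(seq, width):
    """Return all ungapped segments of the given width in one sequence."""
    return [seq[i:i + width] for i in range(len(seq) - width + 1)]


def encode(segment):
    """One-hot encode a segment (assumed stand-in for a similarity matrix)."""
    vec = []
    for residue in segment:
        vec.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vec


def best_unit(weights, vec):
    """Index of the map unit whose weight vector is closest to vec."""
    dists = [sum((w - x) ** 2 for w, x in zip(unit, vec)) for unit in weights]
    return dists.index(min(dists))


def train_som(vectors, n_units=4, epochs=30, seed=0):
    """Train a 1-D self-organizing map with a Gaussian neighborhood."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        rate = 0.5 * (1.0 - frac)                        # decaying learning rate
        radius = max(0.5, (n_units / 2) * (1.0 - frac))  # shrinking neighborhood
        order = vectors[:]
        rng.shuffle(order)
        for vec in order:
            winner = best_unit(weights, vec)
            for j, unit in enumerate(weights):
                # Units near the winner on the map are pulled more strongly.
                pull = rate * math.exp(-((j - winner) ** 2) / (2 * radius ** 2))
                for k in range(dim):
                    unit[k] += pull * (vec[k] - unit[k])
    return weights


# Hypothetical unaligned toy sequences sharing the segment "TAYIAK".
sequences = ["MKTAYIAKQR", "GGTAYIAKLL", "PWTAYIAKSE"]
width = 6
segments = [seg for seq in sequences for seg in windows(seq, width)]
som = train_som([encode(s) for s in segments])

# Group segments by winning map unit; each group is a candidate feature.
clusters = {}
for seg in segments:
    clusters.setdefault(best_unit(som, encode(seg)), []).append(seg)
```

Each entry of `clusters` collects the segments assigned to one map unit. In the spirit of the abstract, a unit that attracts similar segments from most of the learning sequences would be a candidate feature; the later stages (choosing a weighting matrix per feature, learning feature positions) are not covered by this sketch.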
Funding Information
- Research Program “Neurogen”
- Federal Ministry of Research and Technology