A method to recognize distant repeats in protein sequences

1 December 1993

journal article
research article
Published by Wiley in Proteins: Structure, Function, and Bioinformatics

Vol. 17 (4), 391-411
https://doi.org/10.1002/prot.340170407

Abstract

An automated algorithm is presented that delineates protein sequence fragments which display similarity. The method incorporates a selection of a number of local nonoverlapping sequence alignments with the highest similarity scores and a graph-theoretical approach to elucidate the consistent start and end points of the fragments comprising one or more ensembles of related subsequences. The procedure allows the simultaneous identification of different types of repeats within one sequence. A multiple alignment of the resulting fragments is performed and a consensus sequence is derived from the ensemble(s). Finally, a profile is constructed from the multiple alignment to detect possible, more distant members within the sequence. The method tolerates mutations in the repeats as well as insertions and deletions. The sequence spans between the various repeats or repeat clusters may be of different lengths. The technique has been applied to a number of proteins where the repeating fragments have been derived from information additional to the protein sequences.
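
The first step described above, finding high-scoring local alignments of a sequence against itself, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `best_self_alignment`, the identity-based scoring (match +2, mismatch -1, gap -2), and the restriction to a single best alignment are all assumptions made for illustration. The published method selects several nonoverlapping alignments and then applies graph-theoretical and profile steps that are not shown here.

```python
def best_self_alignment(seq, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman of seq against itself, upper triangle only.

    Returns (score, (start1, end1), (start2, end2)) with half-open spans.
    Scoring values are illustrative assumptions, not taken from the paper.
    This sketch does not forbid the two fragments from overlapping.
    """
    n = len(seq)
    # H[i][j]: best local alignment score ending at seq[i-1] and seq[j-1]
    H = [[0] * (n + 1) for _ in range(n + 1)]
    best, end = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):  # j > i skips the trivial self-match
            s = match if seq[i - 1] == seq[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # align the two residues
                          H[i - 1][j] + gap,    # gap in the second copy
                          H[i][j - 1] + gap)    # gap in the first copy
            if H[i][j] > best:
                best, end = H[i][j], (i, j)
    # Trace back from the best cell to find where the alignment starts.
    i, j = end
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if seq[i - 1] == seq[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, end[0]), (j, end[1])


# Hypothetical toy sequence containing one exact repeat separated by a spacer.
toy = "ACDEFGHIKXXXACDEFGHIK"
score, first, second = best_self_alignment(toy)
print(score, toy[first[0]:first[1]], toy[second[0]:second[1]])
```

A realistic version would replace the identity scores with an amino acid substitution matrix and collect several top-scoring alignments rather than one.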
