Contact-based sequence alignment

28 April 2004

journal article
research article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 32 (8), 2464-2473
https://doi.org/10.1093/nar/gkh566

Abstract

This paper introduces the novel method of contact-based protein sequence alignment, where structural information in the form of contact mutation probabilities is incorporated into an alignment routine using contact-mutation matrices (CAO: Contact Accepted mutatiOn). The contact-based alignment routine optimizes the score of matched contacts, which involves four (two per contact) instead of two residues per match in pairwise alignments. The first contact refers to a real side-chain contact in a template sequence with known structure, and the second contact is the equivalent putative contact of a homologous query sequence with unknown structure. An algorithm has been devised to perform a pairwise sequence alignment based on contact information. The contact scores were combined with PAM-type (Point Accepted Mutation) substitution scores after parameterization of gap penalties and score weights by means of a genetic algorithm. We show that owing to the structural information contained in the CAO matrices, significantly improved alignments of distantly related sequences can be obtained. This has allowed us to annotate eight putative Drosophila IGF sequences. Contact-based sequence alignment should therefore prove useful in comparative modelling and fold recognition.
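
As a rough illustration of the scoring idea in the abstract, the sketch below scores one fixed alignment by adding a weighted substitution term and a weighted contact term. Each template contact contributes through its two residues and the two query residues aligned to them, so four residues enter each contact match. Everything specific here is an assumption made for illustration: the function names, the toy score tables, the weights and the gap penalty are hypothetical and are not the published CAO or PAM values, nor the genetic-algorithm parameters. The sketch only evaluates a given alignment; it does not reproduce the authors' optimization routine.

```python
from typing import Dict, List, Optional, Tuple

# Toy scores invented for illustration; NOT the published PAM or CAO matrices.
SUBSTITUTION: Dict[Tuple[str, str], float] = {
    ("A", "A"): 2.0, ("L", "I"): 2.0, ("K", "R"): 3.0, ("E", "D"): 3.0,
}
# Key: (template contact pair, query contact pair) -> contact mutation score.
CONTACT: Dict[Tuple[str, str], float] = {
    ("LE", "ID"): 1.5, ("AK", "AR"): 1.0,
}


def sub_score(a: str, b: str, default: float = -1.0) -> float:
    """Symmetric lookup of a residue substitution score."""
    return SUBSTITUTION.get((a, b), SUBSTITUTION.get((b, a), default))


def contact_score(t_pair: str, q_pair: str, default: float = -0.5) -> float:
    """Score for a template contact being replaced by a putative query contact."""
    return CONTACT.get((t_pair, q_pair), default)


def score_alignment(
    template: str,
    query: str,
    mapping: List[Optional[int]],
    contacts: List[Tuple[int, int]],
    w_sub: float = 1.0,        # hypothetical weight of the substitution term
    w_contact: float = 0.5,    # hypothetical weight of the contact term
    gap_penalty: float = -2.0, # hypothetical per-position gap penalty
) -> float:
    """Score a fixed alignment.

    mapping[i] is the query index aligned to template index i, or None
    if template position i is aligned to a gap. contacts lists pairs of
    template indices that are in side-chain contact in the known structure.
    """
    total = 0.0
    # Residue-level term: two residues per match.
    for i, j in enumerate(mapping):
        if j is None:
            total += gap_penalty
        else:
            total += w_sub * sub_score(template[i], query[j])
    # Contact-level term: four residues per match (two per contact).
    for i, k in contacts:
        j, l = mapping[i], mapping[k]
        if j is not None and l is not None:
            total += w_contact * contact_score(
                template[i] + template[k], query[j] + query[l]
            )
    return total


if __name__ == "__main__":
    # Template "ALKE" with contacts L-E and A-K, aligned position by position.
    print(score_alignment("ALKE", "AIRD", [0, 1, 2, 3], [(1, 3), (0, 2)]))
```

When a template position in a contact is aligned to a gap, this sketch simply skips that contact; how unmatched contacts should be penalized is a design choice the abstract does not specify.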
