Modeling protein cores with Markov random fields

31 December 1994

journal article
Published by Elsevier BV in Mathematical Biosciences

Vol. 124 (2), 149-179
https://doi.org/10.1016/0025-5564(94)90041-8

Abstract

A mathematical formalism is introduced that has general applicability to many protein structure models used in the various approaches to the “inverse protein folding problem.” The inverse nature of the problem arises from the fact that one begins with a set of assumed tertiary structures and searches for those most compatible with a new sequence, rather than attempting to predict the structure directly from the new sequence. The formalism is based on the well-known theory of Markov random fields (MRFs). Our MRF formulation provides explicit representations for the relevant amino acid position environments and the physical topologies of the structural contacts. In particular, MRF models can readily be constructed for the secondary structure packing topologies found in protein domain cores, or other structural motifs, that are anticipated to be common among large sets of both homologous and nonhomologous proteins. MRF models are probabilistic and can exploit the statistical data from the limited number of proteins having known domain structures. The MRF approach leads to a new scoring function for comparing different threadings (placements) of a sequence through different structure models. The scoring function is very important, because comparing alternative structure models with each other is a key step in the inverse folding problem. Unlike previously published scoring functions, the one derived in this paper is based on a comprehensive probabilistic formulation of the threading problem.
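The abstract does not give the scoring function itself, so the following is only a minimal sketch of how a generic pairwise Markov random field might score a threading, not the paper's derivation. It assumes core positions are nodes with an environment label, structural contacts are edges, and the log-score is a sum of per-position and per-contact terms. All names (`ENVIRONMENTS`, `CONTACTS`, `SINGLETON`, `PAIR`, `thread_score`, `best_threading`) and all numeric values are hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical toy structure model: three core positions (nodes), each with an
# environment label, and one structural contact (edge) between positions 0 and 1.
ENVIRONMENTS = ["buried", "buried", "exposed"]
CONTACTS = [(0, 1)]

# Illustrative probabilities P(residue | environment); not fitted to real data.
SINGLETON = {
    "buried": {"L": 0.6, "K": 0.1, "A": 0.3},
    "exposed": {"L": 0.2, "K": 0.5, "A": 0.3},
}

# Illustrative pairwise factors for residues in contact (unordered pairs).
# Pairs not listed default to 1.0, i.e. no contribution to the log-score.
PAIR = {
    frozenset(["L"]): 1.5,        # L in contact with L
    frozenset(["L", "A"]): 1.2,   # L in contact with A
}


def thread_score(sequence, placement):
    """Log-score of placing sequence[placement[k]] at core position k."""
    residues = [sequence[i] for i in placement]
    # Node terms: how well each residue fits its position's environment.
    score = sum(
        math.log(SINGLETON[env][res])
        for env, res in zip(ENVIRONMENTS, residues)
    )
    # Edge terms: how compatible the residues at contacting positions are.
    score += sum(
        math.log(PAIR.get(frozenset((residues[i], residues[j])), 1.0))
        for i, j in CONTACTS
    )
    return score


def best_threading(sequence):
    """Exhaustively compare contiguous, gap-free placements (a simplification)."""
    n = len(ENVIRONMENTS)
    candidates = [list(range(s, s + n)) for s in range(len(sequence) - n + 1)]
    return max(candidates, key=lambda p: thread_score(sequence, p))


if __name__ == "__main__":
    seq = "KLLKA"
    placement = best_threading(seq)
    print(placement, thread_score(seq, placement))
```

In this toy setup, comparing threadings reduces to comparing their log-scores; a realistic model would allow gaps between core segments and estimate the node and edge terms from proteins with known domain structures.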

This publication has 26 references indexed in Scilit:

Protein classification by stochastic modeling and optimal filtering of amino-acid sequences
Mathematical Biosciences, 1994
Prediction of Protein Structure by Evaluation of Sequence-structure Fitness: Aligning Sequences to Contact Profiles Derived from Three-dimensional Structures
Journal of Molecular Biology, 1993
Contact potential that recognizes the correct folding of globular proteins
Journal of Molecular Biology, 1992
Topology fingerprint approach to the inverse protein folding problem
Journal of Molecular Biology, 1992
A new approach to protein fold recognition
Nature, 1992
One thousand families for the molecular biologist
Nature, 1992
Assessment of protein models with three-dimensional profiles
Nature, 1992
Identification of native protein folds amongst a large number of incorrect models: The calculation of low energy conformations from potentials of mean force
Journal of Molecular Biology, 1990
Calculation of conformational ensembles from potentials of mean force
Journal of Molecular Biology, 1990
The protein data bank: A computer-based archival file for macromolecular structures
Journal of Molecular Biology, 1977

Cited by 18 articles