Homolog detection using global sequence properties suggests an alternate view of structural encoding in protein sequences
- 24 March 2014
- journal article
- Published by Proceedings of the National Academy of Sciences in Proceedings of the National Academy of Sciences of the United States of America
- Vol. 111 (14), 5225-5229
- https://doi.org/10.1073/pnas.1403599111
Abstract
We show that a Fourier-based sequence distance function is able to identify structural homologs of target sequences with high accuracy. It is shown that Fourier distances correlate very strongly with independently determined structural distances between molecules, a property of the method that is not attainable using conventional representations. It is further shown that the ability of the Fourier approach to identify protein folds is statistically far in excess of random expectation. It is then shown that, in actual searches for structural homologs of selected target sequences, the Fourier approach gives excellent results. On the basis of these results, we suggest that the global information detected by the Fourier representation is an essential feature of structure encoding in protein sequences and a key to structural homology detection.
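To make the idea of a Fourier-based sequence distance concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the numeric encoding, the number of coefficients, or the metric, so everything here is an assumption made for illustration: residues are mapped to the Kyte-Doolittle hydropathy scale, the profile is reduced to the magnitudes of its first few low-frequency Fourier coefficients (the "global" part of the signal), and two sequences are compared by Euclidean distance. The names `fourier_features` and `fourier_distance` and the parameter `n_coeffs` are hypothetical.

```python
import cmath
import math

# Kyte-Doolittle hydropathy scale. ASSUMPTION: chosen only for illustration;
# the abstract does not say which residue properties the paper uses.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def fourier_features(seq, n_coeffs=8):
    """Return magnitudes of the first n_coeffs nonzero-frequency DFT
    coefficients of the mean-centered hydropathy profile of seq.

    Frequencies are counted in cycles per whole sequence and magnitudes
    are divided by the length, so sequences of different lengths yield
    feature vectors of the same size (an illustrative choice).
    """
    if not seq:
        raise ValueError("sequence must not be empty")
    values = [HYDROPATHY[aa] for aa in seq.upper()]  # KeyError on unknown residues
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    features = []
    for k in range(1, n_coeffs + 1):
        # Direct evaluation of the k-th DFT coefficient.
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * i / n)
            for i, v in enumerate(centered)
        )
        features.append(abs(coeff) / n)
    return features


def fourier_distance(seq_a, seq_b, n_coeffs=8):
    """Euclidean distance between the low-frequency Fourier features
    of two sequences (hypothetical metric, for illustration only)."""
    fa = fourier_features(seq_a, n_coeffs)
    fb = fourier_features(seq_b, n_coeffs)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))
```

In a homolog search under these assumptions, one would compute `fourier_distance(target, candidate)` for every candidate sequence and rank candidates by increasing distance. Because only a handful of low-frequency coefficients are kept, the comparison needs no alignment and depends on sequence-wide trends rather than on residue-by-residue matches.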
This publication has 21 references indexed in Scilit:
- Sequence determinants of protein architecture. Proteins-Structure Function and Bioinformatics, 2013
- Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of query and corresponding native properties of templates. Bioinformatics, 2011
- Global characteristics of protein sequences and their implications. Proceedings of the National Academy of Sciences of the United States of America, 2010
- FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately. Proceedings of the National Academy of Sciences of the United States of America, 2010
- A minimal sequence code for switching protein structure and function. Proceedings of the National Academy of Sciences of the United States of America, 2009
- Sequence physical properties encode the global organization of protein structure space. Proceedings of the National Academy of Sciences of the United States of America, 2009
- Computational Complexity of Multiple Sequence Alignment with SP-Score. Journal of Computational Biology, 2001
- CATH - a hierarchic classification of protein domain structures. Structure, 1997
- Relation between sequence similarity and structural similarity in proteins. Role of important properties of amino acids. Journal of Protein Chemistry, 1985
- Statistical analysis of the physical properties of the 20 naturally occurring amino acids. Protein Journal, 1985