Novel 2D maps and coupling numbers for protein sequences. The first QSAR study of polygalacturonases; isolation and prediction of a novel sequence from Psidium guajava L.
- 4 January 2006
- journal article
- Published by Wiley in FEBS Letters
- Vol. 580 (3), 723-730
- https://doi.org/10.1016/j.febslet.2005.12.072
Abstract
The development of 2D graph-theoretic representations of DNA sequences has been very important for the qualitative and quantitative comparison of sequences, and numeric features calculated from these representations are useful for DNA-QSAR studies. Most graph-theoretic representations identify each of the four bases with a unit step along one axis direction in 2D space. Proteins, however, are built from twenty amino acids rather than four bases. This has limited the introduction of useful 2D Cartesian representations, and of the corresponding sequence descriptors, for encoding protein sequence information. In this study, we overcome this problem by grouping amino acids into four classes: acidic, basic, polar and non-polar. Identifying each class with one of the four axis directions defines a novel 2D representation and numeric descriptors for protein sequences. A Markov model is then used to calculate new numeric descriptors of the protein sequence, called herein the sequence 2D coupling numbers (ζk). We calculated the ζk values for 108 sequences of different polygalacturonases (PGs) and for 100 sequences of other proteins. The Linear Discriminant Analysis model derived here, PG = 5.36·ζ1 − 3.98·ζ3 − 42.21, successfully discriminates between PGs and other proteins. The model correctly classified 100% of the training subset of 81 PG and 75 non-PG sequences, and 51 of the 52 sequences (98.08%) used as the external validation series. Different amino acid groupings and/or axis orientations give different results, so these alternatives are worth exploring for other databases. Finally, to illustrate the use of the model, we report the isolation of a novel sequence (AY908988) from Psidium guajava L. by our group, together with the model's prediction of its PG activity.
This prediction agrees well with the sequence alignment results obtained with BLAST. These findings illustrate the potential of the descriptors derived from this novel 2D representation for protein sequence QSAR studies.
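The grouping-and-walk idea can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation. The amino acid membership of each group, the axis direction assigned to each group, and the names `walk_2d`, `coupling_numbers`, `GROUPS` and `STEP` are all hypothetical choices made for illustration. The abstract also does not define ζk, so `coupling_numbers` uses an assumed Markov construction. In that construction, transitions between lattice points are weighted by how often the walk traverses them, the initial distribution is proportional to visit counts, and ζk is taken as the expected distance from the origin after k steps. Its values are not expected to reproduce the published ζk.

```python
import math
from collections import defaultdict

# HYPOTHETICAL grouping (a common textbook classification); the paper's exact
# membership lists are not given in the abstract.
GROUPS = {
    "acid": set("DE"),
    "basic": set("KRH"),
    "polar": set("STCYNQ"),
    "nonpolar": set("GAVLIPMFW"),
}
# HYPOTHETICAL axis assignment: one unit step per group.
STEP = {"acid": (-1, 0), "basic": (1, 0), "polar": (0, 1), "nonpolar": (0, -1)}
AA_STEP = {aa: STEP[g] for g, members in GROUPS.items() for aa in members}


def walk_2d(seq):
    """Return the lattice points visited by the walk, starting at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for aa in seq.upper():
        try:
            dx, dy = AA_STEP[aa]
        except KeyError:
            raise ValueError(f"unsupported residue: {aa!r}") from None
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def coupling_numbers(seq, k_max=5):
    """HYPOTHETICAL Markov descriptors: expected distance from the origin
    after k = 1..k_max steps of a chain defined on the walk's lattice points."""
    path = walk_2d(seq)
    if len(path) < 2:
        raise ValueError("sequence must contain at least one residue")
    nodes = sorted(set(path))
    index = {p: i for i, p in enumerate(nodes)}
    n = len(nodes)
    # Edge weights: how often the walk moves between two points (undirected).
    w = [defaultdict(float) for _ in range(n)]
    for a, b in zip(path, path[1:]):
        i, j = index[a], index[b]
        w[i][j] += 1.0
        w[j][i] += 1.0
    # Row-normalise into a stochastic (transition) matrix, stored sparsely.
    P = []
    for row in w:
        total = sum(row.values())
        P.append({j: v / total for j, v in row.items()})
    # Initial distribution proportional to how often each point is visited.
    p = [0.0] * n
    for pt in path:
        p[index[pt]] += 1.0 / len(path)
    r = [math.hypot(x, y) for x, y in nodes]
    zetas = []
    for _ in range(k_max):
        # One Markov step: p <- p * P.
        nxt = [0.0] * n
        for i, row in enumerate(P):
            for j, pij in row.items():
                nxt[j] += p[i] * pij
        p = nxt
        zetas.append(sum(pj * rj for pj, rj in zip(p, r)))
    return zetas


print(walk_2d("DK"))  # [(0, 0), (-1, 0), (0, 0)]
print([round(z, 3) for z in coupling_numbers("DK", k_max=2)])  # [0.667, 0.333]
```

Under these assumptions, the acidic D steps left and the basic K steps back right, so the walk returns to the origin. Swapping the `STEP` table is all it takes to try other axis orientations, which the abstract notes can change the results.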
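Once ζ1 and ζ3 are available, applying the published discriminant function reduces to a single linear expression. In the sketch below, the function names are hypothetical. Reading a positive score as "PG" is an assumption, because the abstract gives the equation but not the decision threshold. The inputs must be ζk values computed with the paper's own definition, since descriptors from any other construction live on a different scale. The helper `validation_split` also checks the bookkeeping implied by the reported counts, assuming every sequence belongs to exactly one of the training and validation sets.

```python
# Coefficients of the LDA model reported in the abstract:
#   PG = 5.36*zeta1 - 3.98*zeta3 - 42.21
COEF_ZETA1, COEF_ZETA3, INTERCEPT = 5.36, -3.98, -42.21


def pg_score(zeta1: float, zeta3: float) -> float:
    """Discriminant score of the published model."""
    return COEF_ZETA1 * zeta1 + COEF_ZETA3 * zeta3 + INTERCEPT


def is_pg(zeta1: float, zeta3: float) -> bool:
    # ASSUMPTION: score > 0 means "PG"; the abstract does not state the cut-off.
    return pg_score(zeta1, zeta3) > 0.0


def validation_split(n_pg=108, n_other=100, train_pg=81, train_other=75):
    """Class sizes of the external validation series implied by the reported
    counts, ASSUMING each sequence is in exactly one of the two sets."""
    return n_pg - train_pg, n_other - train_other


valid_pg, valid_other = validation_split()
print(valid_pg, valid_other, valid_pg + valid_other)  # 27 25 52
print(f"external validation accuracy: {51 / 52:.2%}")  # 98.08%
```

The derived 27 PG and 25 non-PG validation sequences add up to the 52 reported in the abstract, which is consistent with that assumption.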