A New Census of Protein Tandem Repeats and Their Relationship with Intrinsic Disorder
Open Access
- 9 April 2020
- Vol. 11 (4), 407
- https://doi.org/10.3390/genes11040407
Abstract
Protein tandem repeats (TRs) are often associated with immunity-related functions and diseases. Since the last census of protein TRs in 1999, the number of curated proteins has increased more than seven-fold and new TR prediction methods have been published. TRs appear to be enriched with intrinsic disorder and vice versa. The significance of this association and the biological reasons for it are unknown. Here, we characterize protein TRs across all kingdoms of life and their overlap with intrinsic disorder in unprecedented detail. Using state-of-the-art prediction methods, we estimate that 50.9% of proteins contain at least one TR, often located at the sequence flanks. A positive linear correlation between the proportion of TRs and protein length was observed universally. Eukaryotes in general have more TRs, but the difference is quite small once protein length is taken into account. TRs were enriched with disorder-promoting amino acids and were located inside intrinsically disordered regions. Many such TRs were homorepeats. Our results support the view that TRs mostly originate by duplication and are involved in essential functions such as transcription processes, structural organization, electron transport and iron-binding. In viruses, TRs are found in proteins essential for virulence.
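To make the notions of a homorepeat and its overlap with disorder concrete, the following is a minimal illustrative sketch and not the prediction pipeline used in the study. The function names, the minimum run length of 5 residues, the example sequence and the disordered interval are all assumptions chosen for illustration. A homorepeat is taken here to be a run of one amino acid repeated consecutively.

```python
import re


def find_homorepeats(sequence, min_len=5):
    """Return (start, end, residue) for each run of a single residue
    repeated at least min_len times (0-based, end-exclusive).
    The min_len threshold is an illustrative assumption."""
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(sequence)]


def overlap_fraction(repeats, disordered_regions):
    """Fraction of repeat residues lying inside any disordered region.
    disordered_regions is a list of (start, end) intervals, end-exclusive."""
    disordered = set()
    for start, end in disordered_regions:
        disordered.update(range(start, end))
    repeat_positions = [i for start, end, _ in repeats for i in range(start, end)]
    if not repeat_positions:
        return 0.0
    return sum(i in disordered for i in repeat_positions) / len(repeat_positions)


# Hypothetical sequence with a poly-Q and a poly-S run.
seq = "MKTAQQQQQQQLVEESSSSSGHK"
repeats = find_homorepeats(seq)

# Hypothetical disorder annotation covering the first 12 residues.
fraction = overlap_fraction(repeats, [(0, 12)])
```

In practice the disordered intervals would come from a disorder predictor, and general TRs (with repeat units longer than one residue) require dedicated detection tools rather than a regular expression.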
Funding Information
- European Cooperation in Science and Technology (IZCNZ0-174836)