Graph-based modeling of tandem repeats improves global multiple sequence alignment
Open Access
- 22 July 2013
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 41 (17), e162
- https://doi.org/10.1093/nar/gkt628
Abstract
Tandem repeats (TRs) are often present in proteins with crucial functions, such as resistance and pathogenicity, and are associated with infectious or neurodegenerative diseases. This motivates numerous studies of TRs and their evolution, which require accurate multiple sequence alignment. TRs may be lost or inserted at any position of a TR region by replication slippage or recombination, yet current methods assume fixed unit boundaries and are nonetheless of high complexity. We present a new global graph-based alignment method that does not restrict TR unit indels to unit boundaries. TR indels are modeled separately and penalized using the phylogeny-aware alignment algorithm. This enhances the accuracy of reconstructed alignments, disentangling TRs and measuring indel events and rates in a biologically meaningful way. Our method detects not only duplication events but also all changes in TR regions owing to recombination, strand slippage and other events that insert or delete TR units. We evaluate our method by simulation incorporating TR evolution, either by sampling TRs from a profile hidden Markov model or by mimicking strand slippage with duplications. The new method is illustrated on a family of type III effectors, a pathogenicity determinant in the agriculturally important bacterium Ralstonia solanacearum. We show that TR indel rate variation contributes to the diversification of this protein family.
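The point that TR units may be inserted or lost at any position of a TR region, rather than only at fixed unit boundaries, can be illustrated with a small toy simulation. The Python sketch below is not the authors' simulator; the function name `slippage_event`, its parameters and the 50/50 choice between duplication and deletion are all assumptions made for illustration. It picks a unit-length window at an arbitrary offset inside the TR region and either duplicates it in tandem or deletes it.

```python
import random


def slippage_event(seq, tr_start, tr_end, unit_len, rng):
    """Apply one toy slippage-like event inside seq[tr_start:tr_end].

    Hypothetical helper for illustration only. A window of unit_len
    residues starting at an arbitrary offset in the TR region (not
    necessarily a unit boundary) is duplicated in tandem or deleted.
    Returns the new sequence and a (kind, position) tuple.
    """
    if tr_end - tr_start < unit_len:
        raise ValueError("TR region is shorter than one unit")
    # Any start position is allowed as long as the window stays in the region.
    pos = rng.randrange(tr_start, tr_end - unit_len + 1)
    window = seq[pos:pos + unit_len]
    if rng.random() < 0.5:  # assumed equal odds of duplication and deletion
        new_seq = seq[:pos + unit_len] + window + seq[pos + unit_len:]
        return new_seq, ("dup", pos)
    new_seq = seq[:pos] + seq[pos + unit_len:]
    return new_seq, ("del", pos)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy protein: flank "MK", three perfect copies of unit "ABCD", flank "WY".
    seq = "MK" + "ABCD" * 3 + "WY"
    new_seq, event = slippage_event(seq, 2, 14, 4, rng)
    print(event, new_seq)
```

For a perfect repeat, the result is the same periodic sequence with one unit more or one unit fewer, whichever offset was chosen. That is why the position of a unit indel within a TR region is ambiguous from the sequences alone, and why an aligner that fixes unit boundaries in advance may misplace such events.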