Multiple Alignment of Promoter Sequences from the Arabidopsis thaliana L. Genome
- 21 January 2021
- Vol. 12 (2), 135
- https://doi.org/10.3390/genes12020135
Abstract
In this study, we developed a new mathematical method for the multiple alignment of highly divergent sequences (MAHDS), i.e., sequences that have on average more than 2.5 substitutions per position (x). We generated sets of artificial DNA sequences with x ranging from 0 to 4.4 and applied MAHDS and widely used multiple sequence alignment programs (ClustalW, MAFFT, T-Coffee, Kalign, and Muscle) to these sets. Most of the existing methods produced statistically significant alignments only for the sets with x < 2.5, whereas MAHDS could still align sets with x = 4.4. We also used MAHDS to analyze a set of promoter sequences from the Arabidopsis thaliana genome and found many conserved regions upstream of the transcription initiation site (from −499 to +1 bp); part of the downstream region (from +1 to +70 bp) also contributed significantly to the resulting alignments. We discuss how the new method could be applied to identify promoter sequences in any genome. A server for the multiple alignment of nucleotide sequences has been created.
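The abstract does not describe how the artificial sets were built, so the following is only a minimal Python sketch under stated assumptions: each sequence is derived independently from a random ancestral sequence by round(x·L) single-base substitutions at uniformly random positions, with no insertions or deletions. The function names (`mutate`, `make_set`, `identity`, `expected_identity`) and the length of 600 bp are hypothetical illustration choices, not the authors' procedure. Because a position can be hit more than once, x counts substitutions per position rather than the fraction of changed sites. The sketch compares the observed identity to the ancestor with the Jukes-Cantor expectation 1/4 + 3/4·e^(−4x/3), which shows why sequences with large x are so hard to align.

```python
import math
import random

BASES = "ACGT"


def mutate(ancestor: str, x: float, rng: random.Random) -> str:
    """Apply round(x * len(ancestor)) substitutions at uniformly random positions.

    Assumption for illustration: substitutions only, no indels. Positions may be
    hit repeatedly, so x is the mean number of substitutions per position.
    """
    seq = list(ancestor)
    for _ in range(round(x * len(seq))):
        i = rng.randrange(len(seq))
        # Replace the current base with one of the three other bases, uniformly.
        seq[i] = rng.choice([b for b in BASES if b != seq[i]])
    return "".join(seq)


def make_set(length: int, n_seqs: int, x: float, seed: int = 0):
    """Return a random ancestor and n_seqs sequences derived from it independently."""
    rng = random.Random(seed)
    ancestor = "".join(rng.choice(BASES) for _ in range(length))
    return ancestor, [mutate(ancestor, x, rng) for _ in range(n_seqs)]


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences match."""
    return sum(p == q for p, q in zip(a, b)) / len(a)


def expected_identity(x: float) -> float:
    """Jukes-Cantor probability that a site still matches the ancestor."""
    return 0.25 + 0.75 * math.exp(-4.0 * x / 3.0)


if __name__ == "__main__":
    for x in (0.0, 1.0, 2.5, 4.4):
        anc, seqs = make_set(length=600, n_seqs=100, x=x, seed=1)
        observed = sum(identity(anc, s) for s in seqs) / len(seqs)
        print(f"x={x:.1f} observed={observed:.3f} expected={expected_identity(x):.3f}")
```

Under these assumptions, the expected identity to the ancestor at x = 4.4 is about 0.252, barely above the 0.25 expected for unrelated random DNA. Two sequences derived from the same ancestor are even further apart, which illustrates why conventional aligners lose statistical significance in this range.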