Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features
Open Access
- 31 January 2009
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 37 (3), 858-865
- https://doi.org/10.1093/nar/gkn1006
Abstract
In the growing field of genomics, multiple alignment programs are confronted with ever increasing amounts of data. To address this growing issue we have dramatically improved the running time and memory requirement of Kalign, while maintaining its high alignment accuracy. Kalign version 2 also supports nucleotide alignment, and a newly introduced extension allows for external sequence annotation to be included into the alignment procedure. We demonstrate that Kalign2 is exceptionally fast and memory-efficient, permitting accurate alignment of very large numbers of sequences. The accuracy of Kalign2 compares well to the best methods in the case of protein alignments while its accuracy on nucleotide alignments is generally superior. In addition, we demonstrate the potential of using known or predicted sequence annotation to improve the alignment accuracy. Kalign2 is freely available for download from the Kalign web site (http://msa.sbc.su.se/).