Multiple sequence alignments of partially coding nucleic acid sequences

Open Access

28 June 2005

journal article
software
Published by Springer Science and Business Media LLC in BMC Bioinformatics

Vol. 6 (1), 160
https://doi.org/10.1186/1471-2105-6-160

Abstract

High-quality alignments of RNA and DNA sequences are an important prerequisite for the comparative analysis of genomic sequence data. Because of the redundancy of the genetic code, however, nucleic acid sequences exhibit much greater sequence heterogeneity than the proteins they encode. It is therefore desirable to make use of the amino acid sequence when aligning coding nucleic acid sequences. In many cases, however, only part of the sequence of interest is translated. Conversely, overlapping reading frames may encode multiple alternative proteins, possibly with intermittent non-coding parts. RNA virus genomes are prominent examples. The standard scoring scheme for nucleic acid alignments can be extended to simultaneously incorporate information on translation products in one or more reading frames. Here we present a multiple alignment tool, codaln, that implements a combined nucleic acid plus amino acid scoring model for pairwise and progressive multiple alignments and allows arbitrary weighting of almost all scoring parameters. The resource requirements of codaln are comparable to those of standard tools such as ClustalW. We demonstrate the applicability of codaln to various biologically relevant types of sequences (bacteriophage Levivirus and vertebrate Hox clusters) and show that combining nucleic acid and amino acid sequence information leads to improved alignments. These, in turn, increase the performance of analysis tools that depend strictly on good input alignments, such as methods for detecting conserved RNA secondary structure elements.
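
To make the idea of a combined nucleic acid plus amino acid score concrete, the following is a minimal sketch and not codaln's actual scoring model or code. It assumes a simplified scheme in which each nucleotide inside an annotated coding region is labeled with the amino acid and codon phase of its codon, and an aligned column receives an extra bonus when both nucleotides are coding, share the same phase, and encode the same amino acid. All names (`annotate`, `column_score`, `align`, `aa_weight`) and all parameter values are hypothetical choices made for illustration, and gaps are scored linearly for brevity.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def annotate(seq, cds_start, cds_end):
    """Label each nucleotide in [cds_start, cds_end) with (amino acid, phase).

    Positions outside the coding region, or in a trailing partial codon,
    keep the label None.
    """
    labels = [None] * len(seq)
    for i in range(cds_start, cds_end - 2, 3):
        codon = seq[i:i + 3].upper().replace("U", "T")
        aa = CODON_TABLE.get(codon)  # None for ambiguous codons
        for phase in range(3):
            labels[i + phase] = (aa, phase)
    return labels


def column_score(x, y, lx, ly, aa_weight, match=1.0, mismatch=-1.0):
    """Nucleotide score plus a weighted amino acid bonus (illustrative only)."""
    score = match if x == y else mismatch
    if lx is not None and ly is not None:
        same_phase = lx[1] == ly[1]
        same_aa = lx[0] is not None and lx[0] == ly[0]
        if same_phase and same_aa:
            score += aa_weight
    return score


def align(a, b, la, lb, gap=-2.0, aa_weight=1.0):
    """Global pairwise alignment (Needleman-Wunsch, linear gap cost)."""
    n, m = len(a), len(b)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + gap
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = S[i - 1][j - 1] + column_score(
                a[i - 1], b[j - 1], la[i - 1], lb[j - 1], aa_weight)
            S[i][j] = max(diag, S[i - 1][j] + gap, S[i][j - 1] + gap)

    # Trace back one optimal path from the bottom-right corner.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + column_score(
                a[i - 1], b[j - 1], la[i - 1], lb[j - 1], aa_weight):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return S[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    seq_a, seq_b = "ATGGCTTAA", "ATGGCATAA"  # toy sequences, fully coding
    la, lb = annotate(seq_a, 0, 9), annotate(seq_b, 0, 9)
    print(align(seq_a, seq_b, la, lb, aa_weight=0.0))  # nucleotides only
    print(align(seq_a, seq_b, la, lb, aa_weight=1.0))  # combined score
```

Setting `aa_weight` to zero reduces the sketch to a plain nucleotide alignment, while larger values make synonymous codon differences cheaper relative to gaps and frame-shifting columns. Non-coding stretches and partially coding sequences are handled simply by leaving their labels as `None`.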

