Improved spliced alignment from an information theoretic approach

Open Access

2 November 2005

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 22 (1), 13-20
https://doi.org/10.1093/bioinformatics/bti748

Abstract

Motivation: mRNA sequences and expressed sequence tags represent some of the most abundant experimental data for identifying genes and alternatively spliced products in metazoans. These transcript sequences are frequently studied by aligning them to a genomic sequence template. For existing programs, error-prone, polymorphic and cross-species data, as well as non-canonical splice sites, still present significant barriers to producing accurate, complete alignments.

Results: We took a novel approach to spliced alignment that meaningfully combined information from sequence similarity with that obtained from PSSM splice site models. Scoring systems were chosen to maximize their power of discrimination, and dynamic programming (DP) was employed to guarantee optimal solutions would be found. The resultant program, EXALIN, performed better than other popular tools tested under a wide range of conditions that included detection of micro-exons and human–mouse cross-species comparisons. For improved speed with only a marginal decrease in splice site prediction accuracy, EXALIN could perform limited DP guided by a result from BLASTN.

Availability: The source code, binaries, scripts, scoring matrices and splice site models for human, mouse, rice and Caenorhabditis elegans utilized in this study are posted at . The software (scripts, source code and binaries) is copyrighted but free for all to use.

Contact: gish@blast.wustl.edu

Supplementary information:
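The abstract describes combining sequence similarity scores with scores from PSSM splice site models. As a minimal sketch of the general idea only, and not EXALIN's actual models, parameters or code, the following builds a log-odds position-specific scoring matrix (in bits) from a handful of made-up donor-site windows and scores candidate windows against it. The function names `build_pssm` and `score_site`, the toy training sequences, the uniform background and the pseudocount are all assumptions introduced for illustration.

```python
import math

ALPHABET = "ACGT"


def build_pssm(sites, background=None, pseudocount=1.0):
    """Build a log2-odds PSSM (one dict per position) from equal-length sites.

    Hypothetical helper: uniform background and add-one pseudocounts are
    illustrative assumptions, not values taken from the paper.
    """
    if background is None:
        background = {base: 0.25 for base in ALPHABET}
    length = len(sites[0])
    pssm = []
    for i in range(length):
        # Start every count at the pseudocount so unseen bases stay finite.
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        # Log-odds in bits: observed frequency relative to background.
        pssm.append(
            {base: math.log2((counts[base] / total) / background[base])
             for base in ALPHABET}
        )
    return pssm


def score_site(pssm, window):
    """Sum per-position log-odds scores for a window as long as the PSSM."""
    return sum(column[base] for column, base in zip(pssm, window))


# Toy donor-site windows, invented for this example.
toy_donor_sites = ["GTAAGT", "GTGAGT", "GTAAGA", "GTAAGT"]
pssm = build_pssm(toy_donor_sites)

canonical = score_site(pssm, "GTAAGT")  # resembles the training sites
decoy = score_site(pssm, "CCTTCA")      # does not resemble them
print(f"canonical-like window: {canonical:.2f} bits")
print(f"decoy window: {decoy:.2f} bits")
```

Because log-odds scores expressed in bits are additive, a site score of this kind can in principle be summed with an alignment score on the same scale; how EXALIN weights and combines the two is specified in the paper itself, not in this sketch.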

