Improved similarity scores for comparing motifs
Open Access
- 4 May 2011
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 27 (12), 1603-1609
- https://doi.org/10.1093/bioinformatics/btr257
Abstract
Motivation: A question that often comes up after applying a motif finder to a set of co-regulated DNA sequences is whether the reported putative motif is similar to any known motif. While several tools have been designed for this task, Habib et al. pointed out that the scores that are commonly used for measuring similarity between motifs do not distinguish between a good alignment of two informative columns (say, all-A) and one of two uninformative columns. This observation explains why tools such as Tomtom occasionally return an alignment of uninformative columns which is clearly spurious. To address this problem, Habib et al. suggested a new score [Bayesian Likelihood 2-Component (BLiC)] which uses a Bayesian information criterion to penalize matches that are also similar to the background distribution.
Results: We show that the BLiC score exhibits other, highly undesirable properties, and we offer instead a general approach to adjust any motif similarity score so as to reduce the number of reported spurious alignments of uninformative columns. We implement our method in Tomtom and show that, without significantly compromising Tomtom's retrieval accuracy or its runtime, we can drastically reduce the number of uninformative alignments.
Availability and Implementation: The modified Tomtom is available as part of the MEME Suite at http://meme.nbcr.net.
Contact: uri@maths.usyd.edu.au; e.tanaka@maths.usyd.edu.au
Supplementary Information: Supplementary data are available at Bioinformatics online.
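The problem described under Motivation can be illustrated with a small sketch. It is not the method of the paper, nor Tomtom's code: the Euclidean-distance column score is chosen only as one simple example of a column similarity measure, and the function names and the uniform background are assumptions made for the illustration. The sketch shows that a pair of all-A columns and a pair of uniform columns receive the same perfect score, even though only the first pair carries any information relative to the background.

```python
import math

# Columns are probability vectors over (A, C, G, T).
ALL_A = (1.0, 0.0, 0.0, 0.0)
UNIFORM = (0.25, 0.25, 0.25, 0.25)


def euclidean_distance(col_a, col_b):
    """Euclidean distance between two motif columns (0.0 is a perfect match).

    Hypothetical helper, used here only as an example of a column score.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(col_a, col_b)))


def information_content(col, background=UNIFORM):
    """Relative entropy of a column against the background, in bits.

    Assumes a uniform background unless another one is passed in.
    """
    return sum(p * math.log2(p / q) for p, q in zip(col, background) if p > 0)


# Both alignments look equally good to the distance-based score...
informative_match = euclidean_distance(ALL_A, ALL_A)
uninformative_match = euclidean_distance(UNIFORM, UNIFORM)

# ...but only the all-A column is informative relative to the background.
ic_all_a = information_content(ALL_A)
ic_uniform = information_content(UNIFORM)

print(informative_match, uninformative_match)
print(ic_all_a, ic_uniform)
```

A score that looks only at how close two columns are to each other cannot separate these two cases; some reference to the background distribution is needed, which is the gap that both BLiC and the adjustment proposed in the article aim to close.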
This publication has 16 references indexed in Scilit:
- Metamotifs - a generative model for building families of nucleotide position weight matrices. BMC Bioinformatics, 2010
- JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Research, 2009
- UniPROBE: an online database of protein binding microarray data on protein-DNA interactions. Nucleic Acids Research, 2009
- A Novel Bayesian DNA Motif Comparison Method for Clustering and Retrieval. PLoS Computational Biology, 2008
- A survey of DNA motif finding algorithms. BMC Bioinformatics, 2007
- STAMP: a web tool for exploring DNA-binding motif similarities. Nucleic Acids Research, 2007
- Quantifying similarity between motifs. Genome Biology, 2007
- An improved map of conserved regulatory sites for Saccharomyces cerevisiae. BMC Bioinformatics, 2006
- WebLogo: A Sequence Logo Generator. Genome Research, 2004
- Biological Sequence Analysis. Published by Cambridge University Press (CUP), 1998