Identifying discriminative classification-based motifs in biological sequences
Open Access
- 3 March 2011
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 27 (9), 1231-1238
- https://doi.org/10.1093/bioinformatics/btr110
Abstract
Motivation: Identification of conserved motifs in biological sequences is crucial for uncovering shared functions. Many tools exist for motif identification, including some that allow degenerate positions with multiple possible nucleotides or amino acids. Most efficient methods available today search for conserved motifs in a set of sequences, but do not check their specificity with respect to a set of negative sequences.
Results: We present a tool to identify degenerate motifs, based on a given classification of amino acids according to their physico-chemical properties. It returns the top K motifs that are most frequent in a positive set of sequences involved in a biological process of interest, and absent from a negative set. Thus, our method discovers discriminative motifs in biological sequences that may be used to identify new sequences involved in the same process. We used this tool to identify candidate effector proteins secreted into plant tissues by the root-knot nematode Meloidogyne incognita. Our tool identified a series of motifs specifically present in a positive set of known effectors while totally absent from a negative set of evolutionarily conserved housekeeping proteins. Scanning the proteome of M. incognita, we detected 2579 proteins that contain these specific motifs and can be considered as new putative effectors.
Availability and Implementation: The motif discovery tool and the proteins used in the experiments are available at http://dtai.cs.kuleuven.be/ml/systems/merci.
Contact: celine.vens@cs.kuleuven.be
Supplementary Information: Supplementary data are available at Bioinformatics online.
This publication has 26 references indexed in Scilit.
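The selection criterion described in the abstract (motifs that are most frequent in the positive set and absent from the negative set, with degenerate positions drawn from an amino acid classification) can be illustrated with a minimal brute-force sketch. This is not the authors' implementation: the three classes in `CLASSES`, the function names, the restriction to short contiguous motifs, and the tie-breaking order are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

# Hypothetical, simplified physico-chemical classes (illustration only;
# not the classification used by the published tool).
CLASSES = {
    "<hydrophobic>": set("AVLIMFWC"),
    "<polar>": set("STNQYGP"),
    "<charged>": set("DEKRH"),
}


def symbols_for(residue):
    """Return the residue itself plus every class symbol that covers it."""
    return [residue] + [name for name, members in CLASSES.items() if residue in members]


def motifs_in(sequence, max_len):
    """Collect every contiguous motif (tuple of symbols) of length 1..max_len
    that matches somewhere in the sequence."""
    found = set()
    for length in range(1, max_len + 1):
        for start in range(len(sequence) - length + 1):
            window = sequence[start:start + length]
            # Each position may stay concrete or be generalized to a class.
            found.update(product(*(symbols_for(r) for r in window)))
    return found


def top_k_discriminative(positives, negatives, k, max_len=3):
    """Return the k motifs matching the most positive sequences,
    among motifs that match no negative sequence."""
    forbidden = set()
    for sequence in negatives:
        forbidden |= motifs_in(sequence, max_len)

    counts = Counter()
    for sequence in positives:
        # Count each motif at most once per positive sequence.
        for motif in motifs_in(sequence, max_len) - forbidden:
            counts[motif] += 1

    # Rank by frequency, then prefer longer motifs, then sort for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], -len(kv[0]), kv[0]))
    return ranked[:k]
```

Calling `top_k_discriminative(["MKRA", "LKRS", "GGKR"], ["AKTA", "RRLL"], k=2)` ranks candidate motifs by the number of positive sequences they match. Exhaustive enumeration grows quickly with motif length and the number of classes per residue, so a sketch like this is only practical for short motifs and small sequence sets.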