Supervised learning with decision tree-based methods in computational and systems biology

31 December 2008

journal article
review article
Published by Royal Society of Chemistry (RSC) in Molecular BioSystems

Vol. 5 (12), 1593-1605
https://doi.org/10.1039/b907946g

Abstract

At the intersection of artificial intelligence and statistics, supervised learning allows algorithms to automatically build predictive models from observations of a system alone. During the last twenty years, supervised learning has been a tool of choice for analyzing the ever-growing and increasingly complex data generated in molecular biology, with successful applications in genome annotation, function prediction, and biomarker discovery. Among supervised learning methods, decision tree-based methods stand out as nonparametric methods that have the unique feature of combining interpretability, efficiency, and, when used in ensembles of trees, excellent accuracy. The goal of this paper is to provide an accessible and comprehensive introduction to this class of methods. The first part of the review is devoted to an intuitive but complete description of decision tree-based methods and a discussion of their strengths and limitations relative to other supervised learning methods. The second part of the review surveys their applications in computational and systems biology.

Cited by 161 articles