Supervised learning with decision tree-based methods in computational and systems biology
- 31 December 2008
- journal article
- review article
- Published by Royal Society of Chemistry (RSC) in Molecular BioSystems
- Vol. 5 (12), 1593-1605
- https://doi.org/10.1039/b907946g
Abstract
At the intersection between artificial intelligence and statistics, supervised learning allows algorithms to automatically build predictive models from observations of a system alone. During the last twenty years, supervised learning has been a tool of choice for analyzing the ever-growing and increasingly complex data generated in molecular biology, with successful applications in genome annotation, function prediction, and biomarker discovery. Among supervised learning methods, decision tree-based methods stand out as non-parametric methods that uniquely combine interpretability, efficiency and, when used in ensembles of trees, excellent accuracy. The goal of this paper is to provide an accessible and comprehensive introduction to this class of methods. The first part of the review gives an intuitive but complete description of decision tree-based methods and discusses their strengths and limitations with respect to other supervised learning methods. The second part surveys their applications in computational and systems biology.
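The abstract names two ideas without spelling them out: a single decision tree learned from observations, and an ensemble of trees. The sketch below illustrates both under stated assumptions. It is not the authors' implementation.

- The single tree is a CART-style classification tree, grown by recursive binary splits that minimize Gini impurity.
- The ensemble is plain bagging: trees are grown on bootstrap resamples and combined by majority vote.

All function names (`gini`, `best_split`, `build_tree`, `predict_tree`, `fit_bagging`, `predict_ensemble`) and the toy data are hypothetical. Refinements such as pruning or the random feature subsets used by random forests are omitted.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return the (feature, threshold) pair that most lowers weighted Gini impurity, or None."""
    best, best_score, n = None, gini(y), len(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Grow a tree: internal nodes are (feature, threshold, left, right); leaves are labels."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:  # no split reduces impurity: stop and emit a leaf
        return majority
    j, t = split
    left = [(row, yi) for row, yi in zip(X, y) if row[j] <= t]
    right = [(row, yi) for row, yi in zip(X, y) if row[j] > t]
    return (j, t,
            build_tree([r for r, _ in left], [v for _, v in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [v for _, v in right], depth + 1, max_depth))

def predict_tree(node, x):
    """Follow threshold tests from the root to a leaf (assumes labels are not tuples)."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node

def fit_bagging(X, y, n_trees=25, max_depth=3, seed=0):
    """Grow each tree on a bootstrap sample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], max_depth=max_depth))
    return trees

def predict_ensemble(trees, x):
    """Aggregate the trees' predictions by majority vote."""
    return Counter(predict_tree(t, x) for t in trees).most_common(1)[0][0]

# Toy data (hypothetical): two features, two classes separable on feature 0.
X = [[1.0, 5.0], [2.0, 4.5], [1.5, 6.0], [6.0, 1.0], [7.0, 2.0], [6.5, 0.5]]
y = ["A", "A", "A", "B", "B", "B"]

tree = build_tree(X, y)
print(tree)                               # (0, 4.0, 'A', 'B')
print(predict_tree(tree, [1.2, 5.5]))     # A

forest = fit_bagging(X, y)
print(predict_ensemble(forest, [6.8, 1.5]))
```

The printed tree reads directly as a rule: "if feature 0 ≤ 4.0 predict A, else B". This illustrates the interpretability of a single tree. The bagged ensemble gives up that one-rule readability in exchange for averaging over many trees grown on resampled data.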