Random Forests for Genetic Association Studies

31 December 2010

journal article
research article
Published by Walter de Gruyter GmbH in Statistical Applications in Genetics and Molecular Biology

Vol. 10 (1), 32
https://doi.org/10.2202/1544-6115.1691

Abstract

The Random Forests (RF) algorithm has become a commonly used machine learning algorithm for genetic association studies. It is well suited for genetic applications since it is both computationally efficient and models genetic causal mechanisms well. With its growing ubiquity, its use in the literature has been inconsistent and less than optimal. The purpose of this review is to break down the theoretical and statistical basis of RF so that practitioners are able to apply it in their work. An emphasis is placed on showing how the various components contribute to bias and variance, as well as on discussing variable importance measures. Applications specific to genetic studies are highlighted. To provide context, RF is compared to other commonly used machine learning algorithms.
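
To make the notion of a variable importance measure concrete, the following is a minimal sketch and not the procedure of the review itself. It assumes genotypes coded as minor-allele counts (0, 1, 2), a simulated binary phenotype driven by a single SNP, and a deliberately simplified forest in which every tree is a one-split stump. The function names (`simulate`, `fit_stump`, `random_forest_importance`) and all parameter values are hypothetical, chosen for illustration only. Importance is estimated as the drop in out-of-bag accuracy after permuting a SNP.

```python
import random


def simulate(n_samples, n_snps, rng):
    """Simulate genotypes (assumed coding: 0/1/2 minor-allele counts) and a
    binary phenotype that depends only on SNP 0 (hypothetical dominant effect)."""
    X = [[rng.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n_samples)]
    y = [1 if rng.random() < (0.9 if row[0] >= 1 else 0.1) else 0 for row in X]
    return X, y


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def majority(labels):
    """Most common 0/1 label (0 for an empty node)."""
    return 1 if labels and 2 * sum(labels) >= len(labels) else 0


def fit_stump(X, y, rows, features):
    """Choose the (feature, threshold) split with the lowest weighted Gini."""
    best = None
    for j in features:
        for t in (0, 1):  # split: genotype <= t versus genotype > t
            left = [y[i] for i in rows if X[i][j] <= t]
            right = [y[i] for i in rows if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, j, t, majority(left), majority(right))
    return best[1:]


def predict(stump, row):
    j, t, left_label, right_label = stump
    return left_label if row[j] <= t else right_label


def random_forest_importance(X, y, n_trees=100, mtry=3, seed=0):
    """Out-of-bag permutation importance for a forest of stumps."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    importance = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]    # bootstrap sample
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]  # out-of-bag rows
        features = rng.sample(range(p), mtry)           # random SNP subset
        stump = fit_stump(X, y, boot, features)
        j = stump[0]
        base = sum(predict(stump, X[i]) == y[i] for i in oob) / len(oob)
        # A stump uses one SNP, so only permuting that SNP can change accuracy.
        shuffled = [X[i][j] for i in oob]
        rng.shuffle(shuffled)
        hits = 0
        for i, g in zip(oob, shuffled):
            row = list(X[i])
            row[j] = g
            hits += predict(stump, row) == y[i]
        importance[j] += (base - hits / len(oob)) / n_trees
    return importance


rng = random.Random(42)
X, y = simulate(400, 10, rng)
importance = random_forest_importance(X, y)
ranking = sorted(range(len(importance)), key=lambda j: -importance[j])
print("SNPs ranked by permutation importance:", ranking)
```

Stumps keep the sketch short but cannot represent interactions between SNPs; a full RF grows deep trees and draws a fresh random subset of predictors at every node, which is what lets it capture such effects.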

Cited by 210 articles