Random Forests for Genetic Association Studies
- 31 December 2010
- journal article
- research article
- Published by Walter de Gruyter GmbH in Statistical Applications in Genetics and Molecular Biology
- Vol. 10 (1), 32
- https://doi.org/10.2202/1544-6115.1691
Abstract
The Random Forests (RF) algorithm has become a commonly used machine learning algorithm for genetic association studies. It is well suited for genetic applications since it is both computationally efficient and models genetic causal mechanisms well. With its growing ubiquity, there has been inconsistent and less than optimal use of RF in the literature. The purpose of this review is to break down the theoretical and statistical basis of RF so that practitioners are able to apply it in their work. An emphasis is placed on showing how the various components contribute to bias and variance, as well as discussing variable importance measures. Applications specific to genetic studies are highlighted. To provide context, RF is compared to other commonly used machine learning algorithms.
This publication has 44 references indexed in Scilit.