Microbiome, Metagenomics, and High-Dimensional Compositional Data Analysis
- 10 April 2015
- journal article
- Published by Annual Reviews in Annual Review of Statistics and Its Application
- Vol. 2 (1), 73-94
- https://doi.org/10.1146/annurev-statistics-010814-020351
Abstract
The human microbiome is the totality of all microbes in and on the human body, and its importance in health and disease has been increasingly recognized. High-throughput sequencing technologies have recently enabled scientists to obtain an unbiased quantification of all microbes constituting the microbiome. Often, a single sample can produce hundreds of millions of short sequencing reads. However, unique characteristics of the data produced by the new technologies, as well as the sheer magnitude of these data, make drawing valid biological inferences from microbiome studies difficult. Analysis of these big data poses great statistical and computational challenges. Important issues include normalization and quantification of relative taxa, bacterial genes, and metabolic abundances; incorporation of phylogenetic information into analysis of metagenomics data; and multivariate analysis of high-dimensional compositional data. We review existing methods, point out their limitations, and outline future research directions.
This publication has 50 references indexed in Scilit:
- Kernel Methods for Regression Analysis of Microbiome Compositional Data. Springer Proceedings in Mathematics & Statistics, 2013
- Structure-constrained sparse canonical correlation analysis with an application to microbiome data analysis. Biostatistics, 2012
- Associating microbiome composition with environmental covariates using generalized UniFrac distances. Bioinformatics, 2012
- A global network of coexisting microbes from environmental and whole-genome sequence data. Genome Research, 2010
- QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 2010
- The GAAS Metagenomic Tool and Its Estimations of Viral and Microbial Average Genome Size in Four Major Biomes. PLoS Computational Biology, 2009
- Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nature Methods, 2009
- Host-Bacterial Mutualism in the Human Intestine. Science, 2005
- Statistical Interpretation of Species Composition. Journal of the American Statistical Association, 2001
- Log contrast models for experiments with mixtures. Biometrika, 1984