Microbiome, Metagenomics, and High-Dimensional Compositional Data Analysis

10 April 2015

journal article
Published by Annual Reviews in Annual Review of Statistics and Its Application

Vol. 2 (1), 73-94
https://doi.org/10.1146/annurev-statistics-010814-020351

Abstract

The human microbiome is the totality of all microbes in and on the human body, and its importance in health and disease has been increasingly recognized. High-throughput sequencing technologies have recently enabled scientists to obtain an unbiased quantification of all microbes constituting the microbiome. Often, a single sample can produce hundreds of millions of short sequencing reads. However, unique characteristics of the data produced by the new technologies, as well as the sheer magnitude of these data, make drawing valid biological inferences from microbiome studies difficult. Analysis of these big data poses great statistical and computational challenges. Important issues include normalization and quantification of relative taxonomic, bacterial gene, and metabolic abundances; incorporation of phylogenetic information into analysis of metagenomics data; and multivariate analysis of high-dimensional compositional data. We review existing methods, point out their limitations, and outline future research directions.

Cited by 235 articles