MB-GAN: Microbiome Simulation via Generative Adversarial Network

Open Access

29 January 2021

journal article
research article
Published by Oxford University Press (OUP) in GigaScience

Vol. 10 (2)
https://doi.org/10.1093/gigascience/giab005

Abstract

Trillions of microbes inhabit the human body and have a profound effect on human health. The recent development of metagenome-wide association studies and other quantitative analysis methods has accelerated the discovery of associations between the human microbiome and diseases. To assess the strengths and limitations of these analytical tools, simulating realistic microbiome datasets is critically important. However, simulating realistic microbiome data is challenging because their correlation structure is difficult to model with explicit statistical models. To address this challenge, we designed a novel simulation framework termed MB-GAN, which uses a generative adversarial network (GAN) and draws on methodological advances from the deep learning community. MB-GAN automatically learns from given microbial abundances and generates simulated abundances that are indistinguishable from them. In practice, MB-GAN showed the following advantages. First, MB-GAN avoids explicit statistical modeling assumptions and requires only real datasets as inputs. Second, unlike traditional GANs, MB-GAN is easy to apply and converges efficiently. By applying MB-GAN to a case-control gut microbiome study of 396 samples, we demonstrated that the simulated data and the original data had similar first-order and second-order properties, including sparsity, diversities, and taxa-taxa correlations. These advantages make MB-GAN suitable for further microbiome methodology development, where high-fidelity microbiome data are needed.
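The properties the abstract names as points of comparison (sparsity, diversity, and taxa-taxa correlation) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the toy matrix, the choice of the Shannon index as the diversity measure, and the use of Pearson correlation on relative abundances are all assumptions made for illustration. The same summaries would be computed on a real and a simulated abundance matrix and then compared.

```python
import math

# Hypothetical toy data: rows are samples, columns are taxa,
# entries are relative abundances (each row sums to 1).
real = [
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 1.0, 0.0],
]

def sparsity(abundances):
    """Fraction of zero entries in a samples-by-taxa matrix."""
    total = sum(len(row) for row in abundances)
    zeros = sum(1 for row in abundances for x in row if x == 0)
    return zeros / total

def shannon(sample):
    """Shannon index of one sample (an assumed choice of diversity measure)."""
    total = sum(sample)
    props = [x / total for x in sample if x > 0]
    return -sum(p * math.log(p) for p in props)

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def taxa_correlation(abundances, i, j):
    """Correlation between taxa i and j across samples (a simplification)."""
    col_i = [row[i] for row in abundances]
    col_j = [row[j] for row in abundances]
    return pearson(col_i, col_j)

print("sparsity:", sparsity(real))
print("Shannon index per sample:", [shannon(row) for row in real])
print("correlation of taxa 0 and 1:", taxa_correlation(real, 0, 1))
```

Sparsity and per-sample diversity are summaries of individual entries or samples, while the taxa-taxa correlation captures how pairs of taxa vary together. Pearson correlation on relative abundances is used here only for brevity; compositional data may call for other measures.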

Funding Information

National Institutes of Health (5P30CA142543, 5R01GM126479, 5R01HG008983, 1R56HG011035)

