Coupled mixed model for joint genetic analysis of complex disorders with two independently collected data sets

Open Access

5 February 2021

Research article
Published by Springer Science and Business Media LLC in BMC Bioinformatics

Vol. 22 (1), 1-14
https://doi.org/10.1186/s12859-021-03959-2

Abstract

Background: In the last decade, genome-wide association studies (GWASs) have contributed to decoding the human genome by uncovering many genetic variants associated with various diseases. Many follow-up investigations involve the joint analysis of multiple independently generated GWAS data sets. While most computational approaches developed for joint analysis rely on summary statistics, joint analysis of individual-level data that accounts for confounding factors remains a challenge.

Results: In this study, we propose a method, called the Coupled Mixed Model (CMM), that enables a joint GWAS analysis of two independently collected GWAS data sets with different phenotypes. CMM does not require the data sets to share the same phenotypes, because it infers the unknown phenotypes using a set of multivariate sparse mixed models. Moreover, CMM accounts for confounding variables arising from population stratification, family structure, and cryptic relatedness, as well as those introduced during data collection, such as batch effects, which frequently appear in joint genetic studies. We evaluate the performance of CMM through simulation experiments. In real data analysis, we illustrate the utility of CMM by applying it to evaluate common genetic associations for Alzheimer’s disease and substance use disorder, using data sets independently collected for the two complex human disorders. Comparison of the results with those of previous experiments and analyses supports the utility of our method and provides new insights into the diseases. The software is available at https://github.com/HaohanWang/CMM.

Funding Information

National Institutes of Health (R01-GM093156)
National Institutes of Health (P30-DA035778)
