Beyond SNP heritability: Polygenicity and discoverability of phenotypes estimated with a univariate Gaussian mixture model

Open Access

19 May 2020

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Genetics

Vol. 16 (5), e1008612
https://doi.org/10.1371/journal.pgen.1008612

Abstract

Estimating the polygenicity (proportion of causally associated single nucleotide polymorphisms (SNPs)) and discoverability (effect size variance) of causal SNPs for human traits is currently of considerable interest. SNP-heritability is proportional to the product of these quantities. We present a basic model, using detailed linkage disequilibrium structure from a reference panel of 11 million SNPs, to estimate these quantities from genome-wide association studies (GWAS) summary statistics. We apply the model to diverse phenotypes and validate the implementation with simulations. We find model polygenicities (as a fraction of the reference panel) ranging from ≃ 2 × 10⁻⁵ to ≃ 4 × 10⁻³, with discoverabilities similarly ranging over two orders of magnitude. A power analysis allows us to estimate the proportions of phenotypic variance explained additively by causal SNPs reaching genome-wide significance at current sample sizes, and map out sample sizes required to explain larger portions of additive SNP heritability. The model also allows for estimating residual inflation (or deflation from over-correcting of z-scores), and assessing compatibility of replication and discovery GWAS summary statistics.

There are ∼10 million common variants in the genome of humans with European ancestry. For any particular phenotype a number of these variants will have some causal effect. It is of great interest to be able to quantify the number of these causal variants and the strength of their effect on the phenotype. Genome-wide association studies (GWAS) produce very noisy summary statistics for the association between subsets of common variants and phenotypes. For any phenotype, these statistics collectively are difficult to interpret, but buried within them is the true landscape of causal effects. In this work, we posit a probability distribution for the causal effects, and assess its validity using simulations. Using a detailed reference panel of ≃11 million common variants (among which only a small fraction are likely to be causal, but allowing for non-causal variants to show an association with the phenotype due to correlation with causal variants), we implement an exact procedure for estimating the number of causal variants and their mean strength of association with the phenotype. We find that, across different phenotypes, both these quantities, whose product allows for lower-bound estimates of heritability, vary by orders of magnitude.
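
As a rough illustration of the statement that SNP heritability is proportional to the product of polygenicity and discoverability, the sketch below simulates a simple point-normal mixture of effect sizes. It is not the authors' implementation. It assumes that each SNP is causal with probability pi1, that causal effects are drawn from a zero-mean Gaussian with variance sigma_beta2, that a SNP with minor allele frequency p contributes 2p(1-p) times its squared effect to the phenotypic variance, that effect sizes are independent of allele frequency, and that linkage disequilibrium can be ignored. The function names, the uniform allele-frequency distribution, and the parameter values are all hypothetical.

```python
import random


def expected_h2(pi1, n_snps, mean_het, sigma_beta2):
    """Expected additive heritability under the assumptions stated above:
    (number of causal SNPs) x (mean heterozygosity) x (effect-size variance)."""
    return pi1 * n_snps * mean_het * sigma_beta2


def simulate_h2(pi1, n_snps, sigma_beta2, seed=0):
    """Draw one realization of the mixture and sum per-SNP variance explained.

    Returns (simulated heritability, mean heterozygosity over all SNPs).
    """
    rng = random.Random(seed)
    sd = sigma_beta2 ** 0.5
    h2 = 0.0
    het_sum = 0.0
    for _ in range(n_snps):
        maf = rng.uniform(0.01, 0.5)      # assumed allele-frequency spectrum
        het = 2.0 * maf * (1.0 - maf)     # heterozygosity of this SNP
        het_sum += het
        if rng.random() < pi1:            # SNP is causal with probability pi1
            beta = rng.gauss(0.0, sd)     # causal effect size
            h2 += het * beta * beta       # variance explained by this SNP
    return h2, het_sum / n_snps


if __name__ == "__main__":
    pi1, n_snps, sigma_beta2 = 0.01, 200_000, 1e-4  # illustrative values only
    sim, mean_het = simulate_h2(pi1, n_snps, sigma_beta2)
    print("simulated h2:", sim)
    print("expected  h2:", expected_h2(pi1, n_snps, mean_het, sigma_beta2))
```

Under these assumptions, halving the polygenicity while doubling the discoverability leaves the expected heritability unchanged, which is why the two quantities have to be estimated separately. For scale, if the reference panel is taken as 11 million SNPs, the polygenicities quoted in the abstract (≃ 2 × 10⁻⁵ to ≃ 4 × 10⁻³) correspond to roughly 220 to 44,000 causal SNPs.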

Other Versions

Preprint, 2018-12-17

Funding Information

Research Council of Norway (262656)
ABCD-USA Consortium (5U24DA041123)

