Secure genome-wide association analysis using multiparty computation
- 7 May 2018
- journal article
- research article
- Published by Springer Science and Business Media LLC in Nature Biotechnology
- Vol. 36 (6), 547-551
- https://doi.org/10.1038/nbt.4108
Abstract
A computational protocol built upon modern cryptographic techniques enables secure analysis of large-scale genetic data. Most sequenced genomes are currently stored in strict access-controlled repositories [1,2,3]. Free access to these data could improve the power of genome-wide association studies (GWAS) to identify disease-causing genetic variants and aid the discovery of new drug targets [4,5]. However, concerns over genetic data privacy [6,7,8,9] may deter individuals from contributing their genomes to scientific studies [10] and could prevent researchers from sharing data with the scientific community [11]. Although cryptographic techniques for secure data analysis exist [12,13,14], none scales to computationally intensive analyses such as GWAS. Here we describe a protocol for large-scale genome-wide analysis that facilitates quality control and population stratification correction in 9K, 13K, and 23K individuals while maintaining the confidentiality of underlying genotypes and phenotypes. We show the protocol could feasibly scale to a million individuals. This approach may help to make currently restricted data available to the scientific community and could potentially enable secure genome crowdsourcing, allowing individuals to contribute their genomes to a study without compromising their privacy.
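To make the idea of computing on data that no single party can read more concrete, the sketch below shows additive secret sharing with Beaver multiplication triples, one standard building block of multiparty computation. The abstract does not specify the construction, so this should not be read as the paper's exact protocol. Everything here is an illustrative assumption: two computing parties, a hypothetical trusted dealer that hands out triples, the prime modulus `P`, the helper names (`share`, `reconstruct`, `beaver_triple`, `multiply`), and the toy genotype and phenotype vectors. It computes two GWAS-style aggregates: the total alternate-allele count (a linear sum, usable for a quality-control allele-frequency filter) and the alternate-allele count among cases (which needs secure products of genotype and phenotype).

```python
import secrets

# Prime modulus for the share arithmetic (assumed; chosen only for illustration).
P = 2**61 - 1

def share(value):
    """Split an integer into two additive shares modulo P."""
    r = secrets.randbelow(P)
    return r, (value - r) % P

def reconstruct(s0, s1):
    """Recombine two additive shares into the secret value."""
    return (s0 + s1) % P

def beaver_triple():
    """Hypothetical trusted dealer: random a, b and c = a*b, each secret-shared."""
    a = secrets.randbelow(P)
    b = secrets.randbelow(P)
    return share(a), share(b), share(a * b % P)

def multiply(x_sh, y_sh):
    """Multiply two secret-shared values using one Beaver triple."""
    (a0, a1), (b0, b1), (c0, c1) = beaver_triple()
    # The parties open d = x - a and e = y - b. Because a and b are uniformly
    # random masks, d and e reveal nothing about x or y.
    d = reconstruct((x_sh[0] - a0) % P, (x_sh[1] - a1) % P)
    e = reconstruct((y_sh[0] - b0) % P, (y_sh[1] - b1) % P)
    # c + d*b + e*a + d*e = x*y; only party 0 adds the public d*e term.
    z0 = (c0 + d * b0 + e * a0 + d * e) % P
    z1 = (c1 + d * b1 + e * a1) % P
    return z0, z1

genotypes = [0, 1, 2, 1, 0, 0, 1, 2]   # toy alternate-allele counts for one SNP
phenotypes = [1, 0, 1, 1, 0, 0, 0, 1]  # toy labels: case = 1, control = 0

g_sh = [share(g) for g in genotypes]
y_sh = [share(y) for y in phenotypes]

# Total allele count: each party sums its own shares locally, then only the
# aggregate is reconstructed.
allele_count = reconstruct(sum(s[0] for s in g_sh) % P,
                           sum(s[1] for s in g_sh) % P)

# Allele count among cases: requires secure products g_i * y_i.
prods = [multiply(g, y) for g, y in zip(g_sh, y_sh)]
case_allele_count = reconstruct(sum(p[0] for p in prods) % P,
                                sum(p[1] for p in prods) % P)

allele_freq = allele_count / (2 * len(genotypes))
print(allele_count, case_allele_count, allele_freq)  # 7 5 0.4375
```

Sums are free because each party can add its shares without communicating, whereas each product consumes one precomputed triple and one round of opening masked values. This asymmetry is why multiplication-heavy steps, such as the matrix operations behind principal-component-based stratification correction, dominate the cost of MPC analyses at GWAS scale. A real deployment would also need a fixed-point encoding for non-integer quantities.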
This publication has 37 references indexed in Scilit:
- Association of Granulomatosis With Polyangiitis (Wegener's) With HLA–DPB1*04 and SEMA6A Gene Variants: Evidence From Genome‐Wide Analysis. Arthritis & Rheumatism, 2013
- A new way to protect privacy in large-scale genome-wide association studies. Bioinformatics, 2013
- Identifying Personal Genomes by Surname Inference. Science, 2013
- Genome-wide association analysis identifies new lung cancer susceptibility loci in never-smoking women in Asia. Nature Genetics, 2012
- China Kadoorie Biobank of 0.5 million people: survey methods, baseline characteristics and long-term follow-up. International Journal of Epidemiology, 2011
- Required sample size and nonreplicability thresholds for heterogeneous genetic associations. Proceedings of the National Academy of Sciences of the United States of America, 2008
- Implications of Small Effect Sizes of Individual Genetic Variants on the Design and Interpretation of Genetic Association Studies of Complex Diseases. American Journal of Epidemiology, 2006
- Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics, 2006
- Urinary bladder cancer in Wegener's granulomatosis: risks and relation to cyclophosphamide. Annals of the Rheumatic Diseases, 2004
- Assessing the impact of population stratification on genetic association studies. Nature Genetics, 2004