Privacy Preserving GWAS Data Sharing
- 1 December 2011
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 628-635
- https://doi.org/10.1109/icdmw.2011.140
Abstract
Traditional statistical methods for the confidentiality protection for statistical databases do not scale well to deal with GWAS (genome-wide association studies) databases and external information on them. The more recent concept of differential privacy, introduced by the cryptographic community, is an approach which provides a rigorous definition of privacy with meaningful privacy guarantees in the presence of arbitrary external information. Building on such notions, we propose new methods to release aggregate GWAS data without compromising an individual's privacy. We present methods for releasing differentially private minor allele frequencies, chi-square statistics and p-values. We compare these approaches on simulated data and on a GWAS study of canine hair length involving 685 dogs. We also propose a privacy-preserving method for finding genome-wide associations based on a differentially private approach to penalized logistic regression.
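As a rough illustration of what releasing differentially private minor allele frequencies can look like, the sketch below applies the generic Laplace mechanism to per-SNP minor allele counts. It is not taken from the paper and should not be read as the authors' method. The function names, the 0/1/2 genotype encoding, the assumption that the cohort size is public, and the choice to spend a single budget `epsilon` across all SNPs are all assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_minor_allele_frequencies(genotypes, epsilon, rng=None):
    """Hypothetical helper: noisy minor allele frequencies for m SNPs.

    genotypes: one list per individual, each entry the number of
               minor-allele copies (0, 1 or 2) at that SNP (assumed encoding).
    epsilon:   total privacy budget shared by all SNPs (assumption).
    """
    rng = rng or random.Random()
    n = len(genotypes)      # cohort size, assumed to be public
    m = len(genotypes[0])   # number of SNPs released

    # Replacing one individual changes each SNP count by at most 2,
    # so the L1 sensitivity of the vector of m counts is 2 * m.
    scale = 2.0 * m / epsilon

    freqs = []
    for j in range(m):
        count = sum(person[j] for person in genotypes)
        noisy_count = count + laplace_noise(scale, rng)
        # Clamping to [0, 1] is post-processing and does not weaken
        # the differential privacy guarantee.
        freqs.append(min(max(noisy_count / (2.0 * n), 0.0), 1.0))
    return freqs


if __name__ == "__main__":
    toy = [[0, 1, 2], [1, 1, 0], [2, 0, 0], [0, 1, 1]]  # 4 individuals, 3 SNPs
    print(private_minor_allele_frequencies(toy, epsilon=1.0))
```

Because the noise scale grows linearly with the number of SNPs released, this naive approach becomes very noisy at genome-wide scale, which is one reason more careful release strategies are of interest.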