pyGenClean: efficient tool for genetic data clean up before association testing

Open Access

6 May 2013

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 29 (13), 1704-1705
https://doi.org/10.1093/bioinformatics/btt261

Abstract

Summary: Genetic association studies making use of high-throughput genotyping arrays need to process large amounts of data in the order of millions of markers per experiment. The first step of any analysis with genotyping arrays is typically the conduct of a thorough data clean up and quality control to remove poor quality genotypes and generate metrics to inform and select individuals for downstream statistical analysis. We have developed pyGenClean, a bioinformatics tool to facilitate and standardize the genetic data clean up pipeline with genotyping array data. In conjunction with a source batch-queuing system, the tool minimizes data manipulation errors, accelerates the completion of the data clean up process and provides informative plots and metrics to guide decision making for statistical analysis.

Availability and implementation: pyGenClean is an open source Python 2.7 software and is freely available, along with documentation and examples, from http://www.statgen.org.

Contact: louis-philippe.lemieux.perreault@umontreal.ca or marie-pierre.dube@statgen.org

