fcGENE: A Versatile Tool for Processing and Transforming SNP Datasets
Open Access
- 22 July 2014
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 9 (7), e97589
- https://doi.org/10.1371/journal.pone.0097589
Abstract
Modern analysis of high-dimensional SNP data requires a number of biometrical and statistical methods such as pre-processing, analysis of population structure, association analysis and genotype imputation. Software used for these purposes often relies on specific and incompatible input and output data formats. Therefore, extensive data management, including multiple format conversions, is necessary during analyses. To support fast and efficient management and bio-statistical quality control of high-dimensional SNP data, we developed the publicly available software fcGENE using the object-oriented programming language C++. This software simplifies and automates the use of different existing analysis packages, especially during the workflow of genotype imputations and corresponding analyses. fcGENE transforms SNP data and imputation results into the different formats required by a large variety of analysis packages such as PLINK, SNPTEST, HAPLOVIEW, EIGENSOFT and GenABEL, and by tools used for genotype imputation such as MaCH, IMPUTE, BEAGLE and others. Data management tasks such as merging, splitting, and extracting SNP and pedigree information can be performed. fcGENE also supports a number of bio-statistical quality control processes and quality-based filtering processes at the SNP and sample levels. The tool also generates templates of the commands required to run specific software packages, especially those required for genotype imputation. We demonstrate the functionality of fcGENE with example workflows of SNP data analyses and provide a comprehensive manual of commands, options and applications. We have developed user-friendly open-source software, fcGENE, which comprehensively supports SNP data management, quality control and analysis workflows. Download statistics and corresponding feedback indicate that the software is highly recognised and extensively applied by the scientific community.