Abstract
This paper describes PLINK, a software package widely used to analyze large genome-wide data sets for genetic association, and focuses on the mathematics and methods behind some of its novel features. The availability of chips that genotype individuals at tens or hundreds of thousands of single nucleotide polymorphisms (SNPs) across the genome created a need for new tools to analyze these data, and PLINK is one of the most widely used. It provides basic methods to manage large data sets and to carry out established statistical tests of association. Its novel methods include clustering individuals by genetic similarity and estimating inbreeding coefficients and identical-by-descent probabilities. The authors also include methods to identify large homozygous regions and regions in which two individuals (not necessarily known to be related) share a long haplotype. PLINK provides a useful, comprehensive suite of tools to manage and analyze the large data sets now used to identify genomic variants associated with complex diseases.