Methods for assessing gene content diversity of KIR with examples from a global set of populations

Abstract
A number of statistical methods are widely used to describe allelic variation at specific genetic loci and its implications for the evolutionary history of these loci. Although these methods were developed primarily to study allelic variation at loci that are virtually always present in the genome, they are often applied to gene content variation data (i.e., presence/absence of multiple homologous genes) at the killer cell immunoglobulin-like receptor (KIR) gene cluster. In this paper, we discuss methodological issues involved in the analysis of gene content variation in the KIR region and of its covariation with polymorphism at the human leukocyte antigen (HLA) class I loci, which encode ligands for KIR. We compare several statistical methods and measures (gene frequency, haplotype frequency, and linkage disequilibrium estimation) using Centre d’Etude du Polymorphisme Humain (CEPH) data for which KIR haplotypes have been determined by segregation analysis, noting the strengths and weaknesses of the methods when only presence/absence data are considered. Finally, we describe the application of these methods to a set of globally distributed populations (see Single et al., Nat Genet 39:1114–1119, 2007) to illustrate the challenges faced when inferring the joint effects of natural selection and demographic history on these immune-related genes.
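
To make the presence/absence problem concrete, the sketch below shows one common way a gene frequency can be estimated from carrier counts alone. It is an illustration under stated assumptions, not necessarily the procedure used in this paper: it assumes Hardy-Weinberg proportions and treats the gene as a single locus that is either present or absent on each chromosome, so that the carrier frequency is 1 - (1 - f)^2 and hence f = 1 - sqrt(1 - carrier frequency). The function name `gene_frequency_from_carriers` is hypothetical.

```python
import math


def gene_frequency_from_carriers(n_carriers: int, n_total: int) -> float:
    """Estimate the frequency of a gene from presence/absence counts.

    Assumption (not taken from the source): Hardy-Weinberg proportions and
    a single locus that is either present or absent on each chromosome.
    Under that model the carrier frequency is 1 - (1 - f)**2, which gives
    f = 1 - sqrt(1 - carrier_frequency).
    """
    if n_total <= 0 or not 0 <= n_carriers <= n_total:
        raise ValueError("need 0 <= n_carriers <= n_total and n_total > 0")
    carrier_frequency = n_carriers / n_total
    return 1.0 - math.sqrt(1.0 - carrier_frequency)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only: 75 carriers among 100 individuals.
    print(gene_frequency_from_carriers(75, 100))
```

Under this model the estimate reaches 1 as soon as every sampled individual carries the gene, so presence/absence data alone cannot distinguish a fixed gene from one that is merely very common. Haplotypes determined by segregation analysis avoid that limitation.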