The Gibbs and split–merge sampler for population mixture analysis from genetic data with incomplete baselines

1 March 2006

journal article
Published by Canadian Science Publishing in Canadian Journal of Fisheries and Aquatic Sciences

Vol. 63 (3), 576-596
https://doi.org/10.1139/f05-224

Abstract

Although population mixtures often include contributions from novel populations as well as from baseline populations previously sampled, unlabeled mixture individuals can be separated to their sources from genetic data. A Gibbs and split–merge Markov chain Monte Carlo sampler is described for successively partitioning a genetic mixture sample into plausible subsets of individuals from each of the baseline and extra-baseline populations present. The subsets are selected to satisfy the Hardy–Weinberg and linkage equilibrium conditions expected for large, panmictic populations. The number of populations present can be inferred from the distribution for counts of subsets per partition drawn by the sampler. To further summarize the sampler's output, co-assignment probabilities of mixture individuals to the same subsets are computed from the partitions and are used to construct a binary tree of their relatedness. The tree graphically displays the clusters of mixture individuals together with a quantitative measure of the evidence supporting their various separate and common sources. The methodology is applied to several simulated and real data sets to illustrate its use and demonstrate the sampler's superior performance.
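
The two output summaries described in the abstract (the distribution of subset counts per partition, and the pairwise co-assignment probabilities) can be illustrated with a minimal sketch. It assumes the sampler's output is available as one list of subset labels per draw. The function name `summarize_partitions` and the example labels are hypothetical and are not taken from the article, and the sketch covers only the post-processing step, not the Gibbs or split-merge moves themselves.

```python
from collections import Counter


def summarize_partitions(partitions):
    """Summarize sampled partitions (hypothetical helper, not the authors' code).

    partitions: list of draws; each draw is a list where entry i is the
    subset label assigned to mixture individual i in that draw.
    """
    n_draws = len(partitions)
    n = len(partitions[0])

    # Relative frequency of the number of distinct subsets per partition.
    k_counts = Counter(len(set(labels)) for labels in partitions)
    k_dist = {k: c / n_draws for k, c in sorted(k_counts.items())}

    # Count the draws in which individuals i and j fall in the same subset.
    together = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    together[i][j] += 1

    # Convert counts to co-assignment frequencies.
    coassign = [[c / n_draws for c in row] for row in together]
    return k_dist, coassign


# Made-up draws for four individuals, for illustration only.
draws = [[0, 0, 1, 1], [0, 0, 1, 2], [0, 0, 1, 1], [0, 1, 2, 2]]
k_dist, coassign = summarize_partitions(draws)
```

In this toy input, label values are compared only within a draw, so relabeling subsets between draws does not affect either summary. One plausible way to obtain a binary tree from such a matrix would be hierarchical clustering on the dissimilarity 1 minus the co-assignment frequency, but the abstract does not state which tree-building method the article uses.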

Keywords

GENETICS

