Efficient and Accurate Construction of Genetic Linkage Maps from the Minimum Spanning Tree of a Graph

Abstract
Genetic linkage maps are cornerstones of a wide spectrum of biotechnology applications, including map-assisted breeding, association genetics, and map-assisted gene cloning. During the past several years, the adoption of high-throughput genotyping technologies has been paralleled by a substantial increase in the density and diversity of genetic markers. New genetic mapping algorithms are needed in order to efficiently process these large datasets and accurately construct high-density genetic maps. In this paper, we introduce a novel algorithm to order markers on a genetic linkage map. Our method is based on a simple yet fundamental mathematical property that we prove under rather general assumptions. The validity of this property allows one to determine efficiently the correct order of markers by computing the minimum spanning tree of an associated graph. Our empirical studies obtained on genotyping data for three mapping populations of barley (Hordeum vulgare), as well as extensive simulations on synthetic data, show that our algorithm consistently outperforms the best available methods in the literature, particularly when the input data are noisy or incomplete. The software implementing our algorithm is available in the public domain as a web tool under the name MSTmap. Genetic linkage maps are cornerstones of a wide spectrum of biotechnology applications. In recent years, new high-throughput genotyping technologies have substantially increased the density and diversity of genetic markers, creating new algorithmic challenges for computational biologists. In this paper, we present a novel algorithmic method to construct genetic maps based on a new theoretical insight. Our approach outperforms the best methods available in the scientific literature, particularly when the input data are noisy or incomplete.