Visualization of SNPs with t-SNE

Open Access

15 February 2013

journal article
research article
Published by Public Library of Science (PLoS) in PLOS ONE

Vol. 8 (2), e56883
https://doi.org/10.1371/journal.pone.0056883

Abstract

Single Nucleotide Polymorphisms (SNPs) are one of the largest sources of new data in biology. In most papers, SNPs between individuals are visualized with Principal Component Analysis (PCA), an older method for this purpose. We compare PCA with a newer method, t-Distributed Stochastic Neighbor Embedding (t-SNE), for the visualization of large SNP datasets. We also propose a set of key figures for evaluating these visualizations: cross-validation with machine learning methods and indices of cluster validity. On all of these, t-SNE performs better. PCA remains a reasonably good method for transforming data, but for visualization it should be replaced by a method from the subfield of dimension reduction.
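The abstract names two kinds of key figures without saying which classifier or which validity index is used. As an illustration only, the sketch below assumes a k-nearest-neighbour classifier scored by leave-one-out cross-validation and the mean silhouette width as the cluster validity index. The function names (loo_knn_accuracy, silhouette) and the toy 2-D coordinates are hypothetical and are not taken from the paper. In practice the points would be the 2-D output of PCA or t-SNE, and the labels the known population of each individual.

```python
import math
from collections import Counter


def loo_knn_accuracy(points, labels, k=3):
    """Leave-one-out accuracy of a k-NN classifier on an embedding.

    Assumption: k-NN stands in for the unspecified "machine learning
    methods" of the abstract.
    """
    correct = 0
    for i, p in enumerate(points):
        # Rank all other points by distance to the held-out point.
        neighbours = sorted(
            (math.dist(p, q), labels[j])
            for j, q in enumerate(points)
            if j != i
        )[:k]
        predicted = Counter(lab for _, lab in neighbours).most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / len(points)


def silhouette(points, labels):
    """Mean silhouette width; assumes at least two distinct labels."""
    scores = []
    for i, p in enumerate(points):
        # Collect distances from point i to every other point, per label.
        by_label = {}
        for j, q in enumerate(points):
            if j != i:
                by_label.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_label.get(labels[i])
        if not own:
            # A cluster of size one scores 0 by convention.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(d) / len(d)
            for lab, d in by_label.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy embeddings: the same six labels placed on two layouts.
labels = ["A", "A", "A", "B", "B", "B"]
separated = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
mixed = [(0, 0), (10, 10), (1, 0), (0, 1), (10, 11), (11, 10)]

for name, emb in (("separated", separated), ("mixed", mixed)):
    print(name, loo_knn_accuracy(emb, labels), round(silhouette(emb, labels), 3))
```

Under these assumptions, a visualization in which known groups form compact, well-separated clusters scores higher on both figures than one in which the groups overlap, which is the sense in which two embeddings of the same SNP data can be ranked.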

Cited by 80 articles