Visualization of SNPs with t-SNE
Open Access
- 15 February 2013
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 8 (2), e56883
- https://doi.org/10.1371/journal.pone.0056883
Abstract
Single Nucleotide Polymorphisms (SNPs) are one of the largest sources of new data in biology. In most papers, SNPs between individuals are visualized with Principal Component Analysis (PCA), an older method for this purpose. We compare PCA with a newer method, t-Distributed Stochastic Neighbor Embedding (t-SNE), for the visualization of large SNP datasets. We also propose a set of key figures for evaluating these visualizations; on all of them, t-SNE performs better. PCA remains a reasonably good method for transforming data, but for visualization it should be replaced by a method from the subfield of dimension reduction. To evaluate visualization performance, we propose as key figures cross-validation with machine learning methods and indices of cluster validity.
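The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes scikit-learn, a simulated genotype matrix coded 0/1/2 (the helper `simulate_genotypes` is hypothetical), a k-nearest-neighbour classifier as the cross-validated machine learning method, and the silhouette coefficient as the cluster validity index. The abstract names neither the specific classifier nor the specific index, so both are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def simulate_genotypes(n_per_pop=60, n_snps=300, n_pops=3, seed=0):
    """Hypothetical toy data: genotypes coded 0/1/2 for a few populations."""
    rng = np.random.default_rng(seed)
    # Assumption: each population has its own allele frequency at every SNP.
    freqs = rng.uniform(0.05, 0.95, size=(n_pops, n_snps))
    X = np.vstack(
        [rng.binomial(2, freqs[p], size=(n_per_pop, n_snps)) for p in range(n_pops)]
    )
    y = np.repeat(np.arange(n_pops), n_per_pop)
    return X.astype(float), y


def evaluate(embedding, labels):
    """Score a 2-D embedding with two illustrative key figures."""
    # Cross-validated accuracy of a classifier trained on the 2-D coordinates.
    knn = KNeighborsClassifier(n_neighbors=5)
    cv_accuracy = cross_val_score(knn, embedding, labels, cv=5).mean()
    # Silhouette coefficient of the known population labels in the embedding.
    silhouette = silhouette_score(embedding, labels)
    return {"cv_accuracy": float(cv_accuracy), "silhouette": float(silhouette)}


X, y = simulate_genotypes()

# Linear projection onto the first two principal components.
pca_2d = PCA(n_components=2, random_state=0).fit_transform(X)

# Nonlinear 2-D embedding; the perplexity value is an arbitrary choice here.
tsne_2d = TSNE(
    n_components=2, perplexity=30, init="pca", random_state=0
).fit_transform(X)

scores = {"PCA": evaluate(pca_2d, y), "t-SNE": evaluate(tsne_2d, y)}
for name, s in scores.items():
    print(f"{name}: CV accuracy={s['cv_accuracy']:.3f}, silhouette={s['silhouette']:.3f}")
```

On simulated data like this the two methods may score similarly, so the sketch only shows how such key figures can be computed, not which method wins. With a real SNP matrix, `X` would hold one row per individual and `y` the known population or breed labels.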