Visualizing structure and transitions in high-dimensional biological data

Abstract
The high-dimensional data generated by high-throughput technologies require visualization tools that reveal data structure and patterns in an intuitive form. We present PHATE, a visualization method that captures both local and global nonlinear structure using an information-geometric distance between data points. We compare PHATE to other tools on a variety of artificial and biological datasets and find that it consistently preserves a range of patterns in data, including continuous progressions, branches and clusters, better than other tools. We define a manifold preservation metric, which we call denoised embedding manifold preservation (DEMaP), and show that PHATE produces lower-dimensional embeddings that are quantitatively better denoised than those of existing visualization methods. An analysis of a newly generated single-cell RNA sequencing dataset on human germ-layer differentiation demonstrates how PHATE provides biological insight into the main developmental branches, including the identification of three previously undescribed subpopulations. We also show that PHATE is applicable to a wide variety of data types, including mass cytometry, single-cell RNA sequencing, Hi-C and gut microbiome data.
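The abstract names an information-geometric distance between data points without spelling it out. As a rough illustration only, the sketch below assumes a diffusion-potential construction: an alpha-decaying affinity kernel with adaptive bandwidth, a t-step Markov diffusion, log-transformed "potential" rows compared by Euclidean distance, and classical MDS into two dimensions. It uses plain NumPy, is not the phate package, and omits refinements a full implementation would likely include (landmarking, automatic selection of t, metric MDS). The function name and all parameter values are hypothetical.

```python
import numpy as np


def potential_distance_embedding(X, k=5, alpha=10.0, t=10, n_components=2):
    """Illustrative PHATE-style embedding (hypothetical sketch, not the phate package)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between data points.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Adaptive bandwidth: distance to the k-th nearest neighbour (column 0 is the point itself).
    eps = np.maximum(np.sort(D, axis=1)[:, k], 1e-12)
    # Symmetric alpha-decaying affinity kernel (assumed form).
    K = 0.5 * np.exp(-(D / eps[:, None]) ** alpha) + 0.5 * np.exp(-(D / eps[None, :]) ** alpha)
    # Row-normalise into a Markov diffusion operator and take t diffusion steps.
    P = K / K.sum(axis=1, keepdims=True)
    Pt = np.linalg.matrix_power(P, t)
    # Potential representation: negative log of the diffused probabilities.
    U = -np.log(Pt + 1e-12)
    # Potential distance: Euclidean distance between rows of U.
    PD = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=-1)
    # Classical MDS: double-centre the squared distances and keep the top eigenvectors.
    n = PD.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (PD ** 2) @ J
    vals, vecs = np.linalg.eigh(B)  # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_components]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))
```

For example, `potential_distance_embedding(X, k=5, alpha=2.0, t=10)` on two well-separated Gaussian clusters returns a 2-D embedding in which the clusters lie far apart. The log transform is the key design choice in this sketch: it keeps small diffusion probabilities, which encode global structure, from being swamped by the large local ones.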
Funding Information
  • Gruber Foundation
  • U.S. Department of Health & Human Services | NIH | Eunice Kennedy Shriver National Institute of Child Health and Human Development (F31HD097958)
  • Alfred P. Sloan Foundation (FG-2016-6607)
  • United States Department of Defense | Defense Advanced Research Projects Agency (D16AP00117)
  • U.S. Department of Health & Human Services | National Institutes of Health (1R01HG008383, R01GM107092, R01GM130847)
  • Institut de valorisation des données