Learning unsupervised feature representations for single cell microscopy images with paired cell inpainting

Open Access

3 September 2019

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Computational Biology

Vol. 15 (9), e1007348
https://doi.org/10.1371/journal.pcbi.1007348

Abstract

Cellular microscopy images contain rich insights about biology. To extract this information, researchers use features, or measurements of the patterns of interest in the images. Here, we introduce a convolutional neural network (CNN) to automatically design features for fluorescence microscopy. We use a self-supervised method to learn feature representations of single cells in microscopy images without labelled training data. We train CNNs on a simple task that leverages the inherent structure of microscopy images and controls for variation in cell morphology and imaging: given one cell from an image, the CNN is asked to predict the fluorescence pattern in a second, different cell from the same image. We show that our method learns high-quality features that describe protein expression patterns in single cells in both yeast and human microscopy datasets. Moreover, we demonstrate that our features are useful for exploratory biological analysis, by capturing high-resolution cellular components in a proteome-wide cluster analysis of human proteins, and by quantifying multi-localized proteins and single-cell variability. We believe paired cell inpainting is a generalizable method to obtain feature representations of single cells in multichannel microscopy images.

Author summary

To understand the cell biology captured by microscopy images, researchers use features, or measurements of relevant properties of cells, such as the shape or size of cells, or the intensity of fluorescent markers. Features are the starting point of most image analysis pipelines, so their quality in representing cells is fundamental to the success of an analysis. Classically, researchers have relied on features manually defined by imaging experts. In contrast, deep learning techniques based on convolutional neural networks (CNNs) automatically learn features, which can outperform manually defined features at image analysis tasks. However, most CNN methods require large manually annotated training datasets to learn useful features, limiting their practical application. Here, we developed a new CNN method that learns high-quality features for single cells in microscopy images, without the need for any labeled training data. We show that our features surpass other comparable features in identifying protein localization from images, and that our method can generalize to diverse datasets. By exploiting our method, researchers will be able to automatically obtain high-quality features customized to their own image datasets, facilitating many downstream analyses, as we highlight by demonstrating several possible use cases of our features in this study.
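
The pairing step behind the training task (one cell given, a second cell from the same image predicted) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sample_cell_pairs`, the `(image_id, cell_id)` representation of segmented cells, and the uniform sampling scheme are all assumptions made for illustration. The CNN that would consume these pairs is omitted.

```python
import random
from collections import defaultdict


def sample_cell_pairs(cells, n_pairs, seed=0):
    """Draw (source, target) cell pairs where both cells share an image.

    `cells` is a list of (image_id, cell_id) tuples (a hypothetical
    representation of segmented single cells). Each returned pair is
    ((image_id, source_cell), (image_id, target_cell)).
    """
    rng = random.Random(seed)

    # Group cell identifiers by the image they were cropped from.
    by_image = defaultdict(list)
    for image_id, cell_id in cells:
        by_image[image_id].append(cell_id)

    # Only images containing at least two cells can yield a pair.
    eligible = [img for img, ids in by_image.items() if len(ids) >= 2]
    if not eligible:
        raise ValueError("need at least one image with two or more cells")

    pairs = []
    for _ in range(n_pairs):
        img = rng.choice(eligible)
        # Sample two distinct cells: the first is the given (source) cell,
        # the second is the cell whose fluorescence pattern is predicted.
        source, target = rng.sample(by_image[img], 2)
        pairs.append(((img, source), (img, target)))
    return pairs


if __name__ == "__main__":
    demo_cells = [("img_a", 0), ("img_a", 1), ("img_a", 2), ("img_b", 0)]
    for source, target in sample_cell_pairs(demo_cells, n_pairs=3):
        print("source:", source, "target:", target)
```

Restricting each pair to a single image is what lets the task exploit the structure the abstract describes: cells in the same image share the same tagged protein and imaging conditions, so the source cell is informative about the target cell's pattern.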

