Transferable Visual Words: Exploiting the Semantics of Anatomical Patterns for Self-Supervised Learning
- 29 September 2021
- research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Medical Imaging
- Vol. 40 (10), 2857-2868
- https://doi.org/10.1109/TMI.2021.3060634
Abstract
This paper introduces a new concept, "transferable visual words" (TransVW), aimed at annotation efficiency for deep learning in medical image analysis. Medical imaging, which focuses on particular parts of the body for defined clinical purposes, generates images of great anatomical similarity across patients and yields sophisticated anatomical patterns across images; these patterns carry rich semantics about human anatomy and serve as natural visual words. We show that these visual words can be automatically harvested according to anatomical consistency via self-discovery, and that the self-discovered visual words can serve as strong yet free supervision signals for deep models to learn semantics-enriched, generic image representations via self-supervision (self-classification and self-restoration). Our extensive experiments demonstrate the annotation efficiency of TransVW, which offers higher performance and faster convergence at reduced annotation cost in several applications. TransVW has several important advantages: (1) it is a fully autodidactic scheme that exploits the semantics of visual words for self-supervised learning, requiring no expert annotation; (2) visual word learning is an add-on strategy that complements existing self-supervised methods, boosting their performance; and (3) the learned image representations are semantics-enriched, and such models have proven to be more robust and generalizable, saving annotation effort for a variety of applications through transfer learning. Our code, pre-trained models, and curated visual words are available at https://github.com/JLiangLab/TransVW.

Funding Information
- ASU and Mayo Clinic through a Seed Grant and an Innovation Grant
- NIH (R01HL128785)
- GPUs through the ASU Research Computing
- Extreme Science and Engineering Discovery Environment
- National Science Foundation (ACI-1548562)
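The abstract describes self-supervision as a combination of self-classification (predicting which visual-word class a patch belongs to) and self-restoration (recovering the original patch from a perturbed version). The following is a minimal sketch of such a joint objective in NumPy; the function names, the L2 restoration loss, and the `lam` weighting hyperparameter are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def classification_loss(logits, label):
    """Cross-entropy for the self-classification branch:
    predict which visual-word class a patch instance belongs to."""
    shifted = logits - logits.max()          # numerically stable softmax
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

def restoration_loss(restored, original):
    """L2 loss for the self-restoration branch:
    recover the original patch from its perturbed version.
    (An L2 penalty is an assumption for this sketch.)"""
    return np.mean((restored - original) ** 2)

def joint_loss(logits, label, restored, original, lam=1.0):
    """Joint self-supervision objective: classification plus
    lam-weighted restoration (lam is an assumed hyperparameter)."""
    return classification_loss(logits, label) + lam * restoration_loss(restored, original)

# Toy example: 10 visual-word classes, one 32x32 patch.
logits = rng.normal(size=10)                  # stand-in for network class scores
patch = rng.normal(size=(32, 32))             # stand-in for the original patch
restored = patch + 0.1 * rng.normal(size=(32, 32))  # imperfect reconstruction
loss = joint_loss(logits, label=3, restored=restored, original=patch)
```

In a real training loop, `logits` and `restored` would come from the two heads of the shared encoder, and minimizing the joint loss drives the encoder toward the semantics-enriched representation the abstract describes.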