Image Compositing for Segmentation of Surgical Tools Without Manual Annotations
Open Access
- 1 May 2021
- Research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Medical Imaging
- Vol. 40 (5), 1450-1460
- https://doi.org/10.1109/TMI.2021.3057884
Abstract
Producing manual, pixel-accurate, image segmentation labels is tedious and time-consuming. This is often a rate-limiting factor when large amounts of labeled images are required, such as for training deep convolutional networks for instrument-background segmentation in surgical scenes. No large datasets comparable to industry standards in the computer vision community are available for this task. To circumvent this problem, we propose to automate the creation of a realistic training dataset by exploiting techniques stemming from special effects and harnessing them to target training performance rather than visual appeal. Foreground data is captured by placing sample surgical instruments over a chroma key (a.k.a. green screen) in a controlled environment, thereby making extraction of the relevant image segment straightforward. Multiple lighting conditions and viewpoints can be captured and introduced in the simulation by moving the instruments and camera and modulating the light source. Background data is captured by collecting videos that do not contain instruments. In the absence of pre-existing instrument-free background videos, minimal labeling effort is required, just to select frames that do not contain surgical instruments from videos of surgical interventions freely available online. We compare different methods to blend instruments over tissue and propose a novel data augmentation approach that takes advantage of the plurality of options. We show that by training a vanilla U-Net on semi-synthetic data only and applying a simple post-processing, we are able to match the results of the same network trained on a publicly available manually labeled real dataset.
Funding Information
- Wellcome (203148/Z/16/Z, 203145/Z/16/Z, WT101957)
- Engineering and Physical Sciences Research Council (NS/A000049/1, NS/A000050/1, NS/A000027/1, EP/L016478/1)
- European Union’s Horizon 2020 Research and Innovation Program through the Marie Skłodowska-Curie Grant under Agreement (TRABIT 765148)
- Medtronic/RAEng Research Chair (RCSRF1819\7\34)
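The core of the pipeline the abstract describes — extracting an instrument from a green-screen capture and compositing it over an instrument-free tissue background — can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the key colour, distance threshold, and hard binary mask are assumptions (the paper compares several blending methods, which this sketch does not reproduce).

```python
import numpy as np

def chroma_key_mask(img, key=(0, 255, 0), tol=80):
    """Foreground mask: True wherever a pixel is far from the key colour.

    `key` and `tol` are illustrative choices; real footage would need
    per-capture tuning and a softer (non-binary) matte.
    """
    dist = np.linalg.norm(img.astype(np.int16) - np.array(key, np.int16), axis=-1)
    return dist > tol

def composite(fg, bg, mask):
    """Paste the masked foreground over the background (hard alpha blend)."""
    m = mask[..., None].astype(fg.dtype)
    return fg * m + bg * (1 - m)

# Toy example: a grey "instrument" patch on a synthetic green screen,
# composited over a uniform "tissue" background.
fg = np.zeros((8, 8, 3), np.uint8)
fg[:] = (0, 255, 0)                       # green screen
fg[2:6, 2:6] = (120, 120, 120)            # instrument region
bg = np.full((8, 8, 3), 200, np.uint8)    # background frame
mask = chroma_key_mask(fg)                # doubles as the segmentation label
out = composite(fg, bg, mask)
```

Note that the same mask used for compositing is also the pixel-accurate segmentation label, which is what removes the manual annotation step.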
This publication has 34 references indexed in Scilit:
- Crowd-Algorithm Collaboration for Large-Scale Endoscopic Image Annotation with Confidence. Lecture Notes in Computer Science, 2016
- Learning Deep Object Detectors from 3D Models. Published by Institute of Electrical and Electronics Engineers (IEEE), 2015
- Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views. Published by Institute of Electrical and Electronics Engineers (IEEE), 2015
- U-Net: Convolutional Networks for Biomedical Image Segmentation. Published by Springer Science and Business Media LLC, 2015
- Real-time ultrasound transducer localization in fluoroscopy images by transfer learning from synthetic training data. Medical Image Analysis, 2014
- Atlas Encoding by Randomized Forests for Efficient Label Propagation. Lecture Notes in Computer Science, 2013
- Feature Classification for Tracking Articulated Surgical Tools. Lecture Notes in Computer Science, 2012
- Towards image guided robotic surgery: multi-arm tracking through hybrid localization. International Journal of Computer Assisted Radiology and Surgery, 2009
- "GrabCut". ACM Transactions on Graphics, 2004
- A multiresolution spline with application to image mosaics. ACM Transactions on Graphics, 1983