Action-Driven Visual Object Tracking With Deep Reinforcement Learning

Abstract
In this paper, we propose an efficient visual tracker that directly captures a bounding box containing the target object in a video by means of sequential actions learned using deep neural networks. The deep neural network that controls the tracking actions is pretrained on various training video sequences and fine-tuned during actual tracking for online adaptation to changes in the target and background. The pretraining uses deep reinforcement learning (RL) as well as supervised learning. The use of RL enables even partially labeled data to be utilized successfully for semisupervised learning. Evaluation on the object tracking benchmark data set shows that the proposed tracker achieves competitive performance at three times the speed of existing deep network-based trackers. The fast version of the proposed method, which operates in real time on a graphics processing unit, outperforms the state-of-the-art real-time trackers with an accuracy improvement of more than 8%.
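The idea of capturing the target with sequential actions can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the action set `ACTIONS`, the step size `STEP`, the helpers `apply_action` and `track_frame`, and the hand-written `make_toy_policy`, which stands in for the learned network by reading a known target center. The sketch only shows the control loop: a policy repeatedly picks an action that shifts or rescales the box until it chooses to stop.

```python
from typing import Callable, Tuple

# Box as (center_x, center_y, width, height); this layout is an assumption.
Box = Tuple[float, float, float, float]

# Hypothetical discrete action set; the abstract does not list the actions.
ACTIONS = ("left", "right", "up", "down", "bigger", "smaller", "stop")
STEP = 0.03  # hypothetical step, as a fraction of the box size


def apply_action(box: Box, action: str, step: float = STEP) -> Box:
    """Return the box after one action; 'stop' leaves it unchanged."""
    cx, cy, w, h = box
    if action == "left":
        cx -= step * w
    elif action == "right":
        cx += step * w
    elif action == "up":
        cy -= step * h
    elif action == "down":
        cy += step * h
    elif action == "bigger":
        w, h = w * (1 + step), h * (1 + step)
    elif action == "smaller":
        w, h = w * (1 - step), h * (1 - step)
    return (cx, cy, w, h)


def track_frame(box: Box, policy: Callable[[Box], str], max_steps: int = 20) -> Box:
    """Apply actions chosen by the policy until it stops or the budget runs out."""
    for _ in range(max_steps):
        action = policy(box)
        if action == "stop":
            break
        box = apply_action(box, action)
    return box


def make_toy_policy(target_center: Tuple[float, float]) -> Callable[[Box], str]:
    """Stand-in for a learned policy: greedily move toward a known center."""
    tx, ty = target_center

    def policy(box: Box) -> str:
        cx, cy, w, h = box
        if tx - cx > 0.5 * STEP * w:
            return "right"
        if cx - tx > 0.5 * STEP * w:
            return "left"
        if ty - cy > 0.5 * STEP * h:
            return "down"
        if cy - ty > 0.5 * STEP * h:
            return "up"
        return "stop"

    return policy


if __name__ == "__main__":
    start = (50.0, 50.0, 20.0, 20.0)
    print(track_frame(start, make_toy_policy((56.0, 50.0))))
```

In a real tracker the policy would be a network that sees the image patch inside the current box rather than the target position, and the box returned for one frame would seed the search in the next frame.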
Funding Information
  • ICT R&D program of MSIP/IITP (Development of Predictive Visual Intelligence Technology) (B0101-15-0552)
  • ICT R&D program of MSIP/IITP (Development of High Performance Visual BigData Discovery Platform) (B0101-15-0266)
  • SNU-Samsung Smart Campus Research Center at Seoul National University
  • Brain Korea 21 Plus Project
