Abstract
The main challenges of visual object tracking arise from the arbitrary appearance of the objects to be tracked. Most existing algorithms try to solve this problem by training a new model to regress or classify each tracked object, so the model must be initialized and retrained for every new object. In this paper, we propose to track different objects in an object-independent manner with a novel two-flow convolutional neural network (named YCNN). The YCNN takes two inputs (an object image patch and a larger search image patch) and outputs a response map that predicts how likely, and where, the object will appear in the search patch. Unlike object-specific approaches, the YCNN is trained to measure the similarity between the two image patches, so the model is not limited to any specific object. Furthermore, the network is trained end-to-end to extract both shallow and deep convolutional features dedicated to visual tracking. Once properly trained, the YCNN can track all kinds of objects without further training or updating, which allows our algorithm to run at a high speed of 45 frames per second. The effectiveness of the proposed algorithm is demonstrated by experiments on two popular benchmarks: OTB-100 and VOT-2014.
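To make the two-flow design concrete, the following is a minimal sketch of such an architecture in PyTorch. The layer sizes, patch sizes, fusion scheme, and the TwoFlowNet name are illustrative assumptions, not the published YCNN configuration: two convolutional flows encode the object patch and the search patch, and a fully connected fusion stage maps their concatenated features to a response map whose peak marks the predicted object location.

import torch
import torch.nn as nn

class TwoFlowNet(nn.Module):
    """Minimal two-flow sketch: one flow encodes the object patch, the
    other encodes the larger search patch, and their features are fused
    into a response map. All sizes are illustrative assumptions."""

    def __init__(self):
        super().__init__()
        # Hypothetical convolutional encoder, instantiated once per flow.
        def encoder():
            return nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((8, 8)),
            )
        self.object_flow = encoder()
        self.search_flow = encoder()
        # Fully connected fusion mapping the concatenated features to a
        # flattened response map (32x32 is an assumed output size).
        self.fuse = nn.Sequential(
            nn.Linear(2 * 64 * 8 * 8, 1024), nn.ReLU(),
            nn.Linear(1024, 32 * 32), nn.Sigmoid(),
        )

    def forward(self, obj_patch, search_patch):
        f_obj = self.object_flow(obj_patch).flatten(1)
        f_sea = self.search_flow(search_patch).flatten(1)
        response = self.fuse(torch.cat([f_obj, f_sea], dim=1))
        return response.view(-1, 1, 32, 32)  # peak = likely object location

# Usage: one object patch and one larger search patch per sample.
net = TwoFlowNet()
obj = torch.randn(1, 3, 64, 64)       # assumed object-patch size
search = torch.randn(1, 3, 128, 128)  # assumed search-patch size
print(net(obj, search).shape)         # torch.Size([1, 1, 32, 32])

Because the comparison is expressed as a single forward pass over an exemplar and a search region, a network of this shape needs no per-object retraining at test time, which is what makes the high frame rate reported in the abstract plausible.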
Funding Information
  • National Natural Science Foundation of China (61772213, 61371140, 2015CFA062, 2015BAA133, 2017010201010121)
