Detection-Free Multiobject Tracking by Reconfigurable Inference With Bundle Representations

Abstract
This paper presents a conceptually simple but effective approach to tracking multiple objects in videos without requiring elaborate supervision (i.e., without training object detectors or templates offline). Our framework performs a bi-layer inference of spatio-temporal grouping to exploit the rich appearance and motion information in the observed sequence. First, we generate a robust mid-level video representation from clustered point tracks, which we call video bundles. Each bundle encapsulates a chunk of point tracks that satisfy both spatial proximity and temporal coherency. Taking the video bundles as vertices, we build a spatio-temporal graph that incorporates both competitive and compatible relations among vertices. Multiobject tracking can then be phrased as a graph partition problem under the Bayesian framework, which we solve by developing a reconfigurable belief propagation (BP) algorithm. This algorithm improves on traditional BP by allowing a converged solution to be reconfigured during optimization, so that inference can be reactivated when it gets stuck in a local minimum and thus yield more reliable results. In experiments on challenging benchmarks, we demonstrate that our approach outperforms other state-of-the-art methods.
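The abstract does not specify the energy terms or how a converged solution is reconfigured, so the following is only a minimal Python sketch under stated assumptions, not the authors' algorithm. Bundles are vertices that each take one of K labels (object identities). Hypothetical "compat" edges cost a weight w when their endpoints take different labels, and hypothetical "compete" edges cost w when their endpoints share a label. If the costs are read as negative log probabilities, minimizing the total energy corresponds to MAP estimation. Inference is standard min-sum loopy BP. As a stand-in for the paper's reconfiguration step, the sketch then applies greedy relabeling of clusters of compatibly linked bundles and accepts a move only if it lowers the energy. All names (`loopy_bp`, `reconfigure`, `COMPAT`, `COMPETE`) are illustrative.

```python
from collections import defaultdict

# Assumed edge kinds: COMPAT favours equal labels (same object),
# COMPETE penalises equal labels (bundles that should not share an object).
COMPAT, COMPETE = "compat", "compete"


def pair_cost(kind, w, a, b):
    """Pairwise cost of labels a, b on an edge of the given kind and weight w."""
    if kind == COMPAT:
        return 0.0 if a == b else w
    return w if a == b else 0.0


def energy(labels, unary, edges):
    """Total cost: unary terms plus pairwise terms over all edges."""
    e = sum(unary[i][k] for i, k in enumerate(labels))
    e += sum(pair_cost(kind, w, labels[i], labels[j]) for i, j, kind, w in edges)
    return float(e)


def loopy_bp(unary, edges, n_iter=50, tol=1e-9):
    """Synchronous min-sum loopy BP; returns the argmin of each vertex belief."""
    n, K = len(unary), len(unary[0])
    info = {}  # directed pair (i, j) -> (kind, weight)
    for i, j, kind, w in edges:
        info[(i, j)] = info[(j, i)] = (kind, w)
    nbrs = defaultdict(list)
    for i, j in info:
        nbrs[i].append(j)
    msg = {e: [0.0] * K for e in info}
    for _ in range(n_iter):
        new, delta = {}, 0.0
        for (i, j), (kind, w) in info.items():
            # Unary cost at i plus incoming messages, excluding the one from j.
            h = [unary[i][a] + sum(msg[(l, i)][a] for l in nbrs[i] if l != j)
                 for a in range(K)]
            m = [min(h[a] + pair_cost(kind, w, a, b) for a in range(K))
                 for b in range(K)]
            lo = min(m)
            m = [v - lo for v in m]  # normalise so messages do not drift
            delta = max(delta, max(abs(x - y) for x, y in zip(m, msg[(i, j)])))
            new[(i, j)] = m
        msg = new
        if delta < tol:
            break
    beliefs = [[unary[i][a] + sum(msg[(l, i)][a] for l in nbrs[i])
                for a in range(K)] for i in range(n)]
    return [min(range(K), key=b.__getitem__) for b in beliefs]


def compat_components(labels, edges):
    """Vertex groups joined by COMPAT edges whose endpoints share a label."""
    parent = list(range(len(labels)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, kind, _ in edges:
        if kind == COMPAT and labels[i] == labels[j]:
            parent[find(i)] = find(j)
    groups = defaultdict(list)
    for v in range(len(labels)):
        groups[find(v)].append(v)
    return sorted(groups.values())


def reconfigure(labels, unary, edges, max_rounds=10):
    """Assumed reconfiguration: relabel whole clusters while energy drops."""
    labels = list(labels)
    best = energy(labels, unary, edges)
    K = len(unary[0])
    for _ in range(max_rounds):
        improved = False
        for comp in compat_components(labels, edges):
            for k in range(K):
                cand = list(labels)
                for v in comp:
                    cand[v] = k
                e = energy(cand, unary, edges)
                if e < best - 1e-12:
                    labels, best, improved = cand, e, True
        if not improved:
            break
    return labels, best


def reconfigurable_bp(unary, edges, **bp_kwargs):
    """Run loopy BP, then try to escape its fixed point by cluster moves."""
    return reconfigure(loopy_bp(unary, edges, **bp_kwargs), unary, edges)


if __name__ == "__main__":
    # Four bundles on a chain: 0 prefers label 0, 3 prefers label 1.
    unary = [[0, 5], [1, 1], [1, 1], [5, 0]]
    edges = [(0, 1, COMPAT, 2), (1, 2, COMPETE, 2), (2, 3, COMPAT, 2)]
    print(reconfigurable_bp(unary, edges))  # ([0, 0, 1, 1], 2.0)
```

On this tree-shaped toy graph BP is already exact, so the cluster moves find nothing to improve. Started instead from the poor labeling `[1, 1, 1, 1]` (energy 9.0), `reconfigure` moves the compatible cluster `{0, 1}` to label 0 in one step. Moving a whole compatible cluster at once, rather than a single vertex, is the design choice that lets the search leave configurations where every single-vertex change would break a compatible edge.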
Funding Information
  • National Natural Science Foundation of China (61271093)
  • Guangdong Natural Science Foundation (S2013050014548, 2014A030313201)
  • Program of Guangzhou Zhujiang Star of Science and Technology (2013J2200067)
  • Science and Technology Program of Guangzhou (1563000439)

This publication has 35 references indexed in Scilit.