Abstract
The paper is devoted to the task of multiple objects tracking and segmentation on monocular video, which was obtained by the camera of unmanned ground vehicle. The authors investigate various architectures of deep neural networks for this task solution. Special attention is paid to deep models providing inference in real time. The authors proposed an approach based on combining the modern SOLOv2 instance segmentation model, a neural network model for embedding generation for each found object, and a modified Hungarian tracking algorithm. The Hungarian algorithm was modified taking into account the geometric constraints on the positions of the found objects on the sequence of images. The investigated solution is a development and improvement of the state-of-the-art PointTrack method. The effectiveness of the proposed approach is demonstrated quantitatively and qualitatively on the popular KITTI MOTS dataset collected using the cameras of a driverless car. The software implementation of the approach was carried out. The acceleration of the procedure for the formation of a two-dimensional point cloud in the found image segment was done using the NVidia CUDA technology. At the same time, the proposed instance segmentation module provides a mean processing time of one image of 68 ms, the embedding and tracking module of 24 ms using the NVidia Tesla V100 GPU. This indicates that the proposed solution is promising for on-board computer vision systems for both unmanned vehicles and various robotic platforms.