Human Focused Video Description

Abstract
This contribution addresses the generation of natural language descriptions of human actions and behaviour observed in video streams. The work starts with the implementation of conventional image processing techniques to extract high-level features from video. Because humans are often the most important and interesting feature in a video, the description focuses on humans and their activities. Although the feature extraction processes are error-prone at various levels, we explore approaches for combining their outputs to produce a coherent description. Evaluation is performed by calculating the overlap similarity score between human-authored and machine-generated descriptions.
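
The abstract does not define the overlap similarity score, so the following is only a minimal sketch of one common reading: the overlap coefficient computed on sets of word tokens, |A ∩ B| / min(|A|, |B|). The function names, the whitespace tokenization, and the choice of the overlap coefficient itself are assumptions made for illustration, not the authors' confirmed method.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation.

    Assumption: a simple bag-of-words view; the paper may tokenize differently.
    """
    tokens = (word.strip(".,;:!?\"'()") for word in text.lower().split())
    return {t for t in tokens if t}


def overlap_similarity(reference, candidate):
    """Overlap coefficient between the token sets of two descriptions.

    Assumption: score = |A & B| / min(|A|, |B|); returns 0.0 if either is empty.
    """
    ref, cand = tokenize(reference), tokenize(candidate)
    if not ref or not cand:
        return 0.0
    return len(ref & cand) / min(len(ref), len(cand))


# Hypothetical example pair: one human-authored, one machine-generated description.
human = "A man is walking in the park."
machine = "A man walks in a park."
print(overlap_similarity(human, machine))  # 0.8
```

In this example the two descriptions share four tokens ("a", "man", "in", "park") and the shorter description has five distinct tokens, giving 4/5. A set-based score of this kind ignores word order and inflection ("walking" and "walks" do not match), which is one reason an actual evaluation might add stemming or use a different normalisation.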
