Joint Summarization of Large-Scale Collections of Web Images and Videos for Storyline Reconstruction
- 1 June 2014
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 4225-4232
- https://doi.org/10.1109/cvpr.2014.538
Abstract
In this paper, we address the problem of jointly summarizing large sets of Flickr images and YouTube videos. Starting from the intuition that the characteristics of the two media types are different yet complementary, we develop a fast and easily-parallelizable approach for creating not only high-quality video summaries but also novel structural summaries of online images as storyline graphs. The storyline graphs can illustrate various events or activities associated with the topic in the form of a branching network. The video summarization is achieved by diversity ranking on the similarity graphs between images and video frames. The reconstruction of storyline graphs is formulated as the inference of sparse time-varying directed graphs from a set of photo streams with the assistance of videos. For evaluation, we collect datasets of 20 outdoor activities, consisting of 2.7M Flickr images and 16K YouTube videos. Due to the large-scale nature of our problem, we evaluate our algorithm via crowdsourcing using Amazon Mechanical Turk. In our experiments, we demonstrate that the proposed joint summarization approach outperforms other baselines and our own methods using videos or images only.
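The abstract mentions diversity ranking on a similarity graph but does not say which ranking rule is used. As a rough illustration only, the sketch below uses a greedy, maximal-marginal-relevance style rule as a stand-in: it repeatedly picks the node that is most similar to the collection overall while penalizing similarity to nodes already picked. The function name `diversity_rank`, the `trade_off` parameter, and the toy matrix are all hypothetical and are not taken from the paper.

```python
def diversity_rank(similarity, k, trade_off=0.7):
    """Greedily pick k diverse, representative nodes from a similarity graph.

    similarity: square list-of-lists; similarity[i][j] is the edge weight
                between nodes i and j (e.g. images or video frames).
    trade_off:  weight on representativeness versus redundancy (assumed knob).
    NOTE: an MMR-style stand-in, not the ranking rule used in the paper.
    """
    n = len(similarity)
    # Representativeness: average similarity of each node to every node.
    centrality = [sum(row) / n for row in similarity]
    selected = []
    candidates = set(range(n))

    while candidates and len(selected) < k:
        def score(i):
            # Redundancy: highest similarity to anything already selected.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return trade_off * centrality[i] - (1 - trade_off) * redundancy

        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy graph with two clusters: nodes 0-2 and nodes 3-4.
toy_similarity = [
    [1.0, 0.9, 0.8, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1],
    [0.8, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.1, 0.9, 1.0],
]
summary = diversity_rank(toy_similarity, k=2)
```

On the toy graph, the first pick comes from the larger cluster and the second from the other cluster, which is the behavior a diversity-aware summary is meant to have: near-duplicate frames do not crowd out distinct content.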