California-ND: An annotated dataset for near-duplicate detection in personal photo collections

Abstract
Managing photo collections involves a variety of image quality assessment tasks, e.g. the selection of the “best” photos. Detecting near-duplicate images is a prerequisite for automating these tasks. This paper presents a new dataset that may assist researchers in testing algorithms for the detection of near-duplicates in personal photo libraries. The proposed dataset is derived directly from an actual personal travel photo collection. It contains many difficult cases and types of near-duplicates. More importantly, in order to deal with the inevitable ambiguity that the near-duplicate cases exhibit, the dataset is annotated by 10 different subjects. These annotations are combined into a non-binary ground truth, which indicates the probability that a pair of images may be considered a near-duplicate by an observer.
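As a rough illustration of how per-subject annotations could be turned into a non-binary ground truth, the sketch below assumes that each subject supplies the set of image pairs they judged to be near-duplicates, and that the probability for a pair is simply the fraction of subjects who marked it. The abstract does not state the actual combination rule, so the averaging step, the function name `combine_annotations`, and the file names are all hypothetical.

```python
def combine_annotations(votes_per_subject):
    """Return {pair: fraction of subjects who marked the pair as near-duplicate}.

    votes_per_subject: one collection per subject, each holding 2-tuples of
    image identifiers. Plain averaging is an assumption, not the paper's
    documented method.
    """
    n_subjects = len(votes_per_subject)
    if n_subjects == 0:
        raise ValueError("at least one subject is required")

    counts = {}
    for votes in votes_per_subject:
        # Normalise pair order and deduplicate so (a, b) and (b, a)
        # count once per subject.
        unique_pairs = {tuple(sorted(pair)) for pair in votes}
        for pair in unique_pairs:
            counts[pair] = counts.get(pair, 0) + 1

    # Pairs that no subject marked are absent and have implicit probability 0.
    return {pair: count / n_subjects for pair, count in counts.items()}


# Hypothetical annotations from four subjects.
example_votes = [
    {("a.jpg", "b.jpg")},
    {("b.jpg", "a.jpg"), ("a.jpg", "c.jpg")},
    set(),
    {("a.jpg", "b.jpg")},
]
probabilities = combine_annotations(example_votes)
```

With these made-up votes, three of the four subjects marked the pair `a.jpg`/`b.jpg`, so it would receive a probability of 0.75, while `a.jpg`/`c.jpg` would receive 0.25. A detector could then be scored against these graded values rather than against a single binary label.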
