Modeling Semantic Aspects for Cross-Media Image Indexing
- 27 August 2007
- journal article
- research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Pattern Analysis and Machine Intelligence
- Vol. 29 (10), 1802-1817
- https://doi.org/10.1109/tpami.2007.1097
Abstract
To go beyond the query-by-example paradigm in image retrieval, there is a need for semantic indexing of large image collections for intuitive text-based image search. Different models have been proposed to learn the dependencies between the visual content of an image set and the associated text captions, which then allows semantic indexes to be created automatically for unannotated images. The task, however, remains unsolved. In this paper, we present three alternatives to learn a probabilistic latent semantic analysis (PLSA) model for annotated images and evaluate their respective performance for automatic image indexing. Under the PLSA assumptions, an image is modeled as a mixture of latent aspects that generates both image features and text captions, and we investigate three ways to learn the mixture of aspects. We also propose a more discriminative image representation than the traditional Blob histogram, concatenating quantized local color information and quantized local texture descriptors. The first learning procedure of a PLSA model for annotated images is a standard expectation-maximization (EM) algorithm, which implicitly assumes that the visual and the textual modalities can be treated equivalently. The other two models are based on asymmetric PLSA learning, which allows the definition of the latent space to be constrained by either the visual or the textual modality. We demonstrate that the textual modality is more appropriate to learn a semantically meaningful latent space, which translates into improved annotation performance. A comparison of our learning algorithms with respect to recent methods on a standard data set is presented, and a detailed evaluation of the performance shows the validity of our framework.
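The first learning procedure mentioned in the abstract, standard EM over both modalities treated equivalently, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`normalize`, `plsa_em`), the toy count matrix, the number of aspects, and the iteration count are all assumptions chosen for illustration. Each row of `counts` is assumed to be one image, with quantized visual feature counts and caption word counts concatenated into a single vector.

```python
import random


def normalize(values):
    """Scale non-negative values so they sum to 1 (uniform if all are zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def plsa_em(counts, n_aspects, n_iters=50, seed=0):
    """Fit P(w|z) and P(z|d) to a document-by-term count matrix with EM.

    Hypothetical helper for illustration only; `counts[d][w]` is the number
    of times term w (visual or textual) occurs in image d.
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(counts), len(counts[0])
    # Random positive initialization, normalized into valid distributions.
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_terms)])
             for _ in range(n_aspects)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_aspects)])
             for _ in range(n_docs)]
    for _ in range(n_iters):
        new_w_z = [[0.0] * n_terms for _ in range(n_aspects)]
        new_z_d = [[0.0] * n_aspects for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_terms):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: posterior P(z|d,w) proportional to P(w|z) * P(z|d).
                post = normalize([p_w_z[z][w] * p_z_d[d][z]
                                  for z in range(n_aspects)])
                # Accumulate expected counts per aspect for the M-step.
                for z in range(n_aspects):
                    new_w_z[z][w] += n_dw * post[z]
                    new_z_d[d][z] += n_dw * post[z]
        # M-step: renormalize expected counts into distributions.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z, p_z_d


# Toy data (assumed): columns 0-3 are visual terms, columns 4-5 caption words.
counts = [
    [4, 3, 0, 0, 1, 0],
    [5, 2, 0, 0, 1, 0],
    [0, 0, 3, 4, 0, 1],
    [0, 0, 4, 3, 0, 1],
]
p_w_z, p_z_d = plsa_em(counts, n_aspects=2)
```

Here `p_z_d[d]` is the aspect mixture of image `d` and `p_w_z[z]` is the term distribution of aspect `z`. An asymmetric variant of the kind the abstract describes could, under the same assumptions, estimate `p_z_d` from one modality alone and then hold it fixed while fitting the term distributions of the other modality.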