Cross-modal Retrieval with Correspondence Autoencoder
- 3 November 2014
- conference paper
- Published by Association for Computing Machinery (ACM)
Abstract
The problem of cross-modal retrieval, e.g., using a text query to search for images and vice versa, is considered in this paper. A novel correspondence autoencoder (Corr-AE) model is proposed for solving this problem. The model is constructed by correlating the hidden representations of two uni-modal autoencoders. A novel objective, which minimizes a linear combination of the representation learning error for each modality and the correlation learning error between the hidden representations of the two modalities, is used to train the model as a whole. Minimizing the correlation learning error forces the model to learn hidden representations that capture only the information common to the two modalities, while minimizing the representation learning error ensures that the hidden representations remain good enough to reconstruct the input of each modality. A parameter $\alpha$ is used to balance the representation learning error and the correlation learning error. Based on two different multi-modal autoencoders, Corr-AE is extended to two other correspondence models, here called Corr-Cross-AE and Corr-Full-AE. The proposed models are evaluated on three publicly available data sets from real scenes. We demonstrate that the three correspondence autoencoders perform significantly better than three canonical correlation analysis based models and two popular multi-modal deep models on cross-modal retrieval tasks.
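The Corr-AE objective described above can be illustrated with a minimal NumPy sketch. This is not the paper's exact architecture (the authors use deeper autoencoders trained on real image and text features); here each modality gets a hypothetical single-layer, tied-weight autoencoder, and the parameter `alpha` trades off reconstruction against the cross-modal correlation term, as in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumed for illustration): image features, text
# features, and a shared hidden code size for both branches.
d_img, d_txt, d_hid = 8, 6, 4

# Hypothetical tied-weight linear encoders, one per modality.
W_img = rng.normal(scale=0.1, size=(d_img, d_hid))
W_txt = rng.normal(scale=0.1, size=(d_txt, d_hid))

def corr_ae_loss(x_img, x_txt, alpha=0.2):
    """Corr-AE-style objective (sketch): per-modality representation
    learning errors plus a correlation learning error that pulls the
    two hidden codes toward each other, balanced by alpha."""
    h_img = np.tanh(x_img @ W_img)            # hidden code, image branch
    h_txt = np.tanh(x_txt @ W_txt)            # hidden code, text branch
    rec_img = np.sum((x_img - h_img @ W_img.T) ** 2)  # reconstruction, image
    rec_txt = np.sum((x_txt - h_txt @ W_txt.T) ** 2)  # reconstruction, text
    corr = np.sum((h_img - h_txt) ** 2)       # correlation learning error
    return (1 - alpha) * (rec_img + rec_txt) + alpha * corr

x_img = rng.normal(size=(1, d_img))
x_txt = rng.normal(size=(1, d_txt))
loss = corr_ae_loss(x_img, x_txt, alpha=0.2)
```

With `alpha = 0` the objective reduces to two independent autoencoders; with `alpha` close to 1 it enforces only agreement between the hidden codes, which is why the paper balances the two terms rather than using either extreme.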
Funding Information
- National Natural Science Foundation of China (61273365)
- Discipline building plan in 111 base of China (B08004)
- Fundamental Research Funds for the Central Universities of China (2013RC0304)
- Ministry of Science and Technology of the People's Republic of China (2012AA011103)
- Beijing University of Posts and Telecommunications