Cross-modal Retrieval with Correspondence Autoencoder

3 November 2014

conference paper
Published by Association for Computing Machinery (ACM)

https://doi.org/10.1145/2647868.2654902

Abstract

This paper considers the problem of cross-modal retrieval, e.g., using a text query to search for images and vice versa. A novel model based on a correspondence autoencoder (Corr-AE) is proposed to solve this problem. The model is constructed by correlating the hidden representations of two uni-modal autoencoders. A novel objective, which minimizes a linear combination of the representation learning error for each modality and the correlation learning error between the hidden representations of the two modalities, is used to train the model as a whole. Minimizing the correlation learning error forces the model to learn hidden representations that contain only the information common to the different modalities, while minimizing the representation learning error keeps the hidden representations good enough to reconstruct the input of each modality. A parameter $\alpha$ balances the representation learning error against the correlation learning error. Based on two different multi-modal autoencoders, Corr-AE is extended to two further correspondence models, which we call Corr-Cross-AE and Corr-Full-AE. The proposed models are evaluated on three publicly available data sets from real scenes. We demonstrate that the three correspondence autoencoders perform significantly better than three models based on canonical correlation analysis and two popular multi-modal deep models on cross-modal retrieval tasks.
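
The objective described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formula, so several choices here are assumptions made for illustration: the weighting `(1 - alpha) * reconstruction + alpha * correlation`, squared Euclidean distance for both error terms, single-layer sigmoid encoders with linear decoders, and every name (`init_params`, `corr_ae_loss`, the parameter keys). It computes the loss only and does not train the model.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(d_img, d_txt, d_hidden, seed=0):
    """Randomly initialise two uni-modal autoencoders (hypothetical layout)."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        # Encoders: input -> hidden
        "W_img": rng.normal(0.0, scale, (d_img, d_hidden)),
        "b_img": np.zeros(d_hidden),
        "W_txt": rng.normal(0.0, scale, (d_txt, d_hidden)),
        "b_txt": np.zeros(d_hidden),
        # Decoders: hidden -> reconstructed input
        "V_img": rng.normal(0.0, scale, (d_hidden, d_img)),
        "c_img": np.zeros(d_img),
        "V_txt": rng.normal(0.0, scale, (d_hidden, d_txt)),
        "c_txt": np.zeros(d_txt),
    }


def corr_ae_loss(img, txt, params, alpha=0.2):
    """Return (total, reconstruction_error, correlation_error) for a batch.

    img: array of shape (n, d_img); txt: array of shape (n, d_txt).
    Row i of img and row i of txt are assumed to be a matched pair.
    """
    # Hidden representation of each modality
    h_img = sigmoid(img @ params["W_img"] + params["b_img"])
    h_txt = sigmoid(txt @ params["W_txt"] + params["b_txt"])

    # Each autoencoder reconstructs its own input from its own hidden code
    img_rec = h_img @ params["V_img"] + params["c_img"]
    txt_rec = h_txt @ params["V_txt"] + params["c_txt"]

    # Representation learning error: summed over both modalities
    rec_err = (np.mean(np.sum((img - img_rec) ** 2, axis=1))
               + np.mean(np.sum((txt - txt_rec) ** 2, axis=1)))

    # Correlation learning error: distance between the paired hidden codes
    corr_err = np.mean(np.sum((h_img - h_txt) ** 2, axis=1))

    # Assumed weighting: alpha trades reconstruction against correlation
    total = (1.0 - alpha) * rec_err + alpha * corr_err
    return total, rec_err, corr_err
```

Under this assumed weighting, `alpha = 0` reduces the objective to two independent autoencoders, while `alpha = 1` ignores reconstruction entirely, so intermediate values are what let the hidden codes both agree across modalities and remain informative about each input.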

Funding Information

National Natural Science Foundation of China (61273365)
Discipline building plan in 111 base of China (B08004)
Fundamental Research Funds for the Central Universities of China (2013RC0304)
Ministry of Science and Technology of the People's Republic of China (2012AA011103)
Beijing University of Posts and Telecommunications

Cited by 434 articles