Learning Cross-Modal Embeddings for Cooking Recipes and Food Images
Open Access
- 1 July 2017
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 10636919, p. 3068-3076
- https://doi.org/10.1109/cvpr.2017.327
Abstract
In this paper, we introduce Recipe1M, a new large-scale, structured corpus of over 1m cooking recipes and 800k food images. As the largest publicly available collection of recipe data, Recipe1M affords the ability to train high-capacity models on aligned, multi-modal data. Using these data, we train a neural network to find a joint embedding of recipes and images that yields impressive results on an image-recipe retrieval task. Additionally, we demonstrate that regularization via the addition of a high-level classification objective both improves retrieval performance to rival that of humans and enables semantic vector arithmetic. We postulate that these embeddings will provide a basis for further exploration of the Recipe1M dataset and food and cooking in general. Code, data and models are publicly available.
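To make the retrieval and vector-arithmetic ideas in the abstract concrete, the following is a minimal sketch, assuming that images and recipes have already been mapped into one shared vector space. It does not reproduce the paper's network or training objective; the function names (`rank_recipes`, `vector_arithmetic`) and the toy 3-dimensional vectors are hypothetical and chosen only for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def rank_recipes(image_vec, recipe_vecs):
    """Return recipe indices ordered from most to least similar to the image."""
    scores = [cosine(image_vec, r) for r in recipe_vecs]
    return sorted(range(len(recipe_vecs)), key=lambda i: scores[i], reverse=True)


def vector_arithmetic(a, b, c):
    """Compute a - b + c element-wise, a query vector for analogy-style lookups."""
    return [x - y + z for x, y, z in zip(a, b, c)]


# Toy embeddings, made up for illustration (not taken from Recipe1M).
recipe_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_vec = [0.1, 0.9, 0.2]

# Image-to-recipe retrieval: rank every recipe against the image embedding.
ranking = rank_recipes(image_vec, recipe_vecs)

# Arithmetic in the shared space, then retrieval with the resulting vector.
query = vector_arithmetic(recipe_vecs[0], recipe_vecs[0], recipe_vecs[2])
analogy_ranking = rank_recipes(query, recipe_vecs)
```

In a sketch like this, retrieval quality would be judged by where the true matching recipe lands in `ranking`; how the embeddings themselves are learned is outside its scope.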