Learning Cross-Modal Embeddings for Cooking Recipes and Food Images

Open Access

1 July 2017

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

No. 10636919, p. 3068-3076
https://doi.org/10.1109/cvpr.2017.327

Abstract

In this paper, we introduce Recipe1M, a new large-scale, structured corpus of over 1m cooking recipes and 800k food images. As the largest publicly available collection of recipe data, Recipe1M affords the ability to train high-capacity models on aligned, multi-modal data. Using these data, we train a neural network to find a joint embedding of recipes and images that yields impressive results on an image-recipe retrieval task. Additionally, we demonstrate that regularization via the addition of a high-level classification objective both improves retrieval performance to rival that of humans and enables semantic vector arithmetic. We postulate that these embeddings will provide a basis for further exploration of the Recipe1M dataset and food and cooking in general. Code, data and models are publicly available.
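The retrieval task and the semantic vector arithmetic mentioned in the abstract can be illustrated with a small sketch. It assumes that recipes and images have already been mapped into a shared vector space and that cosine similarity is used for ranking. The toy vectors, the `cosine` and `retrieve` helpers, and the three-dimensional space are hypothetical and chosen for illustration only; they are not taken from the paper's released code or models.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query, candidates, k=1):
    """Return the indices of the k candidates most similar to the query."""
    ranked = sorted(range(len(candidates)),
                    key=lambda i: cosine(query, candidates[i]),
                    reverse=True)
    return ranked[:k]

# Hypothetical embeddings standing in for the output of trained encoders.
recipe_emb = [[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]]
image_emb = [[0.9, 0.1, 0.0],
             [0.1, 0.8, 0.1],
             [0.0, 0.2, 0.9]]

# Image-to-recipe retrieval: each image embedding queries the recipe set.
top_matches = [retrieve(img, recipe_emb)[0] for img in image_emb]

# Vector arithmetic in the shared space: subtract one concept, add another,
# then look up the nearest recipe to the resulting vector.
analogy = [a - b + c for a, b, c in zip(recipe_emb[0], recipe_emb[1], image_emb[1])]
nearest_to_analogy = retrieve(analogy, recipe_emb)[0]
```

In this toy setup each image retrieves the recipe at the same index, because the image vectors were constructed to lie close to their paired recipe vectors. A trained model would have to learn that alignment from data.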

Cited by 230 articles