Food image-to-recipe retrieval aims to learn an embedding space that links the rich semantics of recipes with the visual content of food images for cross-modal retrieval. Existing work learns this space under the assumption that all image-recipe training pairs belong to the same cuisine. As a result, despite the excellent performance reported in the literature, the learned space does not transfer to retrieving recipes of a different cuisine. In this paper, we address this issue through cross-domain food image-to-recipe retrieval: by leveraging abundant image-recipe pairs in a source domain (one cuisine), the embedding space is generalized to a target domain (another cuisine) that has no images paired with recipes for training. Based on the intuition that the importance of different source samples should vary, this paper proposes two novel mechanisms for cross-domain food image-to-recipe retrieval, namely a source data selector and weighted cross-modal adversarial learning. The former selects source samples that are similar to the target data and filters out dissimilar ones before training. The latter assigns higher weights to source samples that are more similar to the target data and lower weights to dissimilar ones, so that the latter are suppressed in both cross-modal and adversarial learning. The weights are computed from recipe features extracted by a pre-trained source model. Experiments on three cuisines (Chuan, Yue and Washoku) demonstrate that the proposed method achieves state-of-the-art performance in all transfers.
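The abstract does not state how the selector or the weights are computed, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes, hypothetically, that each source recipe is scored by the cosine similarity between its feature vector and the centroid of the target recipe features, that the selector keeps a fixed fraction of the highest-scoring samples, and that weights come from a temperature-scaled softmax over the kept scores. The names `source_weights`, `weighted_loss`, `keep_ratio` and `temperature` are invented for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def source_weights(source_feats, target_feats, keep_ratio=0.5, temperature=0.1):
    """Select and weight source samples by similarity to the target domain.

    ASSUMPTION: similarity is cosine similarity to the target centroid;
    the abstract only says weights come from recipe features.
    Returns (kept_indices, weights), with weights scaled to have mean 1.
    """
    center = centroid(target_feats)
    sims = [cosine(f, center) for f in source_feats]

    # Selector (assumed form): keep the most target-like fraction of source samples.
    n_keep = max(1, math.ceil(keep_ratio * len(sims)))
    order = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    kept = order[:n_keep]

    # Weights (assumed form): softmax over kept similarities, rescaled to mean 1.
    top = max(sims[i] for i in kept)
    exps = [math.exp((sims[i] - top) / temperature) for i in kept]
    total = sum(exps)
    weights = [n_keep * e / total for e in exps]
    return kept, weights


def weighted_loss(per_sample_losses, kept, weights):
    """Weighted mean of per-sample losses over the kept source samples."""
    return sum(w * per_sample_losses[i] for i, w in zip(kept, weights)) / len(kept)


if __name__ == "__main__":
    # Toy 2-D "recipe features"; real features would come from a pre-trained model.
    source = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    target = [[1.0, 0.0], [1.0, 0.2]]
    kept, weights = source_weights(source, target)
    print(kept, weights)
```

Under these assumptions, the same per-sample weights could multiply both the cross-modal retrieval loss and the adversarial (domain discriminator) loss, which is one plausible reading of weighting "both cross-modal and adversarial learning".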