Searching by image is popular yet still challenging because of the extensive interference arising from i) data variations (e.g., background, pose, viewing angle, brightness) in real-world captured images and ii) similar images in the query dataset. This paper studies the practically meaningful problem of beauty product retrieval (BPR) with neural networks. We extract a broad range of image features and ask whether these features help to i) suppress data variations in real-world captured images and ii) distinguish an image from others in the dataset that look very similar but show intrinsically different beauty products, thereby enhancing BPR capability. To answer this question, we present a novel variable-attention neural network, termed VM-Net, to understand the combination of multiple features of beauty product images. Because few training datasets for BPR have been publicly released, we establish a new dataset of more than one million images classified into more than 20K categories to improve the generalization and anti-interference abilities of both VM-Net and other methods. We evaluate VM-Net and its competitors on the benchmark dataset Perfect-500K, where VM-Net shows clear improvements over the competitors in terms of MAP@7. The source code and dataset will be released upon publication.