In this paper, we describe the first publicly available fine-grained product recognition dataset based on leaflet images. Using advertisement leaflets collected over several years from different European retailers, we provide a total of 41.6k manually annotated product images in 832 classes. Further, we investigate three approaches to this fine-grained product classification task: Classification by Image, Classification by Text, and Classification by Image and Text. The "Classification by Text" approach uses the text extracted directly from the leaflet product images. We show that combining image and text as input improves the classification of products that are visually difficult to distinguish. The final model achieves an accuracy of 96.4% with a Top-3 score of 99.2%. We release our code at https://github.com/ladwigd/Leaflet-Product-Classification.
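For readers unfamiliar with the Top-3 score reported above, the following is a minimal sketch of how a top-k accuracy can be computed from per-class scores. It is not taken from the released code: the function name `top_k_accuracy` and the toy scores are illustrative assumptions, and the paper's own evaluation may differ in detail (for example, in how ties are handled).

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 1,
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes.

    Hypothetical helper for illustration only; not the authors' implementation.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example with 4 samples and 3 classes (made-up numbers).
toy_scores = [
    [0.10, 0.70, 0.20],
    [0.50, 0.30, 0.20],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
]
toy_labels = [1, 1, 0, 2]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
top3 = top_k_accuracy(toy_scores, toy_labels, k=3)
```

With k = 1 this reduces to plain accuracy; a larger k counts a prediction as correct whenever the true class appears anywhere among the k best-scoring candidates, which is why the Top-3 score is never lower than the accuracy.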