This paper explores the use of multimodal image-to-text models to enhance text-based item retrieval. We propose using pre-trained image captioning and tagging models, such as InstructBLIP and CLIP, to generate text-based product descriptions that are combined with existing text descriptions. Our work is particularly impactful for smaller eCommerce businesses that are unable to maintain the high-quality text descriptions needed to perform item retrieval effectively for search and recommendation use cases. We evaluate the searchability of ground-truth text, image-generated text, and combinations of the two on several subsets of Amazon's publicly available ESCI dataset. The results demonstrate the dual capability of our proposed models to enhance the retrieval of existing text and to generate highly searchable standalone descriptions.
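As a rough illustration of the kind of pipeline the abstract describes (caption generation with InstructBLIP plus zero-shot tagging with CLIP, merged with an existing description), the sketch below uses Hugging Face transformers. The specific checkpoints, prompt, tag vocabulary, threshold, and combination step are assumptions for illustration, not the configuration used in the paper.

```python
# Minimal sketch, assuming Hugging Face transformers; checkpoints, prompt, tag list,
# and threshold below are illustrative, not the paper's actual setup.
import torch
from PIL import Image
from transformers import (
    InstructBlipProcessor,
    InstructBlipForConditionalGeneration,
    CLIPProcessor,
    CLIPModel,
)

# 1) Caption the product image with a pre-trained InstructBLIP model.
blip_processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-flan-t5-xl")
blip_model = InstructBlipForConditionalGeneration.from_pretrained("Salesforce/instructblip-flan-t5-xl")

image = Image.open("product.jpg").convert("RGB")  # hypothetical product image
prompt = "Describe this product for an online store listing."
blip_inputs = blip_processor(images=image, text=prompt, return_tensors="pt")
with torch.no_grad():
    generated = blip_model.generate(**blip_inputs, max_new_tokens=60)
caption = blip_processor.batch_decode(generated, skip_special_tokens=True)[0].strip()

# 2) Zero-shot tag the image with CLIP against a candidate tag vocabulary.
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

candidate_tags = ["running shoes", "kitchen appliance", "phone case", "desk lamp"]  # hypothetical
clip_inputs = clip_processor(text=candidate_tags, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = clip_model(**clip_inputs).logits_per_image.softmax(dim=-1)[0]
tags = [t for t, p in zip(candidate_tags, probs.tolist()) if p > 0.25]  # illustrative cutoff

# 3) Combine the generated text with any existing seller-provided description.
existing_description = "Lightweight trainer with mesh upper."  # hypothetical existing text
enriched_description = " ".join([existing_description, caption, ", ".join(tags)])
print(enriched_description)
```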