Unified Vision-Language Representation Modeling for E-Commerce Same-Style Products Retrieval

Same-style product retrieval plays an important role on e-commerce platforms: it aims to identify identical products that may have different text descriptions or images. It can be used to retrieve similar products from different suppliers or to detect duplicate products from a single supplier. Common methods treat the image as the object to be matched, but they consider only visual features and overlook the attribute information contained in the textual descriptions. They perform poorly in industries where images are less important, such as machinery, hardware tools, and electronic components, even when an additional text-matching module is added. In this paper, we propose a unified vision-language modeling method for e-commerce same-style product retrieval, designed to represent a product by both its textual descriptions and its visual content. It comprises a sampling strategy that collects positive pairs from user click logs under category and relevance constraints, and a novel contrastive loss unit that maps the image, text, and image+text representations into one joint embedding space. The model supports cross-modal product-to-product retrieval as well as style transfer and user-interactive search. Offline evaluations on annotated data demonstrate its superior retrieval performance, and online tests show that it attracts more clicks and conversions. Moreover, the model has already been deployed online for similar-product retrieval on alibaba.com, the largest B2B e-commerce platform in the world.
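The abstract does not give the form of the contrastive loss unit, so the following is only a minimal sketch of the general idea, assuming a standard symmetric InfoNCE objective applied across every pairing of the three representations (image, text, image+text). The function names, the `"fused"` key, the temperature value, and the equal weighting of all pairings are illustrative assumptions, not the authors' implementation.

```python
import math
from itertools import product


def l2_normalize(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(queries, keys, temperature=0.07):
    """Generic InfoNCE: queries[i] should match keys[i] against all other keys.

    This is a textbook formulation used as a stand-in; the paper's actual
    loss unit may differ.
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity of query i with every key, scaled by temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]
    return total / len(q)


def unified_contrastive_loss(product_a, product_b, temperature=0.07):
    """Average InfoNCE over all view pairings, in both directions.

    product_a and product_b are dicts mapping "image", "text", and "fused"
    (hypothetical key names) to lists of embeddings. Row i of product_a and
    row i of product_b are assumed to be a positive pair, e.g. one collected
    from click logs.
    """
    views = ("image", "text", "fused")
    losses = []
    for view_a, view_b in product(views, views):
        losses.append(info_nce(product_a[view_a], product_b[view_b], temperature))
        losses.append(info_nce(product_b[view_b], product_a[view_a], temperature))
    return sum(losses) / len(losses)
```

Because every view of one product is pulled toward every view of its positive partner, a query in any modality can in principle be matched against candidates in any other, which is one plausible reading of how a single joint embedding space enables cross-modal product-to-product retrieval.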

