Unified Vision-Language Representation Modeling for E-Commerce Same-Style Products Retrieval

Same-style product retrieval plays an important role on e-commerce platforms: it aims to identify identical products that may have different text descriptions or images. It can be used to retrieve similar products from different suppliers or to detect duplicate products from a single supplier. Common methods treat the image as the object to be matched, so they consider only visual features and overlook the attribute information contained in the textual descriptions. As a result, they perform poorly in industries where images are less important, such as machinery, hardware tools, and electronic components, even when an additional text-matching module is added. In this paper, we propose a unified vision-language modeling method for e-commerce same-style product retrieval that represents each product by both its textual descriptions and its visual contents. It consists of a sampling strategy that collects positive pairs from user click logs under category and relevance constraints, and a novel contrastive loss unit that maps the image, text, and image+text representations into one joint embedding space. The model supports cross-modal product-to-product retrieval, as well as style transfer and user-interactive search. Offline evaluations on annotated data demonstrate its superior retrieval performance, and online tests show that it attracts more clicks and conversions. Moreover, the model has already been deployed online for similar-product retrieval on alibaba.com, the largest B2B e-commerce platform in the world.
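The abstract does not spell out the contrastive loss unit, so the following is only a minimal sketch of the general idea of pulling image, text, and image+text representations of a positive product pair into one embedding space. It assumes a standard InfoNCE objective with in-batch negatives, a temperature of 0.07, and fusion by averaging the two normalized vectors. All of these choices, and every function name, are hypothetical illustrations rather than the authors' method.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (guarding against a zero vector).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(queries, keys, temperature=0.07):
    # Mean InfoNCE loss: queries[i] should match keys[i];
    # every other key in the batch acts as a negative.
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, k) / temperature for k in keys]
        m = max(logits)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]
    return total / len(queries)

def views(batch):
    # Build three representations per product. The fusion rule
    # (average, then normalize) is an assumption for illustration.
    image = [l2_normalize(p["image"]) for p in batch]
    text = [l2_normalize(p["text"]) for p in batch]
    fused = [l2_normalize([(a + b) / 2 for a, b in zip(i, t)])
             for i, t in zip(image, text)]
    return {"image": image, "text": text, "fused": fused}

def unified_loss(batch_a, batch_b, temperature=0.07):
    # batch_a[i] and batch_b[i] form a positive pair. Averaging the loss
    # over all 3 x 3 view combinations encourages any representation of
    # one product to retrieve any representation of its positive.
    va, vb = views(batch_a), views(batch_b)
    losses = [info_nce(va[na], vb[nb], temperature)
              for na in va for nb in vb]
    return sum(losses) / len(losses)

# Toy positive pairs with made-up 3-dimensional embeddings.
batch_a = [{"image": [1.0, 0.0, 0.0], "text": [1.0, 0.1, 0.0]},
           {"image": [0.0, 1.0, 0.0], "text": [0.0, 1.0, 0.1]}]
batch_b = [{"image": [0.9, 0.1, 0.0], "text": [1.0, 0.0, 0.1]},
           {"image": [0.1, 0.9, 0.0], "text": [0.0, 1.0, 0.0]}]

aligned = unified_loss(batch_a, batch_b)
shuffled = unified_loss(batch_a, batch_b[::-1])
print(aligned, shuffled)
```

In this toy setup the correctly aligned pairs yield a lower loss than the shuffled ones, which is the behavior a contrastive objective is meant to reward. A real system would compute the embeddings with trained image and text encoders rather than fixed vectors.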

