Same-style product retrieval plays an important role on e-commerce platforms: it aims to identify identical products that may have different text descriptions or images. It can be used to retrieve similar products across suppliers or to detect duplicate products from a single supplier. Common methods treat the image as the object to be matched, but they consider only visual features and overlook the attribute information contained in textual descriptions. As a result, they perform poorly in industries where images are less informative, such as machinery, hardware tools, and electronic components, even when an additional text-matching module is added. In this paper, we propose a unified vision-language modeling method for e-commerce same-style product retrieval that represents each product by both its textual descriptions and its visual content. The method comprises a sampling strategy that collects positive pairs from user click logs under category and relevance constraints, and a novel contrastive loss unit that maps the image, text, and image+text representations into one joint embedding space. It supports cross-modal product-to-product retrieval, as well as style transfer and user-interactive search. Offline evaluations on annotated data demonstrate superior retrieval performance, and online tests show that it attracts more clicks and conversions. Moreover, the model has already been deployed online for similar-product retrieval on alibaba.com, the largest B2B e-commerce platform in the world.
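To make the idea of a joint embedding space for image, text, and image+text representations more concrete, the following is a minimal sketch, not the loss unit proposed in the paper, whose exact form the abstract does not specify. It assumes a standard in-batch InfoNCE objective averaged over every combination of the three views of two products in a positive pair. The names `info_nce`, `unified_contrastive_loss`, `VIEWS`, and the temperature value 0.07 are all hypothetical choices made for illustration.

```python
import math
from itertools import product


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(queries, keys, temperature=0.07):
    """Mean in-batch InfoNCE loss.

    queries[i] and keys[i] are assumed to be a positive pair; every other
    key in the batch acts as a negative for queries[i].
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity of query i to every key, scaled by temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(q)


# Hypothetical view names: image-only, text-only, and fused image+text.
VIEWS = ("image", "text", "fused")


def unified_contrastive_loss(anchor, positive, temperature=0.07):
    """Average InfoNCE over all 3x3 view combinations (an assumption).

    `anchor` and `positive` map each view name to a batch of embeddings;
    row i of every view in `anchor` and row i of every view in `positive`
    describe the two products of the i-th positive pair.
    """
    losses = [
        info_nce(anchor[a], positive[b], temperature)
        for a, b in product(VIEWS, VIEWS)
    ]
    return sum(losses) / len(losses)
```

Under this assumption, any view of one product can be used to query any view of another, which is one plausible way to obtain the cross-modal product-to-product retrieval the abstract describes. A batch whose positive pairs are correctly aligned should yield a lower loss than the same batch with the positives shuffled.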