Same-style product retrieval plays an important role on e-commerce platforms; it aims to identify identical products that may have different text descriptions or images. It can be used to retrieve similar products across suppliers or to detect duplicate products from a single supplier. Common methods treat the image as the object to be matched, but they consider only visual features and overlook the attribute information contained in textual descriptions. As a result, they perform poorly in industries where images matter less, such as machinery, hardware tools, and electronic components, even when an additional text-matching module is added. In this paper, we propose a unified vision-language modeling method for e-commerce same-style product retrieval, designed to represent a product by both its textual descriptions and its visual content. It comprises a sampling strategy that collects positive pairs from user click logs under category and relevance constraints, and a novel contrastive loss unit that maps the image, text, and image+text representations into one joint embedding space. The model supports cross-modal product-to-product retrieval, as well as style transfer and user-interactive search. Offline evaluations on annotated data demonstrate its superior retrieval performance, and online tests show that it attracts more clicks and conversions. Moreover, the model has already been deployed online for similar-product retrieval on alibaba.com, the largest B2B e-commerce platform in the world.
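
The abstract describes a contrastive loss that places image, text, and image+text representations in one joint embedding space. The following is a minimal sketch of that idea, not the paper's actual loss: it assumes a symmetric InfoNCE objective with in-batch negatives, averaged over every combination of the three views. The function names, the mean-based `fuse` step, the temperature value, and the toy embeddings are all hypothetical and chosen only for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(queries, keys, temperature=0.07):
    """Mean cross-entropy in which queries[i] should match keys[i].

    All other keys in the batch act as negatives.
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denom - logits[i]
    return total / len(q)


def fuse(image_vec, text_vec):
    """Hypothetical image+text fusion: element-wise mean (an assumption)."""
    return [(a + b) / 2.0 for a, b in zip(image_vec, text_vec)]


def unified_contrastive_loss(side_a, side_b, temperature=0.07):
    """Average symmetric InfoNCE over all view combinations.

    side_a[i] and side_b[i] are the two products of positive pair i,
    each a dict with "image" and "text" embeddings.
    """
    def views(product):
        return {
            "image": product["image"],
            "text": product["text"],
            "fused": fuse(product["image"], product["text"]),
        }

    views_a = [views(p) for p in side_a]
    views_b = [views(p) for p in side_b]
    names = ("image", "text", "fused")
    losses = []
    for name_a in names:
        for name_b in names:
            qa = [v[name_a] for v in views_a]
            kb = [v[name_b] for v in views_b]
            losses.append(0.5 * (info_nce(qa, kb, temperature)
                                 + info_nce(kb, qa, temperature)))
    return sum(losses) / len(losses)


# Toy batch of three positive pairs (made-up embeddings).
side_a = [
    {"image": [1.0, 0.0, 0.0], "text": [0.9, 0.1, 0.0]},
    {"image": [0.0, 1.0, 0.0], "text": [0.0, 0.9, 0.1]},
    {"image": [0.0, 0.0, 1.0], "text": [0.1, 0.0, 0.9]},
]
side_b = [
    {"image": [0.9, 0.0, 0.1], "text": [1.0, 0.0, 0.0]},
    {"image": [0.1, 0.9, 0.0], "text": [0.0, 1.0, 0.0]},
    {"image": [0.0, 0.1, 0.9], "text": [0.0, 0.0, 1.0]},
]

aligned = unified_contrastive_loss(side_a, side_b)
shuffled = unified_contrastive_loss(side_a, side_b[1:] + side_b[:1])
print(aligned, shuffled)
```

In this sketch, correctly paired products yield a lower loss than a batch whose pairs have been rotated out of alignment. Because every view is scored against every other view, an image-only query can be matched against a text-only or fused key, which is one plausible reading of the cross-modal product-to-product retrieval the abstract mentions.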