Enhancing Relevance of Embedding-based Retrieval at Walmart

Juexin Lin, Sachin Yadav, Feng Liu, Nicholas Rossi, Praveen Reddy Suram, Satya Chembolu, Prijith Chandran, Hrushikesh Mohapatra, Tony Lee, Alessandro Magnani, Ciya Liao

From arXiv; 8 pages, 3 figures; CIKM 2024

Embedding-based neural retrieval (EBR) is an effective retrieval method in product search for bridging the vocabulary gap between customer search queries and products. The initial launch of our EBR system at Walmart yielded significant gains in relevance and add-to-cart rates [1]. However, although EBR generally retrieves more relevant products for reranking, we have observed numerous instances of relevance degradation. Improving retrieval is crucial because it directly influences product reranking and, in turn, the customer shopping experience. Factors contributing to these degradations include false positives and false negatives in the training data and the model's inability to handle query misspellings. To address these issues, we present several approaches that further improve the retrieval relevance of our EBR model. We introduce a Relevance Reward Model (RRM) based on human relevance feedback. We use the RRM to remove noise from the training data and distill its knowledge into our EBR model through a multi-objective loss. In addition, we present techniques that further improve our EBR model, such as typo-aware training and semi-positive generation. The effectiveness of our EBR model is demonstrated through offline relevance evaluation, online A/B tests, and successful deployment to live production.

[1] Alessandro Magnani, Feng Liu, Suthee Chaidaroon, Sachin Yadav, Praveen Reddy Suram, Ajit Puthenputhussery, Sijie Chen, Min Xie, Anirudh Kashi, Tony Lee, et al. 2022. Semantic Retrieval at Walmart. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 3495-3503.
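The abstract names two uses of the Relevance Reward Model (RRM): filtering noisy training pairs and distilling its judgments into the EBR model through a multi-objective loss. The plain-Python sketch below shows one common way to realize both ideas; it is not the authors' implementation. The score threshold, the temperature `tau`, the weight `alpha`, the in-batch softmax contrastive term, and the KL-divergence distillation term are all assumptions chosen for illustration, as are the hypothetical names `Pair`, `denoise`, and `multi_objective_loss`.

```python
import math
from dataclasses import dataclass


@dataclass
class Pair:
    """Hypothetical training example: a query, a product, and a teacher score."""
    query: str
    product: str
    rrm_score: float  # assumed RRM relevance score in [0, 1]


def denoise(pairs, threshold=0.5):
    """Drop pairs the relevance model scores below `threshold` (likely false positives).

    Assumption: a single global threshold; the paper's filtering rule is not stated.
    """
    return [p for p in pairs if p.rrm_score >= threshold]


def log_softmax(xs, temperature):
    """Numerically stable log-softmax of a list of scores divided by `temperature`."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def multi_objective_loss(sim, teacher, alpha=0.5, tau=0.05):
    """Weighted sum of a contrastive loss and a distillation loss over one batch.

    sim[i][j]:     student (EBR) similarity between query i and product j.
    teacher[i][j]: teacher (RRM) relevance score for the same query-product pair.
    Diagonal entries are the positives; off-diagonal entries act as in-batch negatives.
    Assumption: loss = (1 - alpha) * contrastive + alpha * KL(teacher || student).
    """
    n = len(sim)
    contrastive = 0.0
    distill = 0.0
    for i in range(n):
        log_p_student = log_softmax(sim[i], tau)
        log_p_teacher = log_softmax(teacher[i], tau)
        # Softmax cross-entropy: push query i toward its own (diagonal) product.
        contrastive -= log_p_student[i]
        # KL(teacher || student) over the row of candidate products for query i.
        distill += sum(
            math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_teacher, log_p_student)
        )
    return ((1 - alpha) * contrastive + alpha * distill) / n
```

For example, `denoise([Pair("tv", "55in tv", 0.9), Pair("tv", "hdmi cable", 0.2)])` keeps only the first pair. Setting `alpha` to 0 reduces the objective to a plain contrastive loss, while larger values pull the EBR model's score distribution toward the RRM's.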


Related content

MoDELS


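The 23rd ACM/IEEE International Conference on Model Driven Engineering Languages and Systems (MODELS) is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS attendees come from diverse backgrounds, including researchers, academics, engineers, and industrial professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition will offer the modeling community an opportunity to further advance the foundations of modeling and to present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability. Official website: http://www.modelsconference.org/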

[CVPR 2022] A fully unsupervised framework for learning images from noisy and partial measurements: Robust Equivariant Imaging: a fully unsupervised framework for learning to image

Zhuanzhi Member Services · March 3, 2022

[NeurIPS 2021] A GNN-nested Transformer model for representation learning on textual graphs: GraphFormers

Zhuanzhi Member Services · November 24, 2021

UCM "Introduction to Machine Learning" lecture notes, 80-page PDF: CSE176 Introduction to Machine Learning

Zhuanzhi Member Services · September 29, 2021

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

Zhuanzhi Member Services · October 18, 2019