Embedding-based neural retrieval is a prevalent approach to addressing the semantic gap problem, which often arises in product search on tail queries. In contrast, popular queries typically lack context and have broad intent, so additional context from users' historical interactions can be helpful. In this paper, we share our novel approach to addressing both problems: we first tackle the semantic gap, then present an end-to-end trained model for personalized semantic retrieval. We propose learning a unified embedding model that incorporates graph, transformer, and term-based embeddings end to end, and we share our design choices for an optimal tradeoff between performance and efficiency. We share our learnings in feature engineering, hard negative sampling strategy, and the application of transformer models, including a novel pre-training strategy and other tricks for improving search relevance and deploying such a model at industry scale. Our personalized retrieval model significantly improves the overall search experience, as measured by a 5.58% increase in search purchase rate and a 2.63% increase in site-wide conversion rate, aggregated across multiple A/B tests on live traffic.