Accurately retrieving relevant bid keywords for user queries is critical in Sponsored Search but remains challenging, particularly for short, ambiguous queries. Existing dense and generative retrieval models often fail to capture nuanced user intent in these cases. To address this, we propose an approach to enhance query understanding by augmenting queries with rich contextual signals derived from web search results and large language models, stored in an online cache. Specifically, we use web search titles and snippets to ground queries in real-world information and utilize GPT-4 to generate query rewrites and explanations that clarify user intent. These signals are efficiently integrated through a Fusion-in-Decoder based Unity architecture, enabling both dense and generative retrieval with serving costs on par with traditional context-free models. To address scenarios where context is unavailable in the cache, we introduce context glancing, a curriculum learning strategy that improves model robustness and performance even without contextual signals during inference. Extensive offline experiments demonstrate that our context-aware approach substantially outperforms context-free models. Furthermore, online A/B testing on a prominent search engine across 160+ countries shows significant improvements in user engagement and revenue.
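The two mechanisms above can be sketched minimally: a Fusion-in-Decoder-style input builder that pairs the query with each cached context passage, and a context-glancing step that drops the context with a probability annealed over training so the model also learns to operate without it. The linear schedule, the `p_max` cap, and the `[SEP]` formatting are illustrative assumptions, not the paper's exact recipe.

```python
import random

def glance_mask(context, step, total_steps, p_max=0.5):
    """Context glancing (sketch): with a probability that grows over
    training, drop the cached context so the model is also trained on
    context-free inputs. Linear schedule and p_max are assumptions."""
    p_drop = p_max * min(1.0, step / total_steps)
    return None if random.random() < p_drop else context

def build_input(query, context):
    """FiD-style encoding (sketch): each context passage is encoded
    jointly with the query; with no context, fall back to the query."""
    if not context:
        return [query]
    return [f"{query} [SEP] {passage}" for passage in context]
```

At inference, a cache miss simply yields `context=None`, and `build_input` degrades to the context-free case that context glancing trained the model to handle.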