Query rewriting (QR) is a critical technique in e-commerce search, addressing the lexical gap between user queries and product descriptions to enhance search performance. Existing QR approaches typically fall into two categories: discriminative models and generative methods leveraging large language models (LLMs). Discriminative models often struggle with natural language understanding and offer limited flexibility in rewriting, while generative LLMs, despite producing high-quality rewrites, incur high inference latency and cost in online settings. These limitations force offline deployment, making them vulnerable to issues such as information staleness and semantic drift. To overcome these challenges, we propose a novel hybrid pipeline for QR that balances efficiency and effectiveness. Our approach combines offline knowledge distillation, which produces a lightweight yet efficient student model, with online reinforcement learning (RL) that refines query rewriting dynamically using real-time feedback. A key innovation is the use of LLMs as simulated human feedback, enabling scalable reward signals and cost-effective evaluation without manual annotations. Experimental results on the Amazon ESCI dataset demonstrate significant improvements in query relevance, diversity, and adaptability, as well as positive feedback from the LLM simulation. This work contributes to advancing LLM capabilities for domain-specific applications, offering a robust solution for dynamic and complex e-commerce search environments.
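The online RL stage described above can be illustrated with a minimal, self-contained sketch. This is not the paper's implementation: the query, candidate rewrites, the categorical policy, and the keyword-overlap "LLM judge" stub are all illustrative assumptions. It shows only the core loop, where a lightweight student policy proposes a rewrite, a simulated LLM scores it, and a REINFORCE-style update with a baseline shifts the policy toward higher-reward rewrites without any manual annotation.

```python
import math
import random

random.seed(0)

# Hypothetical example query and candidate rewrites (illustrative only;
# in the real pipeline the distilled student model generates candidates).
QUERY = "cheap running shoes"
CANDIDATES = [
    "affordable running shoes",
    "cheap sneakers",
    "budget athletic footwear",
]

def llm_simulated_reward(query: str, rewrite: str) -> float:
    """Stub for an LLM judge scoring a rewrite in [0, 1].

    The paper uses an actual LLM to simulate human feedback; here a simple
    word-overlap score stands in so the sketch runs without any API calls.
    """
    q_words = set(query.split())
    overlap = len(q_words & set(rewrite.split()))
    return overlap / max(len(q_words), 1)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

# Toy "student policy": a softmax over the fixed candidate set.
logits = [0.0] * len(CANDIDATES)
LR = 0.5

for step in range(200):
    probs = softmax(logits)
    # Sample a rewrite from the current policy.
    i = random.choices(range(len(CANDIDATES)), weights=probs)[0]
    reward = llm_simulated_reward(QUERY, CANDIDATES[i])
    # Expected reward under the policy serves as a variance-reducing baseline.
    baseline = sum(p * llm_simulated_reward(QUERY, c)
                   for p, c in zip(probs, CANDIDATES))
    advantage = reward - baseline
    # REINFORCE gradient of log pi(i) for a categorical policy:
    # d/d logit_j log pi(i) = 1[j == i] - probs[j].
    for j in range(len(CANDIDATES)):
        grad = (1.0 if j == i else 0.0) - probs[j]
        logits[j] += LR * advantage * grad

best = CANDIDATES[max(range(len(CANDIDATES)), key=lambda j: logits[j])]
print(best)
```

Under this stub reward, the policy concentrates on the candidate with the highest overlap score; in the actual pipeline the same loop would instead be driven by LLM-judged relevance of rewrites against retrieved products.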