AIGQ: An End-to-End Hybrid Generative Architecture for E-commerce Query Recommendation

Pre-search query recommendation, widely known as HintQ on Taobao's homepage, plays a vital role in intent capture and demand discovery. Traditional methods, however, rely on ID-based matching and co-click heuristics, and as a result suffer from shallow semantics, poor cold-start performance, and low serendipity. To overcome these challenges, we propose AIGQ (AI-Generated Query architecture), the first end-to-end generative framework for the HintQ scenario. AIGQ is built upon three core innovations spanning training paradigm, policy optimization, and deployment architecture. First, we propose Interest-Aware List Supervised Fine-Tuning (IL-SFT), a list-level supervised learning approach that constructs training samples through session-aware behavior aggregation and an interest-guided re-ranking strategy to faithfully model nuanced user intent. Second, we design Interest-aware List Group Relative Policy Optimization (IL-GRPO), a novel policy gradient algorithm with a dual-component reward mechanism that jointly optimizes individual query relevance and global list properties, enhanced by a model-based reward from the online click-through rate (CTR) ranking model. Third, to deploy under strict real-time and low-latency requirements, we develop a hybrid offline-online architecture comprising AIGQ-Direct, which performs nearline personalized user-to-query generation, and AIGQ-Think, a reasoning-enhanced variant that produces trigger-to-query mappings to enrich interest diversity. Extensive offline evaluations and large-scale online A/B experiments on Taobao demonstrate that AIGQ consistently delivers substantial improvements in key business metrics across platform effectiveness and user engagement.
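
To make the idea of a dual-component, list-level reward more concrete, the following is a minimal sketch of how a group-relative advantage could be computed over several candidate query lists sampled for the same user. It is an illustration under stated assumptions, not the IL-GRPO implementation: the function names, the weight `alpha`, the distinct-query ratio used as the list-level term, and the toy `relevance` scores (standing in for a model-based signal such as a CTR estimate) are all hypothetical. Only the within-group standardization of rewards follows the general GRPO recipe.

```python
from statistics import mean, pstdev


def list_reward(queries, relevance, alpha=0.5):
    """Hypothetical dual-component reward for one candidate query list.

    Assumes `queries` is non-empty and every query has a score in `relevance`.
    """
    # Item-level component: average relevance of the individual queries.
    item_score = mean(relevance[q] for q in queries)
    # List-level component: share of distinct queries, a crude diversity proxy.
    list_score = len(set(queries)) / len(queries)
    return alpha * item_score + (1 - alpha) * list_score


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one sampled group (GRPO-style advantage)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy relevance scores; in practice these might come from a learned model.
relevance = {"running shoes": 0.9, "trail shoes": 0.7, "yoga mat": 0.4}

# One group: three candidate lists sampled for the same user context.
group = [
    ["running shoes", "trail shoes", "yoga mat"],
    ["running shoes", "running shoes", "running shoes"],
    ["yoga mat", "yoga mat", "trail shoes"],
]

rewards = [list_reward(qs, relevance) for qs in group]
advantages = group_relative_advantages(rewards)
```

In this toy group, the first list is both relevant and fully distinct, so it receives the largest advantage, while the repetitive lists are pushed below the group mean. Because advantages are computed relative to the group, no separate value model is needed to provide a baseline.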
