Pre-search query recommendation, known as HintQ on Taobao's homepage, plays a vital role in intent capture and demand discovery. Traditional methods, however, rely on ID-based matching and co-click heuristics, and as a result suffer from shallow semantics, poor cold-start performance, and low serendipity. To overcome these challenges, we propose AIGQ (AI-Generated Query architecture), the first end-to-end generative framework for the HintQ scenario. AIGQ is built on three core innovations spanning the training paradigm, policy optimization, and deployment architecture. First, we propose Interest-Aware List Supervised Fine-Tuning (IL-SFT), a list-level supervised learning approach that constructs training samples through session-aware behavior aggregation and an interest-guided re-ranking strategy to faithfully model nuanced user intent. Second, we design Interest-aware List Group Relative Policy Optimization (IL-GRPO), a novel policy gradient algorithm with a dual-component reward mechanism that jointly optimizes individual query relevance and global list properties, enhanced by a model-based reward from the online click-through rate (CTR) ranking model. Third, to deploy under strict real-time, low-latency requirements, we develop a hybrid offline-online architecture comprising AIGQ-Direct, which performs nearline personalized user-to-query generation, and AIGQ-Think, a reasoning-enhanced variant that produces trigger-to-query mappings to enrich interest diversity. Extensive offline evaluations and large-scale online A/B experiments on Taobao demonstrate that AIGQ consistently delivers substantial improvements in key business metrics covering platform effectiveness and user engagement.
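To make the idea of a dual-component reward inside a group-relative policy objective more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical reward that mixes a per-query relevance term with a list-level term (here, the distinct-query ratio stands in for "global list properties"), and then applies the standard group-relative normalization used by GRPO, where each sampled list's reward is centered and scaled by the statistics of its sampling group. The function names, the weights `w_item` and `w_list`, and the choice of list-level signal are all illustrative assumptions; the abstract does not specify them, and the CTR-model reward is omitted.

```python
from statistics import mean, pstdev


def list_reward(queries, relevance, w_item=0.5, w_list=0.5):
    """Hypothetical dual-component reward for one generated query list.

    item term: mean per-query relevance (assumed to be given in `relevance`).
    list term: ratio of distinct queries, an assumed stand-in for a
               list-level property such as diversity.
    """
    if not queries:
        return 0.0
    item_score = mean(relevance.get(q, 0.0) for q in queries)
    list_score = len(set(queries)) / len(queries)
    return w_item * item_score + w_list * list_score


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampling group, as in GRPO.

    Each advantage is (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two candidate lists sampled for the same (hypothetical) user context.
candidates = [["red dress", "summer sandals"], ["red dress", "red dress"]]
relevance = {"red dress": 1.0, "summer sandals": 1.0}

rewards = [list_reward(c, relevance) for c in candidates]
advantages = group_relative_advantages(rewards)
```

In this toy group, the list with two distinct queries receives the higher reward and therefore a positive advantage, while the list that repeats a query receives a negative one. In a real policy update these advantages would weight the log-probabilities of the generated tokens.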