Search-based recommendation is one of the most critical application scenarios on e-commerce platforms. Users' complex search contexts, such as spatiotemporal factors, historical interactions, and the current query, are an essential part of their decision-making and reflect implicit preferences that complement explicit query terms. Modeling these rich contextual signals and their intricate associations with candidate items remains a key challenge. Although numerous efforts have been devoted to building more effective search methods, existing approaches remain limited in how they integrate contextual information, which hinders their ability to fully capture user intent. To address these challenges, we propose a context-aware, reasoning-enhanced generative search framework that better understands complex context. Specifically, the framework first unifies heterogeneous user and item contexts into textual representations or text-based semantic identifiers and aligns them. To overcome the lack of explicit reasoning trajectories, we introduce a self-evolving post-training paradigm that iteratively combines supervised fine-tuning (SFT) and reinforcement learning (RL) to progressively enhance the model's reasoning capability. In addition, we identify potential biases in existing RL algorithms when they are applied to search scenarios and present a debiased variant of GRPO to improve ranking performance. Extensive experiments on search log data collected from a real-world e-commerce platform demonstrate that our approach outperforms strong baselines, validating its effectiveness for search-based recommendation.
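For orientation only, the group-relative advantage used by standard GRPO can be sketched as below. This is a minimal illustration of the baseline algorithm, not the debiased variant proposed here, whose details the abstract does not give. The function name `group_relative_advantages`, the `eps` stabilizer, the use of the population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style advantages for one group of sampled responses.

    Each response's reward is centered on the group mean and scaled by the
    group standard deviation, so no learned value function is needed.
    NOTE: illustrative baseline only; not the paper's debiased variant.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    # eps keeps the division finite when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for one search context,
# e.g. 1.0 if the generated item matches the clicked item, else 0.0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that score above the group mean receive positive advantages and those below receive negative ones; when all rewards in a group are identical, every advantage is zero and the group contributes no gradient signal.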