Recommender agents built on Large Language Models offer a promising paradigm for recommendation. However, existing recommender agents typically suffer from a disconnect between intermediate reasoning and final ranking feedback, and are unable to capture fine-grained preferences. To address these issues, we present AgenticRec, a ranking-oriented agentic recommendation framework that optimizes the entire decision-making trajectory (including intermediate reasoning, tool invocation, and final ranking list generation) under sparse implicit feedback. Our approach makes three key contributions. First, we design a suite of recommendation-specific tools integrated into a ReAct loop to support evidence-grounded reasoning. Second, we propose a theoretically unbiased List-Wise Group Relative Policy Optimization (list-wise GRPO) that maximizes ranking utility and ensures accurate credit assignment for complex tool-use trajectories. Third, we introduce Progressive Preference Refinement (PPR) to resolve fine-grained preference ambiguities. By mining hard negatives from ranking violations and applying bidirectional preference alignment, PPR minimizes a convex upper bound on pairwise ranking errors. Experiments on benchmark datasets show that AgenticRec significantly outperforms baselines, validating the necessity of unifying reasoning, tool use, and ranking optimization.
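To make the two training signals more concrete, here is a minimal, non-authoritative sketch in plain Python. It assumes (1) the list-wise reward for each sampled trajectory is binary-relevance NDCG@k of its final ranking against the user's held-out positives, (2) GRPO-style advantages are obtained by standardizing rewards within a sampled group, and (3) PPR's "ranking violations" are (positive, negative) pairs where a negative item is ranked above a positive one, scored with a base-2 pairwise logistic loss, which is a convex upper bound on the 0-1 pairwise error. The abstract does not specify AgenticRec's exact reward, normalization, or PPR objective, and the bidirectional alignment step is not modeled; all function names are hypothetical.

```python
import math
from typing import Sequence


def ndcg_at_k(ranking: Sequence[str], relevant: set[str], k: int) -> float:
    """Binary-relevance NDCG@k of one ranked list (assumed list-wise reward)."""
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, item in enumerate(ranking[:k])
        if item in relevant
    )
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: standardize each reward within its sampled group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def mine_violations(ranking: Sequence[str], relevant: set[str]) -> list[tuple[str, str]]:
    """Hypothetical hard-negative mining: (positive, negative) pairs where the
    negative item is ranked above the positive item."""
    position = {item: i for i, item in enumerate(ranking)}
    positives = [item for item in ranking if item in relevant]
    negatives = [item for item in ranking if item not in relevant]
    return [
        (p, n)
        for p in positives
        for n in negatives
        if position[n] < position[p]
    ]


def pairwise_logistic_loss(scores: dict[str, float], pairs: list[tuple[str, str]]) -> float:
    """Mean of log2(1 + exp(-(s_pos - s_neg))) over violated pairs.

    The base-2 logistic loss equals 1 at a tie and exceeds 1 when the negative
    scores higher, so it upper-bounds the 0-1 pairwise error and is convex.
    """
    if not pairs:
        return 0.0
    total = 0.0
    for p, n in pairs:
        m = scores[p] - scores[n]
        # Numerically stable log(1 + exp(-m)), then convert to base 2.
        total += (max(-m, 0.0) + math.log1p(math.exp(-abs(m)))) / math.log(2.0)
    return total / len(pairs)


if __name__ == "__main__":
    relevant = {"i3"}
    # Three sampled trajectories' final rankings for the same user.
    group = [["i3", "i1", "i2"], ["i1", "i3", "i2"], ["i1", "i2", "i3"]]
    rewards = [ndcg_at_k(r, relevant, k=3) for r in group]
    print("rewards:", rewards)
    print("advantages:", group_relative_advantages(rewards))
    pairs = mine_violations(group[2], relevant)
    print("violations:", pairs)
    print("loss:", pairwise_logistic_loss({"i1": 0.2, "i2": 0.1, "i3": 0.0}, pairs))
```

Standardizing within the group means trajectories are rewarded only relative to their siblings, so a ranking that places the positive item higher than the group average receives a positive advantage regardless of the absolute NDCG scale.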