Large Language Models (LLMs) have revolutionized recommendation agents by providing superior reasoning and flexible decision-making capabilities. However, existing methods mainly follow a passive information acquisition paradigm, in which agents either rely on static, pre-defined workflows or reason over constrained information. This limits an agent's ability to assess information sufficiency and often leads to suboptimal recommendations when faced with fragmented user profiles or sparse item metadata. To address these limitations, we propose RecThinker, an agentic framework for tool-augmented reasoning in recommendation that shifts recommendation from passive processing to autonomous investigation by dynamically planning reasoning paths and proactively acquiring essential information through autonomous tool use. Specifically, RecThinker adopts an Analyze-Plan-Act paradigm: it first analyzes the sufficiency of the available user-item information and then autonomously invokes tool-calling sequences to bridge the information gap between available knowledge and reasoning requirements. We develop a suite of specialized tools for RecThinker that enable the model to acquire user-side, item-side, and collaborative information for better reasoning and user-item matching. Furthermore, we introduce a self-augmented training pipeline comprising a Supervised Fine-Tuning (SFT) stage, which internalizes high-quality reasoning trajectories, and a Reinforcement Learning (RL) stage, which optimizes for decision accuracy and tool-use efficiency. Extensive experiments on multiple benchmark datasets demonstrate that RecThinker consistently outperforms strong baselines across recommendation scenarios.
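To make the Analyze-Plan-Act loop concrete, the sketch below shows one way such an agent could be organized. It is a minimal illustration under assumed interfaces: the tool functions (`fetch_user_history`, `fetch_item_details`, `fetch_neighbors`), the gap labels, and the stand-in decision rule are all hypothetical and are not taken from the paper.

```python
# A minimal, self-contained sketch of an Analyze-Plan-Act loop for a
# tool-augmented recommendation agent. Every tool and the final decision
# function are illustrative stubs, NOT the actual RecThinker implementation.
from dataclasses import dataclass, field


@dataclass
class Context:
    """Accumulated evidence about one user-item pair."""
    user_profile: dict
    item_metadata: dict
    evidence: list = field(default_factory=list)


# --- illustrative tool stubs (assumed, not from the paper) ----------------
def fetch_user_history(ctx):
    return {"kind": "user", "data": ["item_12", "item_87"]}

def fetch_item_details(ctx):
    return {"kind": "item", "data": "enriched description for a sparse item"}

def fetch_neighbors(ctx):
    return {"kind": "collaborative", "data": ["user_3", "user_9"]}

TOOLS = {
    "user_history": fetch_user_history,      # user-side information
    "item_description": fetch_item_details,  # item-side information
    "similar_users": fetch_neighbors,        # collaborative information
}


def analyze(ctx):
    """Analyze: identify which kinds of information are still missing."""
    gaps = []
    if not ctx.user_profile.get("history") and \
       not any(e["kind"] == "user" for e in ctx.evidence):
        gaps.append("user_history")
    if not ctx.item_metadata.get("description") and \
       not any(e["kind"] == "item" for e in ctx.evidence):
        gaps.append("item_description")
    if not any(e["kind"] == "collaborative" for e in ctx.evidence):
        gaps.append("similar_users")
    return gaps


def act(ctx, max_steps=5):
    """Plan + Act: call tools until information is sufficient, then decide."""
    for _ in range(max_steps):
        gaps = analyze(ctx)
        if not gaps:                   # information judged sufficient
            break
        for gap in gaps:               # plan = an ordered tool-call sequence
            ctx.evidence.append(TOOLS[gap](ctx))
    # Stand-in for the LLM's final reasoning over the gathered evidence:
    return "recommend" if len(ctx.evidence) >= 3 else "abstain"


if __name__ == "__main__":
    ctx = Context(user_profile={}, item_metadata={})
    print(act(ctx))  # -> "recommend" once all three gaps are filled
```

In the actual framework an LLM, rather than hard-coded rules, would perform the sufficiency analysis and choose the tool-call sequence; the loop structure above only illustrates how analysis, planned tool calls, and a final decision fit together.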