Large Language Models (LLMs) are reshaping recommender systems by leveraging extensive world knowledge and semantic reasoning to interpret user intent. However, effectively integrating these capabilities with collaborative signals while avoiding prohibitive inference latency remains a critical bottleneck. To address this, we propose a trajectory-driven internalization framework to develop a Single-agent Trajectory-Aligned Recommender (STAR). Specifically, to internalize complex reasoning capabilities into a single efficient model, we first design a multi-agent teacher system capable of multi-turn tool usage and reflection. This teacher utilizes a Collaborative Signal Translation mechanism to explicitly convert latent behavioral patterns into descriptive natural language evidence to enhance reasoning accuracy. Subsequently, a trajectory-driven distillation pipeline transfers this agentic logic, including planning, tool usage, and self-reflection, into the compact STAR model. Extensive experiments demonstrate that STAR surpasses its teacher by 8.7% to 39.5% while eliminating iterative latency, paving the way for real-time, reasoning-enhanced recommendation.