Large-scale industrial recommenders typically use a fixed multi-stage pipeline (recall, ranking, re-ranking), and their models have progressed from collaborative filtering to deep models and large pre-trained models. However, both multi-stage and so-called One Model designs remain essentially static: models are black boxes, and system improvement relies on manual hypotheses and engineering effort, which is hard to scale under heterogeneous data and multi-objective business constraints. We propose an Agentic Recommender System (AgenticRS) that reorganizes key modules as agents. A module is promoted to an agent only when it forms a functionally closed loop, can be evaluated independently, and has an evolvable decision space. For model agents, we outline two self-evolution mechanisms: reinforcement-learning-style optimization in well-defined action spaces, and large-language-model-based generation and selection of new architectures and training schemes in open-ended design spaces. We further distinguish individual evolution of single agents from compositional evolution over how multiple agents are selected and connected, and we use a layered inner and outer reward design to couple local optimization with global objectives. Together, these elements provide a concise blueprint for turning static pipelines into self-evolving agentic recommender systems.
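The three promotion criteria and the layered reward can be made concrete with a small sketch. Everything named here is hypothetical and chosen for illustration only: `ModuleProfile`, `is_agent_candidate`, `layered_reward`, and the `outer_weight` parameter do not come from the abstract. In particular, the abstract does not say how inner and outer rewards are combined; the convex combination below is an assumption, one simple way to couple a local objective with a global one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleProfile:
    """Hypothetical description of a pipeline module (illustrative only)."""
    name: str
    closed_loop: bool               # forms a functionally closed loop
    independently_evaluable: bool   # can be evaluated on its own
    evolvable_decision_space: bool  # has a decision space that can evolve


def is_agent_candidate(module: ModuleProfile) -> bool:
    """A module qualifies only if all three stated criteria hold."""
    return (
        module.closed_loop
        and module.independently_evaluable
        and module.evolvable_decision_space
    )


def layered_reward(inner: float, outer: float, outer_weight: float = 0.5) -> float:
    """Blend an agent's local (inner) reward with a global (outer) reward.

    Assumption: a convex combination. The source only states that inner and
    outer rewards are layered, not how they are combined.
    """
    if not 0.0 <= outer_weight <= 1.0:
        raise ValueError("outer_weight must lie in [0, 1]")
    return (1.0 - outer_weight) * inner + outer_weight * outer


if __name__ == "__main__":
    ranking = ModuleProfile("ranking", True, True, True)
    feature_log = ModuleProfile("feature_logging", True, True, False)
    for m in (ranking, feature_log):
        print(m.name, "-> agent candidate:", is_agent_candidate(m))
    print("blended reward:", layered_reward(inner=0.2, outer=0.8))
```

With `outer_weight` near 0 an agent optimizes almost purely for its local metric, while values near 1 tie it to the system-level objective; a real system would likely need a richer coupling than a single scalar weight.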