Video recommender systems are among the most popular and impactful applications of AI, shaping content consumption and influencing culture for billions of users. Traditional single-model recommenders, which optimize static engagement metrics, are increasingly limited in addressing the dynamic requirements of modern platforms. In response, multi-agent architectures are redefining how video recommender systems serve, learn, and adapt to both users and datasets. These agent-based systems coordinate specialized agents responsible for video understanding, reasoning, memory, and feedback to provide precise, explainable recommendations. In this survey, we trace the evolution of multi-agent video recommendation systems (MAVRS), drawing on ideas from multi-agent recommender systems, foundation models, and conversational AI, up to the emerging field of large language model (LLM)-powered MAVRS. We present a taxonomy of collaborative patterns and analyze coordination mechanisms across diverse video domains, ranging from short-form clips to educational platforms. To illustrate these patterns, we discuss representative frameworks, including early multi-agent reinforcement learning (MARL) systems such as MMRF and recent LLM-driven architectures such as MACRec and Agent4Rec. We also outline open challenges in scalability, multimodal understanding, and incentive alignment, and identify research directions such as hybrid reinforcement learning-LLM systems, lifelong personalization, and self-improving recommender systems.