Recommender systems are the cornerstone of today's information dissemination, yet a persistent gap between offline metrics and online performance hinders their development. To address this challenge, we envision a recommendation simulator that capitalizes on the human-level intelligence recently exhibited by Large Language Models (LLMs). We propose Agent4Rec, a novel movie recommendation simulator built on LLM-empowered generative agents, each equipped with profile, memory, and action modules tailored to recommender systems. The profile modules are initialized from the MovieLens dataset and capture users' unique tastes and social traits; the memory modules log both factual and emotional memories and are integrated with an emotion-driven reflection mechanism; the action modules support a wide variety of behaviors, spanning both taste-driven and emotion-driven actions. Each agent interacts with personalized movie recommendations page by page, with the recommendations produced by a pre-implemented collaborative filtering-based algorithm. We examine both the capabilities and the limitations of Agent4Rec to explore an essential research question: to what extent can LLM-empowered generative agents faithfully simulate the behavior of real, autonomous humans in recommender systems? Extensive, multi-faceted evaluations of Agent4Rec highlight both the alignment and the deviation between agents' behavior and users' personalized preferences. Beyond performance comparison, we conduct further experiments, such as emulating the filter bubble effect and discovering the underlying causal relationships in recommendation tasks. Our code is available at https://github.com/LehengTHU/Agent4Rec.
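To make the agent structure described in the abstract more concrete, the following is a minimal sketch of a profile/memory/action loop with page-by-page browsing. It is not the Agent4Rec implementation: the class names, the `patience` threshold, the genre-matching rule, and the satisfaction score are all hypothetical, and simple deterministic rules stand in for the LLM calls that drive the real agents.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Movie = Tuple[str, str]  # (title, genre); an assumed, simplified item format


@dataclass
class Profile:
    # Hypothetical profile: only the genres the simulated user likes.
    liked_genres: Set[str]


@dataclass
class Memory:
    factual: List[str] = field(default_factory=list)      # what the agent did
    emotional: List[float] = field(default_factory=list)  # per-page satisfaction in [0, 1]


@dataclass
class Agent:
    profile: Profile
    memory: Memory = field(default_factory=Memory)
    patience: float = 0.5  # assumed threshold: exit if a page satisfies less than this

    def view_page(self, page: List[Movie]) -> List[str]:
        """Taste-driven action: watch the movies whose genre matches the profile."""
        watched = [title for title, genre in page if genre in self.profile.liked_genres]
        satisfaction = len(watched) / len(page) if page else 0.0
        self.memory.factual.append(f"watched {watched}")
        self.memory.emotional.append(satisfaction)
        return watched

    def wants_to_exit(self) -> bool:
        """Emotion-driven action: leave once the latest page was unsatisfying."""
        return bool(self.memory.emotional) and self.memory.emotional[-1] < self.patience


def simulate(agent: Agent, pages: List[List[Movie]]) -> List[str]:
    """Show recommendation pages one at a time until the agent exits."""
    watched: List[str] = []
    for page in pages:
        watched.extend(agent.view_page(page))
        if agent.wants_to_exit():
            break
    return watched
```

In this sketch, an agent that likes only one genre keeps browsing while each page contains enough matching items and stops at the first page that falls below its `patience` threshold, so later pages are never seen. In a real simulator the matching and satisfaction rules would be replaced by LLM-generated judgments conditioned on the profile and memory.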