AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems

Recently, LLM-powered agents have emerged as believable human proxies, owing to their remarkable decision-making capability. However, existing studies mainly focus on simulating human dialogue. Human non-verbal behaviors, such as item clicking in recommender systems, implicitly exhibit user preferences and could enhance the modeling of users, yet they have not been deeply explored. The main reasons are the gap between language modeling and behavior modeling, and the limited understanding that LLMs have of user-item relations. To address this issue, we propose AgentCF, which simulates user-item interactions in recommender systems through agent-based collaborative filtering. We treat not only users but also items as agents, and develop a collaborative learning approach that optimizes both kinds of agents together. Specifically, at each time step, we first prompt the user and item agents to interact autonomously. Then, based on the disparities between the agents' decisions and real-world interaction records, the user and item agents are prompted to collaboratively reflect on and adjust the misleading simulations, thereby modeling their two-sided relations. The optimized agents can also propagate their preferences to other agents in subsequent interactions, implicitly capturing the collaborative filtering idea. Overall, the optimized agents exhibit diverse interaction behaviors within our framework, including user-item, user-user, item-item, and collective interactions. The results show that these agents can demonstrate personalized behaviors akin to those of real-world individuals, sparking the development of next-generation user behavior simulation.
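The interact-then-reflect loop described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `Agent` class with a free-text memory, the `LLM` callable interface, the prompt wording, the idea of presenting the user agent with a small candidate list, and the `fake_llm` stand-in that replaces a real model so the loop can be tried offline.

```python
from dataclasses import dataclass
from typing import Callable

LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out


@dataclass
class Agent:
    """A user or item agent whose only state is a natural-language memory."""
    name: str
    memory: str


def choose(llm: LLM, user: Agent, candidates: list[Agent]) -> Agent:
    """Autonomous interaction: the user agent picks one candidate item."""
    listing = "\n".join(f"{i}: {c.memory}" for i, c in enumerate(candidates))
    reply = llm(f"User profile: {user.memory}\nItems:\n{listing}\n"
                "Answer with the index of the item this user would pick.")
    try:
        index = int(reply.strip())
    except ValueError:
        index = 0  # fall back to the first candidate on an unparsable reply
    return candidates[index] if 0 <= index < len(candidates) else candidates[0]


def reflect(llm: LLM, agent: Agent, context: str) -> None:
    """Reflection: the LLM rewrites the agent's memory in light of the error."""
    agent.memory = llm(f"Current memory of {agent.name}: {agent.memory}\n"
                       f"{context}\nRewrite the memory to avoid this mistake.")


def collaborative_step(llm: LLM, user: Agent, candidates: list[Agent],
                       actual: Agent) -> bool:
    """Run one time step; return True if the simulation matched the record."""
    picked = choose(llm, user, candidates)
    if picked is actual:
        return True  # simulation agrees with the record, nothing to adjust
    context = (f"Simulated choice: {picked.name}. "
               f"Real-world record: {user.name} interacted with {actual.name}.")
    # Both sides are adjusted: the user agent and every item agent involved.
    reflect(llm, user, context)
    for item in candidates:
        reflect(llm, item, context)
    return False


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model (assumption, for offline use)."""
    if "Answer with the index" in prompt:
        return "1"
    return "revised: " + prompt.splitlines()[0]
```

Because item memories are rewritten in place, a later user agent that meets the same item reads the updated description. In this sketch, that is one plausible way for preferences to propagate between agents across interactions.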
