Large language model-based agents are increasingly used in recommender systems (Agent4RSs) to achieve personalized behavior modeling. Specifically, Agent4RSs introduce memory mechanisms that enable the agents to autonomously learn and self-evolve from real-world interactions. However, to the best of our knowledge, the robustness of Agent4RSs remains unexplored. In this paper, we therefore present the first work to attack Agent4RSs by perturbing agents' memories, not only to uncover their limitations but also to enhance their security and robustness, supporting the development of safer and more reliable AI agents. Given security and privacy concerns, it is more practical to launch attacks under a black-box setting, where accurate knowledge of the victim models cannot be easily obtained. Moreover, practical attacks are often stealthy so as to maximize their impact. To this end, we propose a novel practical attack framework named DrunkAgent. DrunkAgent consists of a generation module, a strategy module, and a surrogate module. The generation module aims to produce effective and coherent adversarial textual triggers, which can be used to achieve attack objectives such as promoting the target items. The strategy module is designed to "get the target agents drunk" so that their memories cannot be effectively updated during the interaction process, which allows the triggers to take full effect. Both modules are optimized on the surrogate module to improve the transferability and imperceptibility of the attacks. By identifying and analyzing these vulnerabilities, our work provides critical insights that pave the way for building safer and more resilient Agent4RSs. Extensive experiments across various real-world datasets demonstrate the effectiveness of DrunkAgent.