Large language model (LLM)-powered agents are increasingly used in recommender systems (RSs) to achieve personalized behavior modeling, where the memory mechanism plays a pivotal role in enabling the agents to autonomously explore, learn, and self-evolve from real-world interactions. However, this very mechanism, serving as a contextual repository, inherently exposes an attack surface for potential adversarial manipulations. Despite its central role, the robustness of agentic RSs against such threats remains largely underexplored. Previous works either suffer from semantic mismatches or rely on static embeddings or pre-defined prompts, none of which are designed for dynamic systems, let alone the dynamic memory states of LLM agents. This challenge is exacerbated by the black-box nature of commercial recommenders. To tackle these problems, in this paper we present the first systematic investigation of memory-based vulnerabilities in LLM-powered recommender agents, revealing their security limitations and guiding efforts to strengthen system resilience and trustworthiness. Specifically, we propose a novel black-box attack framework named DrunkAgent. DrunkAgent crafts semantically meaningful adversarial textual triggers for target item promotions and introduces a series of strategies that maximize the trigger effect by corrupting the memory updates during interactions. The triggers and strategies are optimized on a surrogate model, making DrunkAgent transferable and stealthy. Extensive experiments on real-world datasets across diverse agentic RSs, including collaborative filtering, retrieval augmentation, and sequential recommendation, demonstrate the generalizability, transferability, and stealthiness of DrunkAgent.