Large language model (LLM)-based agents combine LLMs with external tools to automate tasks such as scheduling meetings, managing documents, or booking travel. While these integrations unlock powerful capabilities, they also create new and more severe attack surfaces. In particular, prompt injection attacks become far more dangerous in the agentic setting: malicious instructions embedded in connected services can misdirect the agent, opening a direct path for exfiltrating sensitive data. Yet despite a growing number of real-world incidents, the confidentiality risks of such systems remain poorly understood. To address this gap, we formalize confidentiality in LLM-based agents. Abstracting sensitive data as a secret string, we evaluate 10 agents across 20 tool scenarios and 14 attack strategies. We find that every agent is vulnerable to at least one attack and that existing defenses fail to provide reliable protection. Strikingly, the tooling itself can amplify leakage risks.