Large Language Models (LLMs) are increasingly combined with external tools and commercial services into LLM-integrated systems. While these interfaces can significantly enhance the models' capabilities, they also introduce a new attack surface: a manipulated integration, for example, can exploit the model and compromise sensitive data accessed through its other interfaces. While previous work has primarily focused on attacks targeting a model's alignment or the leakage of training data, the security of data that is available only during inference has so far escaped scrutiny. In this work, we demonstrate the vulnerabilities introduced by external components and present a systematic approach to evaluating confidentiality risks in LLM-integrated systems. We identify two attack scenarios unique to these systems and formalize them in a tool-robustness framework designed to measure a model's ability to protect sensitive information. Our findings show that all examined models are highly vulnerable to confidentiality attacks, and that the risk increases significantly when models are used together with external tools.