Misusing Tools in Large Language Models With Visual Adversarial Examples

Large Language Models (LLMs) are being enhanced with the ability to use tools and to process multiple modalities. These capabilities bring new benefits but also new security risks. In this work, we show that an attacker can use visual adversarial examples to cause attacker-desired tool usage. For example, the attacker could cause a victim LLM to delete calendar events, leak private conversations, and book hotels. Unlike prior work, our attacks can affect the confidentiality and integrity of user resources connected to the LLM while being stealthy and generalizable to multiple input prompts. We construct these attacks using gradient-based adversarial training and characterize performance along multiple dimensions. We find that our adversarial images can manipulate the LLM to invoke tools following real-world syntax almost always (~98%) while maintaining high similarity to clean images (~0.9 SSIM). Furthermore, using human scoring and automated metrics, we find that the attacks do not noticeably affect the conversation (and its semantics) between the user and the LLM.
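
The abstract states that the attacks are built with gradient-based adversarial training while keeping the image close to the clean one. The sketch below illustrates that general idea with a generic targeted projected gradient descent (PGD) loop under an L-infinity budget. It is not the authors' method; their model, loss, and constraint are not given here. Everything in it is an assumption for illustration: `toy_logits` is a hypothetical linear stand-in for a multimodal LLM, a single output index `target` stands in for the attacker's tool-call string, and `global_ssim` is a simplified single-window SSIM rather than the sliding-window SSIM usually reported.

```python
import math
import random


def softmax(z):
    # Numerically stable softmax over a list of logits
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def toy_logits(W, x):
    # Hypothetical stand-in for the victim model: one linear score per candidate output
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def target_loss_grad(W, x, target):
    """Cross-entropy of the target output and its gradient w.r.t. the pixels x."""
    p = softmax(toy_logits(W, x))
    loss = -math.log(p[target])
    # d(loss)/d(logit_k) = p_k - 1[k == target]; chain rule through the linear map
    coeffs = [p_k - (1.0 if k == target else 0.0) for k, p_k in enumerate(p)]
    grad = [sum(coeffs[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]
    return loss, grad


def pgd_attack(W, x_clean, target, eps=0.03, alpha=0.005, steps=50):
    """Targeted L-infinity PGD: push the model toward `target` within an eps-ball."""
    x = list(x_clean)
    for _ in range(steps):
        _, grad = target_loss_grad(W, x, target)
        for j, g in enumerate(grad):
            # Step against the gradient sign to lower the target loss
            step = x[j] - alpha * (1 if g > 0 else -1 if g < 0 else 0)
            # Project into the eps-ball around the clean pixel and the valid range [0, 1]
            lo = max(0.0, x_clean[j] - eps)
            hi = min(1.0, x_clean[j] + eps)
            x[j] = min(hi, max(lo, step))
    return x


def global_ssim(a, b, data_range=1.0):
    """Simplified SSIM computed over the whole image as a single window."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((v - mu_a) ** 2 for v in a) / n
    var_b = sum((v - mu_b) ** 2 for v in b) / n
    cov = sum((u - mu_a) * (v - mu_b) for u, v in zip(a, b)) / n
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    n_pixels, n_outputs, target = 64, 4, 2
    W = [[rng.uniform(-1, 1) for _ in range(n_pixels)] for _ in range(n_outputs)]
    x_clean = [rng.random() for _ in range(n_pixels)]
    x_adv = pgd_attack(W, x_clean, target)
    p_clean = softmax(toy_logits(W, x_clean))[target]
    p_adv = softmax(toy_logits(W, x_adv))[target]
    print(f"target prob: {p_clean:.3f} -> {p_adv:.3f}")
    print(f"max |delta|: {max(abs(u - v) for u, v in zip(x_adv, x_clean)):.3f}")
    print(f"global SSIM: {global_ssim(x_clean, x_adv):.3f}")
```

The small `eps` is what keeps the perturbation visually subtle, which is the trade-off the abstract reports as high similarity (SSIM) alongside a high attack success rate. A real attack on a multimodal LLM would instead backpropagate a token-level loss on the target response through the model's vision encoder, typically with an autodiff framework.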
