Generative large vision-language models (LVLMs) have recently achieved impressive performance gains, and their user base is growing rapidly. However, the security of LVLMs, particularly in long-context multi-turn settings, remains largely underexplored. In this paper, we consider a realistic scenario in which an attacker uploads a manipulated image to the web or social media. A benign user downloads this image and uses it as input to an LVLM. Our novel stealthy Visual Memory Injection (VMI) attack is designed such that the LVLM behaves normally on ordinary prompts, but once the user issues a triggering prompt, the LVLM outputs a specific prescribed target message to manipulate the user, e.g., for adversarial marketing or political persuasion. In contrast to previous work that focused on single-turn attacks, VMI remains effective even after a long multi-turn conversation with the user. We demonstrate our attack on several recent open-weight LVLMs. This paper thereby shows that large-scale manipulation of users via perturbed images is feasible in multi-turn conversation settings, calling for better robustness of LVLMs against such attacks. We release the source code at https://github.com/chs20/visual-memory-injection.
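The abstract does not specify how the image perturbation is constructed. One plausible dual-objective formulation, given purely as an illustrative sketch (the symbols $\delta$, $\varepsilon$, $f_\theta$, $h$, $p_{\mathrm{trig}}$, $y_{\mathrm{target}}$, $\mathcal{P}_{\mathrm{benign}}$, and $\lambda$ are our assumptions, not the paper's notation), is

$$
\min_{\|\delta\|_\infty \le \varepsilon}\; \mathcal{L}\!\left(f_\theta(x+\delta,\; h \oplus p_{\mathrm{trig}}),\; y_{\mathrm{target}}\right) \;+\; \lambda\, \mathbb{E}_{p \sim \mathcal{P}_{\mathrm{benign}}}\!\left[\mathcal{L}\!\left(f_\theta(x+\delta,\; h \oplus p),\; f_\theta(x,\; h \oplus p)\right)\right],
$$

where the first term drives the model $f_\theta$ toward the prescribed target message $y_{\mathrm{target}}$ when the trigger prompt $p_{\mathrm{trig}}$ follows a (possibly long) conversation history $h$, and the second term keeps responses to benign prompts $p$ close to those for the clean image $x$. Such a formulation would account for both the stealthiness on normal prompts and the multi-turn persistence claimed above, but the paper's actual objective may differ.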