Vision-language agents increasingly consume screenshots, documents, and user interfaces before writing to memory, sending messages, or invoking external tools. We study a concrete failure mode in this setting: action-boundary propagation, where sensitive or unsafe visible text is copied from an image into downstream tool arguments. We present VisualLeakBench, a diversified 500-image benchmark spanning UI, chat, document, form, and dashboard scenes, and evaluate a stratified 100-image agent subset with four production VLM systems under two workflows: note capture and external handoff. At baseline, target strings are propagated into tool arguments in 78.8% of PII cases and 85.5% of rendered unsafe-text cases. Under a defensive system prompt, rendered unsafe-text propagation remains high at 52.6%, while PII tool propagation falls to 2.0%, largely by suppressing tool use rather than preserving utility. Rates are tool-surface dependent: search-like tools suppress PII propagation, but rendered unsafe text still crosses tool boundaries. We measure visual-to-tool propagation rather than downstream instruction execution. We additionally provide a labeled-target oracle upper-bound diagnostic that localizes most failures at the tool boundary while leaving response-side leakage as residual risk.
翻译:视觉语言代理在写入记忆、发送消息或调用外部工具前,越来越多地摄取屏幕截图、文档和用户界面。我们研究了该场景下一种具体的故障模式:动作边界传播,即敏感或不安全的可见文本从图像被复制到下游工具参数中。我们提出了VisualLeakBench,一个涵盖UI、聊天、文档、表单和仪表板场景的多样化500图像基准,并在两种工作流(笔记捕获和外部交接)下使用四个生产级VLM系统对100个分层采样的代理子集进行评估。基线条件下,目标字符串在78.8%的个人隐私信息(PII)案例和85.5%的渲染不安全文本案例中被传播到工具参数中。在采用防御性系统提示后,渲染不安全文本的传播率仍高达52.6%,而PII的工具传播率降至2.0%,这主要通过抑制工具使用而非保持实用性实现。传播率具有工具表面依赖性:搜索类工具抑制了PII传播,但渲染的不安全文本仍会跨越工具边界。我们测量的是视觉到工具的传播而非下游指令执行。此外,我们提供了一个带标签目标的上限诊断方法,将大多数故障定位在工具边界,同时将响应端泄漏作为残留风险。