Image-generative models are widely deployed across industries. Recent studies show that they can be exploited to produce policy-violating content. Existing mitigation strategies primarily operate at the pre- or mid-generation stages through techniques such as prompt filtering and safety-aware training or fine-tuning. Prior work shows that these approaches can be bypassed and often degrade generative quality. In this work, we propose ReVision, a training-free, prompt-based, post-hoc safety framework for image-generation pipelines. ReVision acts as a last line of defense by analyzing generated images and selectively editing unsafe concepts without altering the underlying generator. It uses the Gemini-2.5-Flash model as a generic policy-violating-concept detector, avoiding reliance on multiple category-specific detectors, and performs localized semantic editing to replace unsafe content. Prior post-hoc editing methods often rely on imprecise spatial localization, which undermines usability and limits deployability, particularly in multi-concept scenes. To address this limitation, ReVision introduces a VLM-assisted spatial gating mechanism that enforces instance-consistent localization, enabling precise edits while preserving scene integrity. We evaluate ReVision on a 245-image benchmark covering both single- and multi-concept scenarios. Results show that ReVision (i) improves CLIP-based alignment with safe prompts by $+0.121$ on average; (ii) significantly improves multi-concept background fidelity (LPIPS $0.166 \rightarrow 0.058$); (iii) achieves near-complete suppression on category-specific detectors (e.g., NudeNet $70.51 \rightarrow 0$); and (iv) reduces the recognizability of policy-violating content in a human moderation study from $95.99\%$ to $10.16\%$.