ReVision: A Post-Hoc, Vision-Based Technique for Replacing Unacceptable Concepts in Image Generation Pipeline

Image-generative models are widely deployed across industries. Recent studies show that they can be exploited to produce policy-violating content. Existing mitigation strategies primarily operate at the pre- or mid-generation stages through techniques such as prompt filtering and safety-aware training/fine-tuning. Prior work shows that these approaches can be bypassed and often degrade generative quality. In this work, we propose ReVision, a training-free, prompt-based, post-hoc safety framework for image-generation pipelines. ReVision acts as a last line of defense by analyzing generated images and selectively editing unsafe concepts without altering the underlying generator. It uses the Gemini-2.5-Flash model as a generic policy-violating concept detector, avoiding reliance on multiple category-specific detectors, and performs localized semantic editing to replace unsafe content. Prior post-hoc editing methods often rely on imprecise spatial localization, which undermines usability and limits deployability, particularly in multi-concept scenes. To address this limitation, ReVision introduces a VLM-assisted spatial gating mechanism that enforces instance-consistent localization, enabling precise edits while preserving scene integrity. We evaluate ReVision on a 245-image benchmark covering both single- and multi-concept scenarios. Results show that ReVision (i) improves CLIP-based alignment toward safe prompts by $+0.121$ on average; (ii) significantly improves multi-concept background fidelity (LPIPS $0.166 \rightarrow 0.058$); (iii) achieves near-complete suppression on category-specific detectors (e.g., NudeNet $70.51 \rightarrow 0$); and (iv) reduces policy-violating content recognizability in a human moderation study from $95.99\%$ to $10.16\%$.
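The abstract does not specify how the spatial gating mechanism is implemented. As a purely illustrative sketch of what gating edits to a flagged instance could look like, the snippet below assumes the detector returns a rough bounding box for the unsafe concept and that a separate localizer proposes candidate regions; only candidates that overlap the flagged box sufficiently are passed on for editing. The names `Box`, `iou`, `gate_regions`, and the `min_iou` threshold are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates, from (x0, y0) to (x1, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        # Degenerate or inverted boxes count as empty.
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (0.0 when they do not overlap)."""
    inter = Box(max(a.x0, b.x0), max(a.y0, b.y0),
                min(a.x1, b.x1), min(a.y1, b.y1)).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def gate_regions(flagged: Box, candidates: list[Box],
                 min_iou: float = 0.5) -> list[Box]:
    """Keep only candidate regions that agree with the flagged instance.

    Assumption: `flagged` is the detector's box for the unsafe concept and
    `candidates` come from an independent localizer. Regions below the
    overlap threshold are left untouched by the editor.
    """
    return [c for c in candidates if iou(flagged, c) >= min_iou]


# Hypothetical multi-concept scene: one flagged instance, two candidates.
flagged = Box(10, 10, 50, 50)
candidates = [Box(12, 12, 48, 48), Box(60, 60, 90, 90)]
editable = gate_regions(flagged, candidates)
```

Under these assumptions, only the first candidate is selected for editing, so the second region (a different object in the scene) stays unchanged, which is the kind of behavior the reported background-fidelity improvement would depend on.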
