Image-generative models are widely deployed across industries. Recent studies show that they can be exploited to produce policy-violating content. Existing mitigation strategies primarily operate at the pre- or mid-generation stages through techniques such as prompt filtering and safety-aware training or fine-tuning. Prior work shows that these approaches can be bypassed and often degrade generative quality. In this work, we propose ReVision, a training-free, prompt-based, post-hoc safety framework for image-generation pipelines. ReVision acts as a last line of defense by analyzing generated images and selectively editing unsafe concepts without altering the underlying generator. It uses the Gemini-2.5-Flash model as a generic policy-violating-concept detector, avoiding reliance on multiple category-specific detectors, and performs localized semantic editing to replace unsafe content. Prior post-hoc editing methods often rely on imprecise spatial localization, which undermines usability and limits deployability, particularly in multi-concept scenes. To address this limitation, ReVision introduces a VLM-assisted spatial gating mechanism that enforces instance-consistent localization, enabling precise edits while preserving scene integrity. We evaluate ReVision on a 245-image benchmark covering both single- and multi-concept scenarios. Results show that ReVision (i) improves CLIP-based alignment with safe prompts by $+0.121$ on average; (ii) significantly improves multi-concept background fidelity (LPIPS $0.166 \rightarrow 0.058$); (iii) achieves near-complete suppression on category-specific detectors (e.g., NudeNet $70.51 \rightarrow 0$); and (iv) reduces the recognizability of policy-violating content in a human moderation study from $95.99\%$ to $10.16\%$.