Image manipulation localization (IML) and general vision tasks are typically treated as two separate research directions because of the fundamental differences between manipulation-specific and semantic features. In this paper, we bridge this gap with a different perspective: the two directions are intrinsically connected, and general semantic priors can benefit IML. Building on this insight, we propose a novel trainable adapter, named ReVi, that repurposes existing off-the-shelf general-purpose vision models (e.g., image generation and segmentation networks) for IML. Inspired by robust principal component analysis, the adapter disentangles semantic redundancy from the manipulation-specific information embedded in these models and selectively enhances the latter. Unlike existing IML methods that require extensive model redesign and full retraining, our method keeps the parameters of the off-the-shelf vision models frozen and fine-tunes only the proposed adapter. Experimental results demonstrate the superiority of our method and indicate its potential for scalable IML frameworks.
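The abstract names robust principal component analysis (RPCA) as the inspiration for the adapter but does not describe the adapter's internals. As background only, the sketch below shows classical RPCA via principal component pursuit, which splits a matrix into a low-rank part plus a sparse part. It is not the ReVi adapter. Reading the low-rank part as "semantic redundancy" and the sparse part as "manipulation-specific information" is an assumed analogy, and the function names (`shrink`, `svd_threshold`, `robust_pca`) and default parameters are hypothetical choices made for illustration.

```python
import numpy as np


def shrink(x, tau):
    """Elementwise soft-thresholding: pull every entry toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def svd_threshold(x, tau):
    """Soft-threshold the singular values of x (proximal step for the nuclear norm)."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * shrink(s, tau)) @ vt


def robust_pca(m, n_iter=1000, tol=1e-7):
    """Split m into low_rank + sparse with an augmented Lagrangian iteration.

    Classical principal component pursuit, NOT the adapter from the abstract.
    The weight lam and penalty mu follow common defaults from the RPCA
    literature; they are assumptions, not values taken from the paper.
    """
    n1, n2 = m.shape
    lam = 1.0 / np.sqrt(max(n1, n2))          # weight on the sparse term
    mu = n1 * n2 / (4.0 * np.abs(m).sum())    # fixed penalty parameter
    low_rank = np.zeros_like(m)
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)                   # Lagrange multiplier
    norm_m = np.linalg.norm(m)
    for _ in range(n_iter):
        # Update each component with the other one held fixed.
        low_rank = svd_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = shrink(m - low_rank + dual / mu, lam / mu)
        # Dual ascent on the constraint m = low_rank + sparse.
        residual = m - low_rank - sparse
        dual = dual + mu * residual
        if np.linalg.norm(residual) <= tol * norm_m:
            break
    return low_rank, sparse
```

Calling `robust_pca(features)` on a 2-D floating-point array returns two arrays of the same shape whose sum approximates the input. Under the analogy assumed above, the sparse component is the part a downstream step would emphasize, while the low-rank component carries the shared structure.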