Image manipulation can lead to misinterpretation of visual content, posing significant risks to information security. Image Manipulation Localization (IML) has thus received increasing attention. However, existing IML methods rely heavily on task-specific designs, so each performs well only on its target image type while degrading to near-random guessing on other image types, and even joint training on multiple image types causes significant performance degradation. This hinders real-world deployment: maintaining separate models notably increases costs, and misclassifying an image's type leads to serious error accumulation. To this end, we propose Omni-IML, the first generalist model to unify diverse IML tasks. Specifically, Omni-IML achieves generalism by adopting a Modal Gate Encoder and a Dynamic Weight Decoder to adaptively determine the optimal encoding modality and the optimal decoder filters for each sample. We additionally propose an Anomaly Enhancement module that enhances the features of tampered regions with box supervision and helps the generalist model extract common features across different IML tasks. We validate our approach on IML tasks across three major scenarios: natural images, document images, and face images. Without bells and whistles, our Omni-IML achieves state-of-the-art performance on all three tasks with a single unified model, providing valuable strategies and insights for real-world application and future research in generalist image forensics. Our code will be publicly available.
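The abstract describes per-sample gating over encoding modalities. As a loose illustration of the general idea (not the paper's actual architecture), the sketch below fuses per-modality feature vectors with softmax gate weights computed from per-sample gate logits; all names (`modal_gate`, the `"rgb"`/`"freq"` modalities) are hypothetical, and a real implementation would learn the gate from data.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of gate logits.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def modal_gate(features, gate_logits):
    """Fuse per-modality feature vectors with softmax gate weights.

    features: dict modality name -> feature vector (list of floats)
    gate_logits: dict modality name -> scalar gate score for this sample

    Returns the gated combination plus the weights, so each sample
    adaptively emphasises its most informative encoding modality.
    """
    names = list(features)
    weights = softmax([gate_logits[n] for n in names])
    dim = len(next(iter(features.values())))
    fused = [0.0] * dim
    for w, name in zip(weights, names):
        for i, value in enumerate(features[name]):
            fused[i] += w * value
    return fused, dict(zip(names, weights))

# Toy example: one sample whose gate strongly prefers the RGB branch.
fused, weights = modal_gate(
    {"rgb": [1.0, 0.0], "freq": [0.0, 1.0]},
    {"rgb": 2.0, "freq": 0.0},
)
```

In this toy setup the `"rgb"` weight dominates, so the fused feature stays close to the RGB branch; a sample whose artifacts are clearer in the frequency domain would instead receive a larger `"freq"` logit.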