Advances in image tampering pose serious security threats, underscoring the need for effective image manipulation localization (IML). While supervised IML achieves strong performance, it depends on costly pixel-level annotations. Existing weakly supervised or training-free alternatives often underperform and lack interpretability. We propose the In-Context Forensic Chain (ICFC), a training-free framework that leverages multi-modal large language models (MLLMs) for interpretable IML. ICFC integrates objectified rule construction with adaptive filtering to build a reliable knowledge base, together with a multi-step progressive reasoning pipeline that mirrors expert forensic workflows, moving from coarse proposals to fine-grained forensic results. This design enables systematic exploitation of MLLM reasoning for image-level classification, pixel-level localization, and text-level interpretability. Across multiple benchmarks, ICFC not only surpasses state-of-the-art training-free methods but also achieves competitive or superior performance compared to weakly and fully supervised approaches.
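The coarse-to-fine flow described above can be sketched in miniature. This is a hypothetical illustration, not the authors' implementation: all names (`build_knowledge_base`, `forensic_chain`, the toy rules and reliability score) are invented for exposition, and a toy 2-D list stands in for an image and for MLLM-driven reasoning stages.

```python
# Hypothetical sketch of ICFC's two ingredients as described in the abstract:
# (1) adaptive filtering of candidate rules into a reliable knowledge base,
# (2) a progressive reasoning chain from image-level decision to pixel-level mask.
# All identifiers are illustrative assumptions, not the paper's API.

def build_knowledge_base(candidate_rules, score_fn, threshold=0.5):
    """Adaptive filtering: retain only rules whose reliability score
    clears the threshold."""
    return [rule for rule in candidate_rules if score_fn(rule) >= threshold]

def forensic_chain(image, knowledge_base, stages):
    """Multi-step progressive reasoning: each stage refines the running
    result, mirroring coarse proposals -> fine-grained localization."""
    result = {"image": image, "rationale": []}
    for stage in stages:
        result = stage(result, knowledge_base)
    return result

def classify(result, kb):
    # Image-level classification: tampered if any pixel trips any retained rule.
    flagged = any(rule(v) for row in result["image"] for v in row for rule in kb)
    result["tampered"] = flagged
    result["rationale"].append(f"image-level: tampered={flagged}")  # text-level trace
    return result

def localize(result, kb):
    # Pixel-level localization: binary mask of pixels flagged by any rule.
    result["mask"] = [[int(any(rule(v) for rule in kb)) for v in row]
                      for row in result["image"]]
    result["rationale"].append("pixel-level: mask computed")
    return result

# Toy run: one plausible rule (bright outliers) survives filtering,
# one implausible rule (negative pixel values) is discarded.
kb = build_knowledge_base(
    candidate_rules=[lambda v: v > 200, lambda v: v < -1],
    score_fn=lambda rule: 0.9 if rule(255) else 0.1,  # toy reliability score
)
out = forensic_chain([[10, 250], [30, 40]], kb, stages=[classify, localize])
# out carries an image-level verdict, a pixel mask, and a textual rationale.
```

The single accumulating `result` dict mimics the chain structure: every stage both refines the estimate and appends to the rationale, which is how the framework exposes text-level interpretability alongside classification and localization.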