Multi-spectral Class Center Network for Face Manipulation Detection and Localization

As Deepfake content continues to proliferate on the internet, advancing face manipulation forensics has become a pressing issue. To combat this emerging threat, previous methods have mainly focused on distinguishing authentic face images from manipulated ones. Although impressive, image-level classification lacks explainability and is limited to specific application scenarios. Existing forgery localization methods suffer from imprecise and inconsistent pixel-level annotations. To alleviate these problems, this paper first reconstructs the FaceForensics++ dataset by introducing pixel-level annotations, then builds an extensive benchmark for localizing tampered regions. Next, a novel Multi-Spectral Class Center Network (MSCCNet) is proposed for face manipulation detection and localization. Specifically, inspired by the power of frequency-related forgery traces, we design a Multi-Spectral Class Center (MSCC) module to learn more generalizable and semantic-agnostic features. Based on the features of different frequency bands, the MSCC module collects multi-spectral class centers and computes pixel-to-class relations. Applying multi-spectral class-level representations suppresses the semantic information of visual concepts, since that information is insensitive to manipulations. Furthermore, we propose a Multi-level Features Aggregation (MFA) module to exploit more low-level forgery artifacts and structural textures. Experimental results indicate, both quantitatively and qualitatively, the effectiveness and superiority of the proposed MSCCNet on comprehensive localization benchmarks. We expect this work to inspire more studies on pixel-level face manipulation localization. The annotations and code will be made available.
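
The idea behind the MSCC module (frequency-band features, per-band class centers, pixel-to-class relations) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify the spectral transform, the number of classes, or how class assignments are obtained, so the sketch assumes a radial FFT band split, two classes (real and forged), a coarse soft prediction `probs` supplied from elsewhere, and scaled dot-product similarity for the relations. All function names are hypothetical.

```python
import numpy as np

def split_frequency_bands(feat, num_bands=2):
    """Split a (C, H, W) feature map into radial frequency bands (assumed: 2D FFT masks)."""
    _, H, W = feat.shape
    spec = np.fft.fft2(feat, axes=(-2, -1))
    fy = np.fft.fftfreq(H)[:, None]
    fx = np.fft.fftfreq(W)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)
    edges = np.linspace(0.0, radius.max(), num_bands + 1)
    bands = []
    for i in range(num_bands):
        lo, hi = edges[i], edges[i + 1]
        # The last band is closed on the right so the masks cover every frequency once.
        upper = (radius <= hi) if i == num_bands - 1 else (radius < hi)
        mask = (radius >= lo) & upper
        bands.append(np.fft.ifft2(spec * mask, axes=(-2, -1)).real)
    return bands

def class_centers(feat, probs, eps=1e-6):
    """Probability-weighted mean feature per class: returns (K, C)."""
    f = feat.reshape(feat.shape[0], -1)    # (C, N)
    p = probs.reshape(probs.shape[0], -1)  # (K, N)
    return (p @ f.T) / (p.sum(axis=1, keepdims=True) + eps)

def pixel_to_class_relation(feat, centers):
    """Softmax over classes of scaled pixel-center similarity: returns (N, K)."""
    C = feat.shape[0]
    f = feat.reshape(C, -1).T              # (N, C)
    logits = f @ centers.T / np.sqrt(C)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def mscc_sketch(feat, probs, num_bands=2):
    """Re-express each pixel as a relation-weighted mix of per-band class centers."""
    C, H, W = feat.shape
    out = np.zeros((C, H, W), dtype=float)
    for band in split_frequency_bands(feat, num_bands):
        centers = class_centers(band, probs)
        rel = pixel_to_class_relation(band, centers)
        out += (rel @ centers).T.reshape(C, H, W)
    return out

# Usage with random stand-in data (hypothetical shapes: 8 channels, 16x16 map, 2 classes).
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))
raw = rng.random((2, 16, 16))
probs = raw / raw.sum(axis=0, keepdims=True)
refined = mscc_sketch(feat, probs, num_bands=3)
print(refined.shape)  # (8, 16, 16)
```

Because every output pixel is built only from class centers, image-specific content is averaged away and what remains is class-level information per frequency band, which is one way to read the abstract's claim that class-level representations suppress semantic information.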
