With the growing popularity of portable devices, falsified media have spread rampantly on social platforms, which makes the timely identification of authentic content necessary. However, most state-of-the-art detection methods are computationally heavy, which hinders their real-time application. In this paper, we present an efficient two-stream architecture for real-time image manipulation detection. Our method consists of two branches that target the cognitive and inspective perspectives, respectively. In the cognitive branch, we propose efficient wavelet-guided Transformer blocks to capture global, frequency-related manipulation traces. Each block contains an interactive wavelet-guided self-attention module that integrates the wavelet transform with an efficient attention design and interacts with knowledge from the inspective branch. The inspective branch consists of simple convolutions that capture fine-grained traces and interact bidirectionally with the Transformer blocks so that the two branches support each other. Our method is lightweight ($\sim$8M parameters) yet achieves competitive performance compared with many counterparts, demonstrating its efficacy for image manipulation detection and its potential for integration into portable devices.
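The abstract does not specify the wavelet family or the exact efficient-attention design. As a rough illustration only, the sketch below assumes a single-level orthonormal Haar transform and a spatial-reduction style of attention in which keys and values are drawn from the half-resolution low-frequency (LL) subband. The helpers `haar_dwt2` and `attention_pairs` are hypothetical and are not the authors' implementation. The sketch shows two things: a local alteration shows up in the high-frequency detail subbands, and attending to a 2× downsampled key/value grid cuts the number of query-key pairs by a factor of four.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2(x: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """Single-level orthonormal 2-D Haar transform (hypothetical helper).

    Each non-overlapping 2x2 block [[a, b], [c, d]] is mapped to one coefficient
    in each of four half-resolution subbands. Height and width must be even.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        rows = ([], [], [], [])
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2)  # LL: local average (low frequency)
            rows[1].append((a + b - c - d) / 2)  # LH: top-minus-bottom detail
            rows[2].append((a - b + c - d) / 2)  # HL: left-minus-right detail
            rows[3].append((a - b - c + d) / 2)  # HH: diagonal detail
        for band, row in zip((ll, lh, hl, hh), rows):
            band.append(row)
    return ll, lh, hl, hh


def attention_pairs(h: int, w: int, kv_downsample: int = 1) -> int:
    """Query-key pairs when queries cover all h*w positions and keys/values
    come from a grid downsampled by kv_downsample along each axis (assumed design)."""
    n_queries = h * w
    n_keys = (h // kv_downsample) * (w // kv_downsample)
    return n_queries * n_keys


if __name__ == "__main__":
    # A 4x4 patch in which a single value (row 3, column 2) has been altered.
    patch = [[1, 1, 5, 5], [1, 1, 5, 5], [1, 1, 5, 5], [1, 1, 9, 5]]
    ll, lh, hl, hh = haar_dwt2(patch)
    print("LL", ll, "LH", lh, "HL", hl, "HH", hh)
    print(attention_pairs(64, 64), attention_pairs(64, 64, kv_downsample=2))
```

On the example patch, only the bottom-right coefficient of each detail subband is non-zero (−2 in LH and HH, 2 in HL), while LL keeps the coarse content `[[2, 10], [2, 12]]`. Because the Haar transform is orthonormal, the total energy is preserved: 264 before and after. For a 64×64 feature map, drawing keys and values from the LL grid reduces the pair count from 16,777,216 to 4,194,304.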