Visual manipulation localization (VML) aims to identify tampered regions in images and videos, a task that has become increasingly challenging with the rise of advanced editing tools. Existing methods face two central issues. The first is resolution diversity: resizing or padding inputs to a fixed size can distort subtle forensic cues and introduce unnecessary computational cost. The second is the difficulty of extending spatial models designed for images to the spatio-temporal inputs of videos, which often forces separate architectures to be maintained for the two data types. To address these challenges, we propose RelayFormer, a unified framework that adapts to varying resolutions and naturally handles both static and temporal visual data. RelayFormer partitions inputs into fixed-size sub-images and introduces Global Local Relay (GLR) tokens that propagate structured context through a relay-based attention mechanism. This design enables efficient exchange of global cues, such as semantic or temporal consistency, while preserving fine-grained manipulation artifacts. Unlike prior approaches that depend on uniform resizing or sparse attention, RelayFormer scales to variable resolutions and video sequences with minimal overhead. Experiments across diverse benchmarks demonstrate superior performance and strong efficiency: RelayFormer achieves resolution adaptivity without interpolation or excessive padding, processes images and videos within a single model, and strikes a favorable balance between accuracy and computational cost. Code is available at~\href{https://github.com/WenOOI/RelayFormer}{https://github.com/WenOOI/RelayFormer}.
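To make the information flow concrete, the following is a minimal NumPy sketch of the two ideas named above: cutting an arbitrary-resolution input into fixed-size sub-images (padding only the border, never interpolating) and relaying context between sub-images through one relay token each. It is a hypothetical illustration, not the RelayFormer implementation. The names `partition`, `to_tokens`, and `relay_step` are invented here; the relay tokens are mean-pooled summaries rather than learned GLR tokens; and there are no learned projections, heads, or multiple layers. A video is treated simply as the sub-images of all its frames stacked together, so a still image is the one-frame case.

```python
import numpy as np


def partition(image: np.ndarray, size: int) -> tuple[np.ndarray, tuple[int, int]]:
    """Split an (H, W, C) array into non-overlapping (size, size, C) sub-images.

    Only the bottom/right border is zero-padded up to the next multiple of
    `size`; the content itself is never resized or interpolated.
    """
    h, w, c = image.shape
    padded = np.pad(image, ((0, (-h) % size), (0, (-w) % size), (0, 0)))
    gh, gw = padded.shape[0] // size, padded.shape[1] // size
    tiles = padded.reshape(gh, size, gw, size, c).transpose(0, 2, 1, 3, 4)
    return tiles.reshape(gh * gw, size, size, c), (gh, gw)


def to_tokens(tiles: np.ndarray, k: int) -> np.ndarray:
    """Flatten each sub-image into (size // k) ** 2 tokens of k * k * C values."""
    n, size, _, c = tiles.shape
    g = size // k
    t = tiles.reshape(n, g, k, g, k, c).transpose(0, 1, 3, 2, 4, 5)
    return t.reshape(n, g * g, k * k * c)


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def relay_step(tokens: np.ndarray) -> np.ndarray:
    """One toy relay round over tokens of shape (N sub-images, L tokens, D)."""
    d = tokens.shape[-1]
    # 1) Local -> relay: summarize each sub-image into one relay token
    #    (mean pooling here; an assumption, not learned GLR tokens).
    relay = tokens.mean(axis=1)                                   # (N, D)
    # 2) Relay <-> relay: global attention among the N relay tokens only.
    relay = softmax(relay @ relay.T / np.sqrt(d)) @ relay         # (N, D)
    # 3) Relay -> local: each token attends within its own sub-image
    #    plus that sub-image's updated relay token.
    ctx = np.concatenate([relay[:, None, :], tokens], axis=1)     # (N, L + 1, D)
    attn = softmax(tokens @ ctx.transpose(0, 2, 1) / np.sqrt(d))  # (N, L, L + 1)
    return attn @ ctx                                             # (N, L, D)


rng = np.random.default_rng(0)
clip = [rng.random((50, 70, 3)) for _ in range(2)]            # toy 2-frame "video"
tiles = np.concatenate([partition(f, 32)[0] for f in clip])   # 2 frames x 6 tiles
out = relay_step(to_tokens(tiles, 8))
print(tiles.shape, out.shape)  # (12, 32, 32, 3) (12, 16, 192)
```

In this sketch, dense attention over all N·L tokens would need on the order of (N·L)² score entries, whereas the relay scheme needs roughly N·L·(L+1) for the local step plus N² for the relay exchange. This is why cost grows only mildly as resolution or clip length adds sub-images.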