EulerMormer: Robust Eulerian Motion Magnification via Dynamic Filtering within Transformer

Video Motion Magnification (VMM) aims to go beyond the resolution limit of human visual perception and reveal imperceptible subtle motions that carry valuable information in the macroscopic domain. The task is challenging, however, because photographic devices inevitably introduce photon noise and because amplification is spatially inconsistent; together these cause flickering artifacts in static fields and motion blur and distortion in dynamic fields of the video. Existing methods focus on explicit motion modeling without prioritizing denoising during the motion magnification process. This paper proposes a novel dynamic filtering strategy to achieve static-dynamic field adaptive denoising. Specifically, following Eulerian theory, we separate texture and shape and extract the motion representation from inter-frame shape differences, with the aim of using these finer-grained features to address the task more precisely. We then introduce a novel dynamic filter that eliminates noise cues and preserves critical features in both the motion magnification and amplification generation phases. Overall, our unified framework, EulerMormer, is a pioneering effort to bring the Transformer into learning-based VMM. The core of the dynamic filter is a global dynamic sparse cross-covariance attention mechanism that explicitly removes noise while preserving vital information, coupled with a multi-scale dual-path gating mechanism that selectively regulates the dependence on different frequency features to reduce spatial attenuation and complement motion boundaries. Extensive experiments demonstrate that EulerMormer achieves more robust video motion magnification from the Eulerian perspective, significantly outperforming state-of-the-art methods. The source code is available at https://github.com/VUT-HFUT/EulerMormer.
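
To make the idea of sparse cross-covariance attention more concrete, the following is a minimal NumPy sketch, not the EulerMormer implementation. It assumes a channel-wise (cross-covariance) attention map computed between L2-normalized queries and keys, and it stands in for the "dynamic sparse" step with a fixed top-k selection per row. The function name, the `keep` and `temperature` parameters, and the top-k rule are all illustrative assumptions; the abstract does not specify how the sparsity is chosen.

```python
import numpy as np

def sparse_cross_covariance_attention(q, k, v, keep=2, temperature=1.0):
    """Hypothetical sketch of channel-wise attention with top-k sparsity.

    q, k, v: arrays of shape (channels, tokens).
    keep: number of channel-to-channel weights retained per row (assumption).
    Returns the attended features (channels, tokens) and the attention
    weights (channels, channels).
    """
    # L2-normalize every channel over its tokens so that scores are cosines.
    qn = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-8)
    kn = k / (np.linalg.norm(k, axis=1, keepdims=True) + 1e-8)

    # Cross-covariance scores between channels: shape (channels, channels).
    scores = (qn @ kn.T) * temperature

    # Keep only the `keep` largest scores in each row; the rest are masked
    # out so they receive zero weight after the softmax.
    kth_largest = np.sort(scores, axis=1)[:, -keep][:, None]
    masked = np.where(scores >= kth_largest, scores, -np.inf)

    # Numerically stable softmax over each row.
    masked = masked - masked.max(axis=1, keepdims=True)
    weights = np.exp(masked)
    weights /= weights.sum(axis=1, keepdims=True)

    # Mix the value channels with the sparse attention weights.
    return weights @ v, weights
```

Because the attention map is computed between channels rather than between spatial positions, its size depends on the number of channels and not on the frame resolution. Zeroing the weaker entries is one simple way to drop weakly correlated, possibly noisy, channel interactions.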
