Spatiotemporal predictive learning (STPL) aims to forecast future frames from past observations and is essential across a wide range of applications. Compared with recurrent or hybrid architectures, purely convolutional models offer higher efficiency and full parallelism, yet their fixed receptive fields limit their ability to adapt to spatially varying motion patterns. Inspired by biological center-surround organization and frequency-selective signal processing, we propose PFGNet, a fully convolutional framework that dynamically modulates receptive fields through pixel-wise frequency-guided gating. Its core component, the Peripheral Frequency Gating (PFG) block, extracts localized spectral cues and adaptively fuses multi-scale large-kernel peripheral responses with learnable center suppression, effectively forming spatially adaptive band-pass filters. To maintain efficiency, all large kernels are decomposed into separable 1D convolutions ($1 \times k$ followed by $k \times 1$), reducing the per-channel computational cost from $O(k^2)$ to $O(k)$, that is, from $k^2$ to $2k$ multiply-accumulates per output position. PFGNet thus enables structure-aware spatiotemporal modeling without recurrence or attention. Experiments on Moving MNIST, TaxiBJ, Human3.6M, and KTH show that PFGNet delivers state-of-the-art or near-state-of-the-art forecasting performance with substantially fewer parameters and FLOPs. Our code is available at https://github.com/fhjdqaq/PFGNet.
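The separable decomposition mentioned above can be illustrated with a minimal sketch in plain Python. It is not taken from the PFGNet code base: the function names, the toy image, and the kernel taps are hypothetical, and `k = 31` in the cost comparison is only an illustrative kernel size. The sketch checks that a $1 \times k$ pass followed by a $k \times 1$ pass reproduces a dense $k \times k$ convolution when the dense kernel is the outer product of the two 1D kernels, and it counts multiply-accumulates per output position. Note that the equivalence holds only for such rank-1 kernels; a learned separable pair, as described in the abstract, is a restricted parameterization and not a factorization of an arbitrary dense kernel.

```python
def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation with 'valid' padding on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def separable_conv2d_valid(image, row_taps, col_taps):
    """Apply a 1 x k pass (along each row), then a k x 1 pass (along each column)."""
    horizontal = conv2d_valid(image, [list(row_taps)])        # 1 x k kernel
    return conv2d_valid(horizontal, [[c] for c in col_taps])  # k x 1 kernel


def macs_per_output(k):
    """Multiply-accumulates per output position per channel: (dense, separable)."""
    return k * k, 2 * k


# Hypothetical toy data; integer taps keep the comparison exact.
row_taps = [1, 2, 1]
col_taps = [1, 0, -1]
# Dense rank-1 kernel: outer product of the column taps and the row taps.
dense_kernel = [[c * r for r in row_taps] for c in col_taps]
image = [[(3 * i + 5 * j) % 7 for j in range(6)] for i in range(5)]

dense_out = conv2d_valid(image, dense_kernel)
separable_out = separable_conv2d_valid(image, row_taps, col_taps)

print(dense_out == separable_out)  # both paths give the same 3 x 4 output
print(macs_per_output(31))         # dense vs. separable cost for k = 31
```

For the illustrative `k = 31`, the dense kernel needs 961 multiply-accumulates per output position and per channel, while the separable pair needs 62, which is the $k^2$ versus $2k$ gap the abstract refers to.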