FGA: Fourier-Guided Attention Network for Crowd Count Estimation

Crowd counting is gaining societal relevance, particularly in urban planning, crowd management, and public safety. This paper introduces Fourier-guided attention (FGA), a novel attention mechanism for crowd count estimation that addresses a limitation of existing convolution-based attention networks: they capture full-scale global patterns inefficiently. FGA captures multi-scale information, including full-scale global patterns, by combining the Fast Fourier Transform (FFT) with spatial attention for global features, and convolutions with channel-wise attention for semi-global and local features. The architecture follows a dual-path design: (1) a path that processes full-scale global features through the FFT, extracting information efficiently in the frequency domain, and (2) a path that processes the remaining feature maps for semi-global and local features using conventional convolutions and channel-wise attention. This dual-path architecture lets FGA integrate frequency-domain and spatial information, improving its ability to capture diverse crowd patterns. We apply FGA to the last layers of two popular crowd-counting models, CSRNet and CANNet, and evaluate it on the benchmark datasets ShanghaiTech-A, ShanghaiTech-B, UCF-CC-50, and JHU-Crowd++. The experiments show a notable improvement on all datasets in terms of Mean Squared Error (MSE) and Mean Absolute Error (MAE), with performance comparable to recent state-of-the-art methods. We also demonstrate FGA's interpretability through qualitative analysis with Grad-CAM heatmaps, which show its effectiveness in capturing crowd patterns.
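The dual-path idea can be illustrated with a forward pass written in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. The following are all assumptions: the channel split ratio (`global_ratio`), the frequency-domain 1x1 channel mixing (in the style of Fast Fourier Convolution spectral blocks), the simplified CBAM-style spatial gate, and the SE-style channel attention. Every function and parameter name is hypothetical. The weights are random, so the sketch shows data flow and tensor shapes, not learned behavior.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv1x1(x, w):
    """Pointwise channel mixing: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def conv3x3(x, w):
    """Zero-padded 3x3 convolution (cross-correlation); w is (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            out += conv1x1(xp[:, i:i + h, j:j + wd], w[:, :, i, j])
    return out


def spectral_global(x, w_freq):
    """Global path (assumed design): mix channels in the frequency domain, invert the FFT.

    Each 2-D FFT coefficient depends on every pixel, so this single pointwise
    step already has a full-image (circular) receptive field.
    """
    h, wd = x.shape[-2:]
    f = np.fft.rfft2(x, axes=(-2, -1))            # (C, H, W//2 + 1), complex
    z = np.concatenate([f.real, f.imag], axis=0)  # (2C, H, W//2 + 1), real
    z = np.maximum(conv1x1(z, w_freq), 0.0)       # channel mixing + ReLU
    c = z.shape[0] // 2
    return np.fft.irfft2(z[:c] + 1j * z[c:], s=(h, wd), axes=(-2, -1))


def spatial_attention(x, a=1.0, b=1.0):
    """Simplified CBAM-style gate: channel-pooled avg/max -> per-pixel weight in (0, 1).

    a and b are hypothetical scalars standing in for CBAM's 7x7 convolution.
    """
    gate = sigmoid(a * x.mean(axis=0) + b * x.max(axis=0))  # (H, W)
    return x * gate[None]


def channel_attention(x, w1, w2):
    """SE-style gate: global average pool -> bottleneck MLP -> per-channel weight."""
    s = x.mean(axis=(1, 2))                    # squeeze: (C,)
    e = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))  # excitation: (C,)
    return x * e[:, None, None]


def init_params(c, global_ratio=0.5, reduction=2):
    """Random weights for a hypothetical FGA-like block with c channels."""
    cg = int(round(c * global_ratio))
    cl = c - cg
    hidden = max(cl // reduction, 1)
    return {
        "w_freq": rng.normal(0.0, 0.1, (2 * cg, 2 * cg)),
        "w_conv": rng.normal(0.0, 0.1, (cl, cl, 3, 3)),
        "w_se1": rng.normal(0.0, 0.1, (hidden, cl)),
        "w_se2": rng.normal(0.0, 0.1, (cl, hidden)),
    }


def fga_block(x, params, global_ratio=0.5):
    """Split channels, run the global (FFT) and local (conv) paths, concatenate."""
    cg = int(round(x.shape[0] * global_ratio))
    xg, xl = x[:cg], x[cg:]
    yg = spatial_attention(spectral_global(xg, params["w_freq"]))
    yl = np.maximum(conv3x3(xl, params["w_conv"]), 0.0)
    yl = channel_attention(yl, params["w_se1"], params["w_se2"])
    return np.concatenate([yg, yl], axis=0)


x = rng.normal(size=(8, 16, 12))  # (channels, height, width) feature map
params = init_params(8)
y = fga_block(x, params)
print(y.shape)  # (8, 16, 12)
```

The design choice the sketch highlights is the receptive field. Perturbing one input pixel changes the 3x3 path's output only in that pixel's immediate neighborhood, whereas the FFT path's output changes across the whole map after a single layer. Because the FFT treats the feature map as periodic, this global mixing wraps around the image borders.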