Traditional crowd counting networks lose information when pooling layers downsample feature maps, which leads to inaccurate counts for distant crowds. Existing methods typically assume that training annotations are correct and disregard the impact of noisy annotations, which is especially pronounced in crowded scenes. Furthermore, a fixed Gaussian kernel cannot account for how the pixel distribution varies with camera distance. To overcome these challenges, we propose a Scale-Aware Crowd Counting Network (SACC-Net), which introduces a ``scale-aware'' architecture capable of correcting errors in noisy annotations. For the first time, we {\bf simultaneously} model labeling errors (mean) and scale variations (variance) with spatially-varying Gaussian distributions to produce fine-grained heat maps for crowd counting. In addition, the proposed adaptive Gaussian kernel variance lets the model learn dynamically with a low-rank approximation, improving convergence efficiency while keeping accuracy comparable. We extensively evaluate SACC-Net on four public datasets: UCF-QNRF, UCF\_CC\_50, NWPU, and ShanghaiTech Part A and Part B. Experimental results show that SACC-Net outperforms all state-of-the-art methods, validating its effectiveness for crowd counting.
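To make the idea of spatially-varying Gaussians concrete, the following is a minimal NumPy sketch of how a density map could be assembled when every annotated head has its own mean shift (standing in for a labeling-error correction) and its own standard deviation (standing in for scale). It is an illustration under stated assumptions, not the SACC-Net implementation: the function `density_map` and the inputs `offsets` and `sigmas` are hypothetical, and here they are supplied by hand, whereas the abstract describes them as learned. The kernels are isotropic and built as an outer product of two 1-D Gaussians purely for convenience; this is not meant to represent the low-rank approximation mentioned above.

```python
import numpy as np


def density_map(shape, points, offsets, sigmas):
    """Sum one normalized isotropic Gaussian per annotated head.

    shape   : (H, W) of the output map
    points  : N annotated (row, col) positions
    offsets : N hypothetical (d_row, d_col) corrections to each annotation (mean shift)
    sigmas  : N hypothetical per-point standard deviations (scale)
    """
    h, w = shape
    rows = np.arange(h, dtype=np.float64)
    cols = np.arange(w, dtype=np.float64)
    out = np.zeros((h, w), dtype=np.float64)

    for (r, c), (dr, dc), s in zip(points, offsets, sigmas):
        # 1-D Gaussians centered on the corrected annotation position.
        g_row = np.exp(-0.5 * ((rows - (r + dr)) / s) ** 2)
        g_col = np.exp(-0.5 * ((cols - (c + dc)) / s) ** 2)
        # An isotropic 2-D Gaussian is the outer product of the two.
        kernel = np.outer(g_row, g_col)
        total = kernel.sum()
        if total > 0:
            # Normalize so each head contributes exactly one unit of count.
            out += kernel / total
    return out


# Illustrative values only: three heads with increasing kernel width.
points = np.array([[20.0, 30.0], [40.0, 10.0], [50.0, 50.0]])
offsets = np.array([[0.5, -0.5], [0.0, 0.0], [-1.0, 1.0]])
sigmas = np.array([1.5, 3.0, 6.0])
dmap = density_map((64, 64), points, offsets, sigmas)
```

Because each kernel is normalized over the image before it is added, `dmap.sum()` equals the number of annotated points up to floating-point error, which is the property that lets a density map be read as a count.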