Crowd counting has attracted considerable attention because of its practical applications. However, mainstream counting methods estimate density maps, so they ignore precise individual localization and suffer from annotation noise. They also struggle with high-density images. To address these issues, we propose an end-to-end model called Fine-Grained Extraction Network (FGENet). Unlike methods that estimate density maps, FGENet directly learns the original coordinate points that represent the precise locations of individuals. We design a fusion module, named Fine-Grained Feature Pyramid (FGFP), that fuses the feature maps extracted by the FGENet backbone. The fused features are then passed to a regression head and a classification head: the former predicts point coordinates for a given image, and the latter estimates the confidence that each predicted point is an individual. Finally, FGENet uses the Hungarian algorithm to establish correspondences between predicted points and ground-truth points. To train FGENet, we design a robust loss function, named Three-Task Combination (TTC), that mitigates the impact of annotation noise. Extensive experiments on four widely used crowd counting datasets demonstrate the effectiveness of FGENet. Notably, our method improves Mean Absolute Error (MAE) by 3.14 points over existing state-of-the-art methods on the ShanghaiTech Part A dataset, and by 30.16 points over previous results on the UCF_CC_50 dataset.
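The matching step described in the abstract can be illustrated with a minimal sketch. It assumes the matching cost is the plain Euclidean distance between a ground-truth point and a predicted point, and that there are at least as many predictions as ground-truth points; the cost actually used by FGENet is not stated here and may also involve the classification confidence. The function name `match_points` and the sample coordinates are hypothetical.

```python
import math


def match_points(gt_points, pred_points):
    """One-to-one matching of ground-truth points to predicted points.

    Pure-Python Hungarian algorithm (O(n^2 * m)) on a Euclidean cost matrix.
    Assumption: len(gt_points) <= len(pred_points).
    Returns a sorted list of (gt_index, pred_index) pairs.
    """
    n, m = len(gt_points), len(pred_points)
    if n > m:
        raise ValueError("need at least as many predictions as ground-truth points")
    # cost[i][j]: distance from ground-truth point i to predicted point j
    cost = [[math.dist(g, p) for p in pred_points] for g in gt_points]

    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (m + 1)   # column potentials (1-indexed)
    p = [0] * (m + 1)     # p[j]: row currently assigned to column j (0 = none)
    way = [0] * (m + 1)   # back-pointers along the augmenting path

    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so one more reduced cost becomes zero
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        # Flip assignments along the augmenting path
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1

    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j] != 0)


# Hypothetical example: two annotated heads, three predicted points
gt = [(10.0, 10.0), (50.0, 50.0)]
preds = [(52.0, 49.0), (11.0, 9.0), (200.0, 200.0)]
matches = match_points(gt, preds)
print(matches)
```

In this example each ground-truth point is paired with its nearby prediction, and the third prediction is left unmatched. In a point-based counter, such unmatched predictions would typically serve as negatives for the classification head.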