Crowd counting is a key aspect of crowd analysis and is typically accomplished by estimating a crowd-density map and summing over the density values. However, this approach suffers from background noise accumulation and loss of density because broad Gaussian kernels are used to create the ground-truth density maps. Narrowing the Gaussian kernel can overcome this issue, but existing approaches perform poorly when trained with such ground-truth density maps. To overcome this limitation, we propose using conditional diffusion models to predict density maps, as diffusion models are known to model complex distributions well and show high fidelity to training data during crowd-density map generation. Furthermore, because the intermediate time steps of the diffusion process are noisy, we incorporate a regression branch for direct crowd estimation, used only during training, to improve feature learning. In addition, owing to the stochastic nature of the diffusion model, we propose producing multiple density maps to improve counting performance, unlike existing crowd-counting pipelines. We also depart from density summation and introduce contour detection followed by summation as the counting operation, which is more robust to background noise. We conduct extensive experiments on public datasets to validate the effectiveness of our method. Specifically, our novel crowd-counting pipeline reduces the crowd-counting error by up to $6\%$ on JHU-CROWD++ and up to $7\%$ on UCF-QNRF.
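The difference between density summation and contour-based counting can be illustrated with a toy example. The following is a minimal sketch and not the authors' implementation: it renders a synthetic density map with narrow Gaussian kernels, adds faint background noise, and compares the two counting operations. The function names, kernel width, noise level, and threshold are all assumptions chosen for illustration, and a simple connected-region count over a thresholded map stands in for a real contour detector.

```python
import math
import random


def render_density(points, height, width, sigma=1.0, radius=3):
    """Place one Gaussian blob (normalized to sum to 1) at each (row, col) point."""
    density = [[0.0] * width for _ in range(height)]
    for r0, c0 in points:
        cells = []
        for r in range(max(0, r0 - radius), min(height, r0 + radius + 1)):
            for c in range(max(0, c0 - radius), min(width, c0 + radius + 1)):
                d2 = (r - r0) ** 2 + (c - c0) ** 2
                cells.append((r, c, math.exp(-d2 / (2 * sigma ** 2))))
        total = sum(w for _, _, w in cells)
        for r, c, w in cells:
            density[r][c] += w / total
    return density


def count_by_sum(density):
    """Classical counting: integrate the whole density map."""
    return sum(sum(row) for row in density)


def count_by_regions(density, threshold=0.05):
    """Stand-in for contour detection: count 8-connected regions above a threshold."""
    h, w = len(density), len(density[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if seen[r][c] or density[r][c] <= threshold:
                continue
            count += 1  # found a new region; flood-fill to mark all of it
            seen[r][c] = True
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and not seen[ny][nx]
                                and density[ny][nx] > threshold):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count


# Hypothetical head positions on a 32x32 image (illustrative values only).
heads = [(5, 5), (5, 20), (20, 10), (25, 25)]
clean = render_density(heads, 32, 32)

# Faint per-pixel background noise, as a density predictor might produce.
random.seed(0)
noisy = [[v + random.uniform(0.0, 0.002) for v in row] for row in clean]

print("sum over clean map:    ", round(count_by_sum(clean), 3))
print("sum over noisy map:    ", round(count_by_sum(noisy), 3))
print("regions in noisy map:  ", count_by_regions(noisy))
```

In this sketch the small per-pixel noise accumulates over the whole image and inflates the summed count, whereas the region count ignores everything below the threshold. Narrow kernels matter here because they keep neighbouring blobs separable; with broad kernels, nearby heads would merge into a single region.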