This work considers supervised learning to count from images and their corresponding point annotations. Whereas density-based counting methods typically use the point annotations only to create Gaussian density maps, which act as the supervision signal, the starting point of this work is that point annotations have counting potential beyond density map generation. We introduce two methods that repurpose the available point annotations to enhance counting performance. The first is a counting-specific augmentation that leverages point annotations to simulate occluded objects in both the input images and the density maps, which improves the network's robustness to occlusions. The second method, foreground distillation, generates foreground masks from the point annotations and uses them to train an auxiliary network on images with blacked-out backgrounds. The auxiliary network thereby learns to extract foreground counting knowledge without interference from the background. Both methods can be integrated with existing counting advances and are adaptable to different loss functions. We demonstrate that the two approaches are complementary, allowing us to achieve robust counting results even in challenging scenarios such as background clutter, occlusion, and varying crowd densities. Our proposed approach achieves strong counting results on multiple datasets, including ShanghaiTech Part_A and Part_B, UCF_QNRF, JHU-Crowd++, and NWPU-Crowd.
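To make the three uses of point annotations concrete, the following is a minimal NumPy sketch of one possible reading of the abstract, not the authors' implementation. The function names, the fixed Gaussian `sigma`, the disc-shaped masks with a fixed `radius`, and the square occlusion patch of half-width `half` are all assumptions introduced for illustration; the abstract does not specify how occluders or foreground masks are actually constructed.

```python
import numpy as np


def gaussian_density_map(points, shape, sigma=4.0):
    """Build a density map with one normalized 2D Gaussian per (row, col) point.

    Assumes every point lies inside the image, so each Gaussian has non-zero mass.
    """
    h, w = shape
    rows = np.arange(h, dtype=np.float64)[:, None]
    cols = np.arange(w, dtype=np.float64)[None, :]
    density = np.zeros((h, w), dtype=np.float64)
    for r, c in points:
        g = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        density += g / g.sum()  # each object contributes 1 to the total count
    return density


def foreground_mask(points, shape, radius=8):
    """Hypothetical foreground mask: the union of discs centred on the points."""
    h, w = shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    mask = np.zeros((h, w), dtype=bool)
    for r, c in points:
        mask |= (rows - r) ** 2 + (cols - c) ** 2 <= radius ** 2
    return mask


def black_out_background(image, mask):
    """Zero every pixel of an H x W x C image that lies outside the mask."""
    return image * mask[..., None]


def simulate_occlusion(image, density, point, half=6):
    """Hypothetical occlusion: zero the same square patch in image and density.

    Returns modified copies; the inputs are left unchanged.
    """
    h, w = density.shape
    r, c = int(round(point[0])), int(round(point[1]))
    r0, r1 = max(r - half, 0), min(r + half + 1, h)
    c0, c1 = max(c - half, 0), min(c + half + 1, w)
    occluded_image = image.copy()
    occluded_density = density.copy()
    occluded_image[r0:r1, c0:c1] = 0
    occluded_density[r0:r1, c0:c1] = 0
    return occluded_image, occluded_density


# Usage on a toy 64 x 64 image with two annotated objects.
points = [(10, 10), (30, 40)]
image = np.ones((64, 64, 3))
density = gaussian_density_map(points, image.shape[:2])
foreground_image = black_out_background(image, foreground_mask(points, image.shape[:2]))
occluded_image, occluded_density = simulate_occlusion(image, density, points[0])
```

In this sketch the density map integrates to the number of annotated points, the foreground image is what an auxiliary network would be trained on, and the occluded pair is an augmented training sample whose target count drops by the density mass that falls under the patch.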