Object detection models are commonly used for people counting (and localization) in many applications, but training them requires a dataset with costly bounding-box annotations. Given the importance of privacy in people counting, these models increasingly rely on infrared images, which makes the task even harder. In this paper, we explore how weaker levels of supervision affect the performance of deep person-counting architectures for image classification and point-level localization. Our experiments indicate that counting people with an image-level CNN model achieves results competitive with YOLO detectors and point-level models, while providing a higher frame rate and a similar number of model parameters.