A Unified Object Counting Network with Object Occupation Prior

From arXiv. Accepted by IEEE Transactions on Circuits and Systems for Video Technology. The dataset and code are available at: https://github.com/Tanyjiang/EOCO

The counting task, which plays a fundamental role in numerous applications (e.g., crowd counting, traffic statistics), aims to predict the number of objects at various densities. Existing object counting tasks are designed for a single object class. In the real world, however, newly arriving data with new classes is inevitable. We name this scenario "evolving object counting". In this paper, we build the first evolving object counting dataset and propose a unified object counting network as a first attempt to address this task. The proposed model consists of two key components: a class-agnostic mask module and a class-incremental module. The class-agnostic mask module learns a generic object occupation prior by predicting a class-agnostic binary mask (1 denotes that an object exists at the given position in the image, and 0 otherwise). The class-incremental module handles newly arriving classes and provides discriminative class guidance for density map prediction. The combined outputs of the class-agnostic mask module and the image feature extractor are used to predict the final density map. When new classes arrive, we first add new neural nodes to the last regression and classification layers of the class-incremental module. Then, instead of retraining the model from scratch, we use knowledge distillation to help the model remember what it has already learned about previous object classes. We also employ a support sample bank that stores a small number of typical training samples of each class, which are used to prevent the model from forgetting key information about old data. With this design, our model can efficiently and effectively adapt to newly arriving classes while maintaining good performance on already seen data without large-scale retraining. Extensive experiments on the collected dataset demonstrate the favorable performance of the proposed model.
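The class-incremental procedure described in the abstract (adding output nodes, distilling from the previous model, and replaying a small support bank) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `expand_layer`, `distillation_loss`, and `SupportSampleBank` are hypothetical, the temperature-softened cross-entropy is one common form of distillation loss rather than the one the paper necessarily uses, and keeping the first few candidates is only a placeholder for whatever rule selects "typical" samples.

```python
import math
import random


def expand_layer(weights, biases, num_new, in_dim, rng):
    """Append output nodes for `num_new` classes; rows for old classes are copied unchanged."""
    new_w = [row[:] for row in weights]
    new_b = biases[:]
    for _ in range(num_new):
        # Small random initialisation for the new class nodes (an assumption).
        new_w.append([rng.uniform(-0.01, 0.01) for _ in range(in_dim)])
        new_b.append(0.0)
    return new_w, new_b


def linear(weights, biases, x):
    """Plain fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(old_logits, new_logits, temperature=2.0):
    """Cross-entropy between softened old-model and new-model outputs,
    computed over the old classes only (assumed loss form)."""
    n_old = len(old_logits)
    p_old = softmax(old_logits, temperature)
    p_new = softmax(new_logits[:n_old], temperature)
    return -sum(p * math.log(q) for p, q in zip(p_old, p_new))


class SupportSampleBank:
    """Keeps at most `per_class` samples for every class seen so far."""

    def __init__(self, per_class):
        self.per_class = per_class
        self.samples = {}

    def add_class(self, class_id, candidates):
        # Placeholder selection rule: keep the first `per_class` candidates.
        self.samples[class_id] = list(candidates)[: self.per_class]

    def replay(self):
        """Return (class_id, sample) pairs to mix into training on new classes."""
        return [(cid, s) for cid, items in self.samples.items() for s in items]


# Toy usage: a 2-class layer grows to 3 classes when a new class arrives.
rng = random.Random(0)
old_w, old_b = [[0.5, -0.2], [0.1, 0.3]], [0.0, 0.1]
new_w, new_b = expand_layer(old_w, old_b, num_new=1, in_dim=2, rng=rng)
features = [1.0, 2.0]
old_logits = linear(old_w, old_b, features)
new_logits = linear(new_w, new_b, features)
loss = distillation_loss(old_logits, new_logits)
```

Because the rows for old classes are copied unchanged, the expanded layer initially reproduces the old outputs exactly, so the distillation term starts at its minimum and only grows if training on the new class pulls the old-class outputs away from the previous model.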
