The counting task, which plays a fundamental role in numerous applications (e.g., crowd counting, traffic statistics), aims to predict the number of objects in scenes of varying density. Existing object counting tasks are designed for a single object class. In the real world, however, newly arriving data inevitably contains new classes. We name this scenario \textit{evolving object counting}. In this paper, we build the first evolving object counting dataset and propose a unified object counting network as the first attempt to address this task. The proposed model consists of two key components: a class-agnostic mask module and a class-incremental module. The class-agnostic mask module learns a generic object occupation prior by predicting a class-agnostic binary mask (i.e., 1 denotes that an object is present at a given position in the image and 0 that none is). The class-incremental module handles newly arriving classes and provides discriminative class guidance for density map prediction. The combined outputs of the class-agnostic mask module and the image feature extractor are used to predict the final density map. When new classes arrive, we first add new neural nodes to the last regression and classification layers of the class-incremental module. Then, instead of retraining the model from scratch, we use knowledge distillation to help the model retain what it has already learned about previous object classes. We also employ a support sample bank that stores a small number of typical training samples for each class, which prevents the model from forgetting key information about old data. With this design, our model can adapt efficiently and effectively to newly arriving classes while maintaining good performance on previously seen data, without large-scale retraining. Extensive experiments on the collected dataset demonstrate the favorable performance of the proposed model.
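The class-incremental update described above (adding output nodes, distilling from the previous model, and replaying stored samples) can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration and not the authors' implementation: the names `expand_layer`, `distillation_loss`, and `SupportSampleBank` are hypothetical, the distillation term is taken to be a temperature-softened KL divergence over old-class classification logits, and the bank simply keeps the first few samples per class, since the abstract does not say how typical samples are chosen.

```python
import math
import random


def expand_layer(weights, biases, num_new, rng):
    """Append num_new output nodes to a linear layer (hypothetical helper).

    weights is a list of rows, one per output node. Existing rows and
    biases are kept unchanged; new rows get small random values.
    """
    in_dim = len(weights[0])
    new_rows = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)]
                for _ in range(num_new)]
    return weights + new_rows, biases + [0.0] * num_new


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - lse for s in scaled]


def distillation_loss(old_logits, new_logits, temperature=2.0):
    """KL(old model || new model), restricted to the old classes.

    Assumed form of the distillation term: the new model's outputs for
    previously seen classes are pulled toward the old model's outputs.
    """
    n_old = len(old_logits)
    log_p = log_softmax(old_logits, temperature)
    log_q = log_softmax(new_logits[:n_old], temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


class SupportSampleBank:
    """Keeps at most per_class samples for each class (assumed policy)."""

    def __init__(self, per_class):
        self.per_class = per_class
        self.samples = {}

    def add(self, class_name, sample):
        bucket = self.samples.setdefault(class_name, [])
        if len(bucket) < self.per_class:
            bucket.append(sample)

    def replay(self):
        """Return all stored samples, to be mixed into new-class training."""
        return [s for bucket in self.samples.values() for s in bucket]


if __name__ == "__main__":
    rng = random.Random(0)
    w, b = expand_layer([[1.0, 2.0]], [0.5], num_new=2, rng=rng)
    print(len(w), b)  # three output nodes; old bias kept, new biases zero
    print(distillation_loss([1.0, 2.0], [2.0, 1.0, 0.0]))
```

In a training step under these assumptions, the total loss would combine the counting loss on new-class data and replayed samples with a weighted `distillation_loss` term; the weighting is a free hyperparameter that the abstract does not specify.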