Curating an informative and representative dataset is essential for enhancing the performance of 2D object detectors. We present a novel active learning sampling strategy that accounts for both the informativeness and the diversity of the selected samples. Our strategy integrates uncertainty-based and diversity-based selection principles into a joint selection objective by measuring the collective information score of the selected samples. Specifically, our proposed NORIS algorithm quantifies the impact of training with a sample on the informativeness of other similar samples. By selecting only samples that are simultaneously informative and distant from other highly informative samples, we effectively avoid redundancy while maintaining a high level of informativeness. Moreover, instead of using whole-image features to calculate distances between samples, we leverage features extracted from detected object regions within images to define object features. This allows us to construct a dataset encompassing diverse object types, shapes, and angles. Extensive experiments on object detection and image classification tasks demonstrate the effectiveness of our strategy over state-of-the-art baselines. In particular, our selection strategy reduces labeling costs by 20% and 30% relative to random selection on PASCAL-VOC and KITTI, respectively.
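The selection principle described above (prefer samples that are informative, and discount samples that lie close to ones already chosen) can be illustrated with a small greedy sketch. This is not the NORIS algorithm itself, whose scoring rule is not given in the abstract. The function name `select_informative_diverse`, the Gaussian similarity, the multiplicative discount, and the `bandwidth` parameter are all assumptions made for illustration. The feature vectors stand in for the object-level features mentioned above.

```python
import numpy as np


def select_informative_diverse(scores, features, budget, bandwidth=1.0):
    """Greedily pick `budget` samples that are informative and mutually distant.

    scores:   (n,) non-negative informativeness (e.g. uncertainty) per sample.
    features: (n, d) one feature vector per sample (hypothetically pooled
              from detected object regions rather than from the whole image).
    """
    scores = np.asarray(scores, dtype=float).copy()
    features = np.asarray(features, dtype=float)
    budget = min(budget, len(scores))
    available = np.ones(len(scores), dtype=bool)
    selected = []
    for _ in range(budget):
        # Pick the unselected sample with the highest remaining score.
        best = int(np.argmax(np.where(available, scores, -np.inf)))
        selected.append(best)
        available[best] = False
        # Assumed discount rule: labeling `best` lowers the remaining
        # informativeness of samples that are close to it in feature space.
        sq_dist = np.sum((features - features[best]) ** 2, axis=1)
        similarity = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
        scores *= 1.0 - similarity
    return selected


# Two near-duplicate, highly uncertain samples and one distant, moderately
# uncertain sample: the second pick skips the near-duplicate.
example_scores = [0.9, 0.85, 0.5]
example_features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
print(select_informative_diverse(example_scores, example_features, budget=2))
# prints [0, 2]
```

A pure top-k uncertainty ranking would return samples 0 and 1 here, which are nearly identical. The discount step is what steers the second pick toward the distant sample, which is the redundancy-avoidance behavior the abstract describes.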