Active learning (AL) strategies aim to train high-performance models with minimal labeling effort by selecting only the most informative instances for annotation. Current approaches to evaluating data informativeness predominantly focus on the data's distribution or intrinsic information content and do not directly correlate with downstream task performance, such as mean average precision (mAP) in object detection. We therefore propose Performance-guided (i.e., mAP-guided) Reinforced Active Learning for Object Detection (MGRAL), a novel approach that uses the expected change in model output as its measure of informativeness. To address the combinatorial explosion of batch sample selection and the non-differentiable relationship between model performance and the selected batch, MGRAL employs a reinforcement learning-based sampling agent that optimizes selection via policy gradient, with mAP improvement as the reward. Moreover, to reduce the computational overhead of estimating mAP with unlabeled samples, MGRAL uses an unsupervised estimation scheme with fast look-up tables, making deployment feasible. We evaluate MGRAL's active learning performance on detection tasks on the PASCAL VOC and COCO benchmarks. Our approach achieves the highest AL curve among the compared methods, supported by visualizations, and establishes a new paradigm in reinforcement learning-driven active object detection.
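To make the idea of a policy-gradient sampling agent concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy setting in which a hypothetical per-sample table `gains` stands in for the look-up-based mAP estimate, and the reward of a batch is simply the sum of its table entries. The real reward in MGRAL depends on the whole batch and on retraining the detector, which the abstract does not specify in detail. The names `sample_batch`, `train_selector`, and `gains` are illustrative assumptions. The agent draws a batch of distinct samples one at a time (a Plackett-Luce scheme, chosen here for simplicity) and updates its scores with the REINFORCE rule, using a moving-average baseline.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax; entries equal to -inf get probability 0."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def sample_batch(logits, k, rng):
    """Draw k distinct indices sequentially and return the batch together
    with the gradient of its log-probability with respect to the logits."""
    available = np.ones(len(logits), dtype=bool)
    grad = np.zeros_like(logits)
    batch = []
    for _ in range(k):
        # Samples already chosen are masked out so the batch has no repeats.
        probs = softmax(np.where(available, logits, -np.inf))
        idx = int(rng.choice(len(logits), p=probs))
        # d/d(logits) of log probs[idx] is onehot(idx) - probs.
        grad -= probs
        grad[idx] += 1.0
        available[idx] = False
        batch.append(idx)
    return batch, grad


def train_selector(gains, batch_size=3, steps=3000, lr=0.2, seed=0):
    """REINFORCE on per-sample scores. `gains` is a hypothetical table of
    estimated performance gains, used here as a stand-in reward source."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(len(gains))
    baseline = 0.0
    for _ in range(steps):
        batch, grad = sample_batch(logits, batch_size, rng)
        reward = float(gains[batch].sum())  # toy stand-in for mAP improvement
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward  # variance-reducing baseline
        logits += lr * advantage * grad  # gradient ascent on expected reward
    return logits


if __name__ == "__main__":
    # Assumed toy table: samples 2, 4 and 5 are the useful ones.
    gains = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0])
    scores = train_selector(gains)
    print(np.round(scores, 2))
```

Because the reward enters only as a scalar multiplier on the log-probability gradient, it never needs to be differentiable, which is the property the abstract relies on when it uses mAP improvement as the reward.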