Active learning aims to label a strategically chosen subset of a dataset so as to maximize performance within a fixed labeling budget. In this study, we build on features learned through self-supervised learning. We introduce a simple yet effective metric, Cluster Distance Difference, to identify diverse data. We then propose a novel framework, Balancing Active Learning (BAL), which constructs adaptive sub-pools to balance diverse and uncertain data. Our approach outperforms all established active learning methods on widely recognized benchmarks by 1.20%. We also evaluate the framework under extended settings covering both larger and smaller labeling budgets. Experimental results show that, when 80% of the samples are labeled, the performance of the current SOTA method declines by 0.74%, whereas BAL achieves performance comparable to that obtained with the full dataset. Code is available at https://github.com/JulietLJY/BAL.
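The abstract names Cluster Distance Difference without defining it. As a rough illustration only, the sketch below assumes one plausible reading: for a sample's self-supervised feature vector, take the distance to the second-nearest cluster center minus the distance to the nearest one. The function name `cluster_distance_difference`, the Euclidean distance, and the toy centers are all assumptions made for this example and are not taken from the BAL code base.

```python
import math


def cluster_distance_difference(feature, centers):
    """Assumed definition: second-nearest minus nearest center distance.

    `feature` is one feature vector; `centers` is a list of at least two
    cluster centers of the same dimension (for example, from k-means).
    """
    if len(centers) < 2:
        raise ValueError("need at least two cluster centers")
    # Euclidean distance from the sample to every cluster center, ascending.
    dists = sorted(math.dist(feature, center) for center in centers)
    # A small gap means the sample lies near the boundary of two clusters.
    return dists[1] - dists[0]


# Toy example with two hypothetical cluster centers in 2-D feature space.
centers = [(0.0, 0.0), (4.0, 0.0)]
features = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.0)]
scores = [cluster_distance_difference(f, centers) for f in features]
```

Under this assumed definition, a sample sitting on a cluster center gets a large score, while a sample equidistant from two centers gets a score of zero. How BAL actually uses such scores to build its sub-pools is not specified in the abstract.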