Active learning (AL) aims to reduce annotation costs while maximizing model performance by iteratively selecting valuable instances. While foundation models have made it easier to identify these instances, existing selection strategies still lack robustness across different models, annotation budgets, and datasets. To highlight the potential weaknesses of existing AL strategies and provide a reference point for research, we explore oracle strategies, i.e., strategies that approximate the optimal selection by accessing ground-truth information unavailable in practical AL scenarios. Current oracle strategies, however, fail to scale effectively to large datasets and complex deep neural networks. To tackle these limitations, we introduce the Best-of-Strategy Selector (BoSS), a scalable oracle strategy designed for large-scale AL scenarios. BoSS constructs a set of candidate batches through an ensemble of selection strategies and then selects the batch yielding the highest performance gain. Because it is an ensemble of selection strategies, BoSS can be easily extended with new state-of-the-art strategies as they emerge, which helps it remain a reliable oracle strategy in the future. Our evaluation demonstrates that i) BoSS outperforms existing oracle strategies, ii) state-of-the-art AL strategies still fall noticeably short of oracle performance, especially on large-scale datasets with many classes, and iii) an ensemble-based approach to selection might be one way to counteract the inconsistent performance of AL strategies.
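The selection step described in the abstract (build one candidate batch per strategy, then keep the batch with the highest performance gain) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the nearest-centroid classifier stands in for the actual model, `random_batch` and `farthest_first_batch` stand in for the ensemble of AL strategies, and the gain is measured as accuracy on a labeled evaluation set that the oracle is allowed to see.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Strategy = Callable[[Sequence[Point], List[int], List[int], int, random.Random], List[int]]


def sq_dist(p: Point, q: Point) -> float:
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def fit_centroids(X: Sequence[Point], y: Sequence[int]) -> Dict[int, Point]:
    """Toy stand-in for model training: one mean vector per class."""
    sums: Dict[int, List[float]] = {}
    for (a, b), label in zip(X, y):
        s = sums.setdefault(label, [0.0, 0.0, 0.0])
        s[0] += a
        s[1] += b
        s[2] += 1.0
    return {c: (s[0] / s[2], s[1] / s[2]) for c, s in sums.items()}


def accuracy(model: Dict[int, Point], X: Sequence[Point], y: Sequence[int]) -> float:
    """Fraction of points whose nearest centroid carries the true label."""
    hits = sum(min(model, key=lambda c: sq_dist(x, model[c])) == t for x, t in zip(X, y))
    return hits / len(y)


def random_batch(X, labeled, pool, k, rng):
    """Hypothetical strategy 1: uniform random sampling from the pool."""
    return rng.sample(pool, k)


def farthest_first_batch(X, labeled, pool, k, rng):
    """Hypothetical strategy 2: greedy diversity (needs a non-empty labeled set)."""
    chosen, batch = list(labeled), []
    for _ in range(k):
        nxt = max((i for i in pool if i not in batch),
                  key=lambda i: min(sq_dist(X[i], X[j]) for j in chosen))
        batch.append(nxt)
        chosen.append(nxt)
    return batch


def boss_select(strategies: Sequence[Strategy], X, y, labeled, pool, k, X_eval, y_eval, rng):
    """Return the candidate batch whose addition yields the best evaluation accuracy.

    Oracle step (assumption): the true labels of the candidates and of the
    evaluation set are used, which is not possible in practical AL.
    """
    best_batch, best_score = None, -1.0
    for strategy in strategies:
        batch = strategy(X, labeled, pool, k, rng)
        idx = labeled + batch
        model = fit_centroids([X[i] for i in idx], [y[i] for i in idx])
        score = accuracy(model, X_eval, y_eval)
        if score > best_score:  # ties keep the earlier strategy's batch
            best_batch, best_score = batch, score
    return best_batch, best_score


def make_toy_data(n_per_class: int, rng: random.Random):
    """Two Gaussian blobs; class 0 first, then class 1."""
    X, y = [], []
    for label, center in enumerate([(0.0, 0.0), (3.0, 3.0)]):
        for _ in range(n_per_class):
            X.append((rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)))
            y.append(label)
    return X, y


if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_toy_data(30, rng)
    X_eval, y_eval = make_toy_data(50, rng)
    labeled = [0, 30]  # one initially labeled point per class
    pool = [i for i in range(len(X)) if i not in labeled]
    batch, score = boss_select([random_batch, farthest_first_batch],
                               X, y, labeled, pool, 4, X_eval, y_eval, rng)
    print("selected batch:", batch, "evaluation accuracy:", score)
```

In this sketch, adding a new strategy only means appending another callable to the list passed to `boss_select`, which mirrors the extensibility the abstract attributes to an ensemble of strategies. Retraining once per candidate batch is the dominant cost, so the number of candidates bounds the overhead per AL cycle.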