When only a limited budget of labeled data is affordable, active learning devises query strategies that select the most informative data points for labeling, with the aim of improving the efficiency and performance of learning algorithms. Numerous such query strategies have been proposed and compared in the active learning literature. However, the community still lacks standardized benchmarks for comparing them. This holds in particular for combining query strategies with different learning algorithms into active learning pipelines and for examining the impact of the choice of learning algorithm. To close this gap, we propose ALPBench, which facilitates the specification, execution, and performance monitoring of active learning pipelines. It has built-in measures to ensure that evaluations are reproducible, saving the exact dataset splits and the hyperparameter settings of the algorithms used. In total, ALPBench comprises 86 real-world tabular classification datasets and 5 active learning settings, yielding 430 active learning problems. To demonstrate its usefulness and broad compatibility with various learning algorithms and query strategies, we conduct an exemplary study evaluating 9 query strategies paired with 8 learning algorithms in 2 different settings. ALPBench is available at https://github.com/ValentinMargraf/ActiveLearningPipelines.
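To make the notion of an active learning pipeline concrete, the following minimal sketch pairs a learning algorithm with an interchangeable query strategy under a fixed labeling budget and fixes the random seed so that the data split is reproducible. It does not use the ALPBench API: every name in it (`NearestCentroid`, `margin_query`, `random_query`, `run_pipeline`) is hypothetical, and the nearest-centroid learner and synthetic two-class data are stand-ins chosen only to keep the example dependency-free.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Example = Tuple[Point, int]


class NearestCentroid:
    """Stand-in learner (an assumption): predicts the class with the closest centroid."""

    def fit(self, examples: Sequence[Example]) -> "NearestCentroid":
        sums: Dict[int, Tuple[float, float, int]] = {}
        for (a, b), label in examples:
            sa, sb, n = sums.get(label, (0.0, 0.0, 0))
            sums[label] = (sa + a, sb + b, n + 1)
        self.centroids = {c: (sa / n, sb / n) for c, (sa, sb, n) in sums.items()}
        return self

    def distances(self, x: Point) -> Dict[int, float]:
        return {c: ((x[0] - cx) ** 2 + (x[1] - cy) ** 2) ** 0.5
                for c, (cx, cy) in self.centroids.items()}

    def predict(self, x: Point) -> int:
        d = self.distances(x)
        return min(d, key=d.get)


QueryStrategy = Callable[[NearestCentroid, List[Point], int, random.Random], List[int]]


def margin_query(model: NearestCentroid, pool: List[Point], k: int,
                 rng: random.Random) -> List[int]:
    """Pick the k pool points with the smallest gap between the two nearest centroids."""
    def margin(x: Point) -> float:
        d = sorted(model.distances(x).values())
        return d[1] - d[0]
    return sorted(range(len(pool)), key=lambda i: margin(pool[i]))[:k]


def random_query(model: NearestCentroid, pool: List[Point], k: int,
                 rng: random.Random) -> List[int]:
    """Baseline: pick k pool points uniformly at random."""
    return rng.sample(range(len(pool)), k)


def make_blobs(rng: random.Random, n_per_class: int = 150) -> List[Example]:
    """Synthetic two-class data (an assumption, not one of the benchmark datasets)."""
    data: List[Example] = []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (2.0, 2.0)]):
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    return data


def accuracy(model: NearestCentroid, test: Sequence[Example]) -> float:
    return sum(model.predict(x) == y for x, y in test) / len(test)


def run_pipeline(query: QueryStrategy, seed: int = 0, budget: int = 20,
                 batch_size: int = 5) -> List[float]:
    """Run one pipeline and return test accuracy after each labeling round."""
    rng = random.Random(seed)  # a fixed seed makes the split and the queries reproducible
    data = make_blobs(rng)
    rng.shuffle(data)
    test, pool = data[:100], data[100:]

    # Initial labeled set: two examples per class, so both centroids exist.
    labeled: List[Example] = []
    for label in (0, 1):
        idx = [i for i, (_, y) in enumerate(pool) if y == label][:2]
        for i in sorted(idx, reverse=True):
            labeled.append(pool.pop(i))

    curve: List[float] = []
    for _ in range(budget // batch_size):
        model = NearestCentroid().fit(labeled)
        curve.append(accuracy(model, test))
        # The strategy sees only features; labels are revealed after selection.
        chosen = query(model, [x for x, _ in pool], batch_size, rng)
        for i in sorted(chosen, reverse=True):
            labeled.append(pool.pop(i))
    curve.append(accuracy(NearestCentroid().fit(labeled), test))
    return curve


if __name__ == "__main__":
    print("margin:", run_pipeline(margin_query, seed=0))
    print("random:", run_pipeline(random_query, seed=0))
```

Because the query strategy is passed in as a plain function, swapping strategies or learners changes one argument rather than the loop itself, which is the kind of pairing the benchmark is meant to compare systematically. The sketch makes no claim about which strategy performs better.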