Labeled data are critical to modern machine learning applications, but obtaining labels can be expensive. To mitigate this cost, machine learning methods such as transfer learning, semi-supervised learning, and active learning aim to be label-efficient: they achieve high predictive performance from relatively few labeled examples. Although the best label efficiency in practice often requires combining these techniques, existing benchmarks and evaluation frameworks do not evaluate all of them in combination. This paper addresses this gap by introducing LabelBench, a new computationally efficient framework for the joint evaluation of multiple label-efficient learning techniques. As an application of LabelBench, we introduce a novel benchmark of state-of-the-art active learning methods combined with semi-supervised learning for fine-tuning pretrained vision transformers. Our benchmark demonstrates better label efficiency than previously reported in active learning. LabelBench's modular codebase is open-sourced so that the broader community can contribute label-efficient learning methods and benchmarks. The repository is available at https://github.com/EfficientTraining/LabelBench.
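To make the combination of active learning and semi-supervised learning concrete, here is a minimal sketch of one selection round under stated assumptions. It is not LabelBench's API or its specific strategies. The function names `margin_select` and `confident_pseudo_labels` are hypothetical, and the sketch assumes a fine-tuned model has already produced class probabilities for the unlabeled pool. Margin-based uncertainty sampling stands in for the active learning step: it sends the least confidently separated examples to annotators. Confidence-thresholded pseudo-labeling stands in for the semi-supervised step: it assigns model labels to highly confident examples.

```python
import numpy as np


def margin_select(probs: np.ndarray, budget: int) -> np.ndarray:
    """Hypothetical helper: margin-based uncertainty sampling.

    probs: (n_pool, n_classes) predicted class probabilities on the unlabeled pool.
    Returns indices of the `budget` examples whose top-2 probabilities are closest,
    i.e. the ones the model is least sure about, to be sent for human labeling.
    """
    # Sort each row ascending; the last two columns hold the top-2 probabilities.
    top2 = np.sort(probs, axis=1)[:, -2:]
    margins = top2[:, 1] - top2[:, 0]
    # Smallest margin first; a stable sort keeps ties in pool-index order.
    return np.argsort(margins, kind="stable")[:budget]


def confident_pseudo_labels(probs: np.ndarray, threshold: float = 0.95, exclude=()):
    """Hypothetical helper: confidence-thresholded pseudo-labeling.

    Returns (indices, labels) for pool examples whose top predicted probability
    is at least `threshold`, skipping indices already queued for annotation.
    """
    confidence = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    mask = confidence >= threshold
    # Do not pseudo-label examples that a human is about to label.
    mask[np.asarray(exclude, dtype=np.intp)] = False
    idx = np.nonzero(mask)[0]
    return idx, labels[idx]


# Usage on a toy pool of five examples and three classes.
probs = np.array([
    [0.50, 0.45, 0.05],  # ambiguous: small top-2 margin
    [0.97, 0.02, 0.01],  # confident, class 0
    [0.40, 0.30, 0.30],  # ambiguous
    [0.02, 0.96, 0.02],  # confident, class 1
    [0.70, 0.20, 0.10],  # moderately confident
])
to_annotate = margin_select(probs, budget=2)
pseudo_idx, pseudo_labels = confident_pseudo_labels(probs, 0.95, exclude=to_annotate)
```

In this toy pool, examples 0 and 2 are queried for labels, while examples 1 and 3 receive pseudo-labels 0 and 1. In a full loop, both sets would feed the next fine-tuning round. The margin criterion and the 0.95 threshold are illustrative choices only.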