Labeled data are critical to modern machine learning applications, but obtaining labels can be expensive. To mitigate this cost, machine learning methods such as transfer learning, semi-supervised learning, and active learning aim to be label-efficient: they seek high predictive performance from relatively few labeled examples. While obtaining the best label efficiency in practice often requires combining these techniques, existing benchmarks and evaluation frameworks do not capture a concerted combination of all of them. This paper addresses this deficiency by introducing LabelBench, a new computationally efficient framework for the joint evaluation of multiple label-efficient learning techniques. As an application of LabelBench, we introduce a novel benchmark of state-of-the-art active learning methods in combination with semi-supervised learning for fine-tuning pretrained vision transformers. Our benchmark demonstrates better label efficiency than previously reported in active learning. LabelBench's modular codebase is open-sourced so that the broader community can contribute label-efficient learning methods and benchmarks. The repository can be found at: https://github.com/EfficientTraining/LabelBench.