We present small-text, an easy-to-use active learning library that offers pool-based active learning for single- and multi-label text classification in Python. It features many pre-implemented state-of-the-art query strategies, including some that leverage the GPU. Standardized interfaces allow a variety of classifiers, query strategies, and stopping criteria to be combined freely, which facilitates quick mixing and matching and enables rapid development of both active learning experiments and applications. To make various classifiers and query strategies accessible for active learning, small-text integrates several well-known machine learning libraries, namely scikit-learn, PyTorch, and Hugging Face transformers. The latter two integrations are optionally installable extensions, so GPUs can be used but are not required. The library is publicly available under the MIT License at https://github.com/webis-de/small-text, in version 1.1.1 at the time of writing.
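To illustrate what interchangeable classifiers, query strategies, and stopping criteria mean in a pool-based setting, the following is a minimal standard-library sketch. It does not use small-text's actual API: every name in it (`KeywordClassifier`, `least_confidence`, `make_random_sampling`, `make_budget_criterion`, `run_active_learning`) is hypothetical, and the toy classifier and data are assumptions made purely for illustration.

```python
import math
import random

# NOTE: all names below are hypothetical and NOT part of small-text.


class KeywordClassifier:
    """Toy binary text classifier that sums per-word votes from labeled texts."""

    def __init__(self):
        self.weights = {}

    def fit(self, texts, labels):
        # Retrain from scratch: +1 per word in a positive text, -1 otherwise.
        self.weights = {}
        for text, label in zip(texts, labels):
            for word in text.lower().split():
                vote = 1 if label == 1 else -1
                self.weights[word] = self.weights.get(word, 0) + vote
        return self

    def predict_proba(self, text):
        # Logistic squashing of the summed word votes (probability of class 1).
        score = sum(self.weights.get(w, 0) for w in text.lower().split())
        return 1.0 / (1.0 + math.exp(-score))


def least_confidence(clf, texts, unlabeled, n):
    """Query strategy: pick the n texts whose probability is closest to 0.5."""
    key = lambda i: abs(clf.predict_proba(texts[i]) - 0.5)
    return sorted(unlabeled, key=key)[:n]


def make_random_sampling(seed=0):
    """Query strategy factory: pick n unlabeled texts uniformly at random."""
    rng = random.Random(seed)
    return lambda clf, texts, unlabeled, n: rng.sample(unlabeled, min(n, len(unlabeled)))


def make_budget_criterion(budget):
    """Stopping criterion factory: stop once `budget` labels were collected."""
    return lambda labeled: len(labeled) >= budget


def run_active_learning(texts, oracle, clf, query_strategy, should_stop,
                        initial_indices, batch_size=2):
    """Pool-based loop: train, check the stopping criterion, query, label."""
    labeled = {i: oracle(i) for i in initial_indices}
    while True:
        indices = sorted(labeled)
        clf.fit([texts[i] for i in indices], [labeled[i] for i in indices])
        unlabeled = [i for i in range(len(texts)) if i not in labeled]
        if not unlabeled or should_stop(labeled):
            return clf, labeled
        for i in query_strategy(clf, texts, unlabeled, batch_size):
            labeled[i] = oracle(i)  # the oracle stands in for a human annotator


# Toy pool with ground-truth labels that only the oracle may see.
POOL = ["great movie", "terrible plot", "great acting", "boring and terrible",
        "loved the great cast", "boring plot", "loved it", "awful and boring"]
LABELS = [1, 0, 1, 0, 1, 0, 1, 0]
oracle = lambda i: LABELS[i]

# The same loop runs with either query strategy; only one argument changes.
clf, labeled = run_active_learning(
    POOL, oracle, KeywordClassifier(), least_confidence,
    make_budget_criterion(6), initial_indices=[0, 1])
_, labeled_random = run_active_learning(
    POOL, oracle, KeywordClassifier(), make_random_sampling(seed=0),
    make_budget_criterion(6), initial_indices=[0, 1])
```

Because the loop depends only on the call signatures of the strategy and the criterion, swapping `least_confidence` for random sampling requires no change to the loop itself. This is the kind of mix and match that standardized interfaces are meant to enable. The library's real classes and signatures should be taken from its own documentation.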