In constrained real-world scenarios, where generating data may be challenging or costly, disciplined methods for acquiring informative new data points are of fundamental importance for the efficient training of machine learning (ML) models. Active learning (AL) is a sub-field of ML focused on developing methods that acquire data iteratively and economically by strategically querying the new data points that are most useful for a particular task. Here, we introduce PyRelationAL, an open-source library for AL research. We describe a modular toolkit that is compatible with diverse ML frameworks (e.g. PyTorch, scikit-learn, TensorFlow, JAX). Furthermore, the library implements a wide range of published methods and provides API access to a broad set of benchmark datasets and AL task configurations based on existing literature. The library is supplemented by an extensive set of tutorials, demos, and documentation to help users get started. PyRelationAL is maintained using modern software engineering practices, with an inclusive contributor code of conduct, to promote long-term library quality and utilisation. PyRelationAL is available under a permissive Apache licence on PyPI and at https://github.com/RelationRx/pyrelational.
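To make the iterative querying idea concrete, the following is a minimal sketch of a pool-based AL loop with least-confidence sampling, written with NumPy only. It does not use or reproduce the PyRelationAL API: the function names (`fit_centroids`, `predict_proba`, `least_confidence_query`, `active_learning_loop`), the nearest-centroid model, and the simulated oracle (labels read from an array `y`) are all assumptions made for illustration.

```python
import numpy as np


def fit_centroids(X, y, n_classes):
    """Toy model: one mean vector per class.

    Assumes every class appears at least once in the labelled data.
    """
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])


def predict_proba(centroids, X):
    """Softmax over negative squared distances to each class centroid."""
    sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    logits = -sq_dist
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def least_confidence_query(proba, k):
    """Return positions of the k points whose top-class probability is lowest."""
    scores = 1.0 - proba.max(axis=1)
    return np.argsort(-scores)[:k]


def active_learning_loop(X, y, labelled_idx, n_rounds, k, n_classes):
    """Hypothetical pool-based loop: train, score the pool, query, repeat.

    The oracle is simulated by looking up y for the queried indices.
    """
    labelled = list(labelled_idx)
    labelled_set = set(labelled)
    pool = [i for i in range(len(X)) if i not in labelled_set]
    for _ in range(n_rounds):
        if not pool:
            break  # nothing left to query
        centroids = fit_centroids(X[labelled], y[labelled], n_classes)
        proba = predict_proba(centroids, X[pool])
        picked = least_confidence_query(proba, k)
        chosen = [pool[i] for i in picked]
        labelled.extend(chosen)  # labels for these indices are now "acquired"
        chosen_set = set(chosen)
        pool = [i for i in pool if i not in chosen_set]
    return labelled, pool
```

Calling `active_learning_loop(X, y, labelled_idx=[0, 50], n_rounds=3, k=5, n_classes=2)` on a two-class dataset would grow the labelled set by up to five points per round, provided the initial indices cover both classes. Any other model or informativeness score could be substituted for the toy components used here.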