Positive-Unlabelled (PU) learning is a growing field of machine learning that aims to learn classifiers from data consisting of labelled positive instances and unlabelled instances, where each unlabelled instance may in reality be positive or negative. So many methods have been proposed for PU learning over the last two decades that selecting an optimal method for a given PU learning task is a challenge. Our previous work addressed this by proposing GA-Auto-PU, the first Automated Machine Learning (Auto-ML) system for PU learning. In this work, we propose two new Auto-ML systems for PU learning: BO-Auto-PU, based on a Bayesian optimisation approach, and EBO-Auto-PU, based on a novel evolutionary/Bayesian optimisation approach. We also present an extensive evaluation of the three Auto-ML systems, comparing them to each other and to well-established PU learning methods across 60 datasets (20 real-world datasets, each with 3 versions in terms of PU learning characteristics).