We propose a meta-learning method for positive and unlabeled (PU) classification that improves the performance of binary classifiers obtained from only PU data in unseen target tasks. PU learning is an important problem because PU data naturally arise in real-world applications such as outlier detection and information retrieval. Existing PU learning methods require large amounts of PU data, but sufficient data are often unavailable in practice. Using related tasks that consist of positive, negative, and unlabeled data, the proposed method minimizes the test classification risk after the model is adapted to PU data. We formulate the adaptation as the problem of estimating the Bayes optimal classifier, that is, the classifier that minimizes the classification risk. The proposed method embeds each instance into a task-specific space using neural networks. With the embedded PU data, the Bayes optimal classifier is estimated through density-ratio estimation of the PU densities, which admits a closed-form solution. The closed-form solution enables us to efficiently and effectively minimize the test classification risk. We empirically show that the proposed method outperforms existing methods on one synthetic and three real-world datasets.
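To make the density-ratio step concrete, the following is a minimal sketch of how a Bayes classifier might be estimated in closed form from PU data. It is not the authors' implementation: it uses a standard regularized least-squares density-ratio fit, replaces the learned neural-network embedding with fixed Gaussian kernel features, omits the meta-learning loop entirely, and assumes the class prior is known. All function names, the kernel width, the regularization strength, and the toy 1-D data are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_features(x, centers, width):
    """Hypothetical fixed embedding (a stand-in for the neural network)."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * width ** 2))


def fit_density_ratio(phi_pos, phi_unl, reg=0.1):
    """Closed-form least-squares fit of r(x) = p_pos(x) / p_unl(x).

    Models r(x) as alpha^T phi(x) and solves (H + reg * I) alpha = h, where
    H is the second moment of the unlabeled features and h is the mean of
    the positive features.
    """
    H = phi_unl.T @ phi_unl / len(phi_unl)
    h = phi_pos.mean(axis=0)
    return np.linalg.solve(H + reg * np.eye(H.shape[0]), h)


def predict_positive(phi, alpha, prior):
    """Bayes rule: p(y=+1 | x) = prior * r(x); predict positive above 0.5."""
    ratio = np.clip(phi @ alpha, 0.0, None)  # density ratios are non-negative
    return prior * ratio > 0.5


# Toy 1-D task (assumed data): positives ~ N(+2, 1), negatives ~ N(-2, 1).
rng = np.random.default_rng(0)
prior = 0.5  # assumed known class prior p(y=+1)

pos_train = rng.normal(2.0, 1.0, size=(50, 1))
is_pos = rng.random(200) < prior
unl_train = np.where(
    is_pos[:, None],
    rng.normal(2.0, 1.0, size=(200, 1)),
    rng.normal(-2.0, 1.0, size=(200, 1)),
)

# Use the positive training points as kernel centers.
centers, width = pos_train, 1.0
alpha = fit_density_ratio(
    gaussian_features(pos_train, centers, width),
    gaussian_features(unl_train, centers, width),
)

# Evaluate on labeled test data, which is never used for fitting.
test_x = np.concatenate(
    [rng.normal(2.0, 1.0, size=(1000, 1)), rng.normal(-2.0, 1.0, size=(1000, 1))]
)
test_y = np.concatenate([np.ones(1000, dtype=bool), np.zeros(1000, dtype=bool)])
pred = predict_positive(gaussian_features(test_x, centers, width), alpha, prior)
accuracy = float((pred == test_y).mean())
print(f"test accuracy: {accuracy:.3f}")
```

Because the fit reduces to a single linear solve, the resulting classifier is a differentiable function of the features. That property is presumably what makes it practical to backpropagate a test risk through the adaptation step, although the sketch above does not attempt this.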