Positive-unlabeled learning is a binary classification problem in which only positive and unlabeled data are available. It is common in domains where negative labels are costly or impossible to obtain, e.g., medicine and personalized advertising. Most approaches to positive-unlabeled learning apply only to specific data types (e.g., images, categorical data) and cannot generate new positive and negative samples. This work introduces a feature-space distance-based tensor network approach to the positive-unlabeled learning problem. The presented method is not domain specific and significantly improves the state-of-the-art results on the MNIST image dataset and on 15 categorical/mixed datasets. The trained tensor network model is also a generative model and enables the generation of new positive and negative instances.