The present study investigates a cluster cleaning algorithm that is computationally simple and capable of solving the positive-unlabeled (PU) classification problem when the SCAR (Selected Completely At Random) condition is not satisfied. A secondary objective is to assess the robustness of the LassoJoint method to violations of the SCAR condition. In the first step of our algorithm, we obtain cleaning labels from 2-means clustering. We then fit a logistic regression on the cleaned data: observations labeled positive by the cleaning step, together with the observed (labeled) positive observations, are assigned the positive label, and all remaining observations are assigned the negative label. The proposed algorithm is evaluated on 11 real data sets from machine learning repositories and one synthetic data set. The results demonstrate the efficacy of the clustering algorithm in scenarios where the SCAR condition is violated and indicate that the LassoJoint algorithm is moderately robust in this setting.
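A minimal sketch of the two-step cleaning procedure described above, assuming a feature matrix `X` and a 0/1 indicator `s` that marks the labeled positives. Several details are our assumptions rather than taken from the study: 2-means is run on all observations, the cluster containing more labeled positives is treated as the positive cluster, and both 2-means (Lloyd's algorithm) and logistic regression (plain gradient descent) are hand-written in NumPy to keep the example self-contained. The function names `two_means`, `fit_logistic`, `predict_proba`, and `cluster_clean_pu` are hypothetical.

```python
import numpy as np


def two_means(X, n_iter=100, seed=0):
    """Lloyd's algorithm with k=2; returns a 0/1 cluster index per row of X."""
    rng = np.random.default_rng(seed)
    # Initialise the two centres with two distinct, randomly chosen observations.
    centers = X[rng.choice(len(X), size=2, replace=False)].astype(float)
    labels = None
    for _ in range(n_iter):
        # Assign every observation to its nearest centre (squared Euclidean distance).
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        # Move each centre to the mean of its members (keep it if the cluster is empty).
        for k in range(2):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels


def _with_intercept(X):
    # Prepend a column of ones so the first weight acts as the intercept.
    return np.hstack([np.ones((len(X), 1)), X])


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Unpenalised logistic regression fitted by batch gradient descent."""
    Xb = _with_intercept(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        # Gradient of the mean negative log-likelihood.
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def predict_proba(w, X):
    """Estimated probability of the positive class for each row of X."""
    return 1.0 / (1.0 + np.exp(-(_with_intercept(X) @ w)))


def cluster_clean_pu(X, s, seed=0):
    """s[i] = 1 if observation i is a labeled positive, 0 if unlabeled."""
    clusters = two_means(X, seed=seed)
    # Assumption: the cluster holding more labeled positives is the "positive" one.
    counts = [np.sum(s[clusters == k]) for k in range(2)]
    pos_cluster = int(np.argmax(counts))
    # Positive = cleaned positive OR labeled positive; everything else is negative.
    y = ((clusters == pos_cluster) | (s == 1)).astype(float)
    return fit_logistic(X, y), y
```

To mimic a SCAR violation, one can label only the positives with, for example, the largest first coordinate, so that the labeled set is not a random sample of the positives. In practice one would likely substitute scikit-learn's `KMeans(n_clusters=2)` and `LogisticRegression` for the hand-written routines; they are spelled out here only to show the mechanics.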