Semi-supervised learning is highly useful in the common scenario where labeled data is scarce but unlabeled data is abundant. The graph (or nonlocal) Laplacian is a fundamental smoothing operator for a variety of learning tasks. For unsupervised clustering, a spectral embedding based on graph-Laplacian eigenvectors is often used. For semi-supervised problems, the common approach is to solve a constrained optimization problem regularized by a Dirichlet energy defined through the graph Laplacian. However, as the amount of supervision decreases, Dirichlet optimization becomes suboptimal. We would therefore like to obtain a smooth transition between unsupervised clustering and low-supervision graph-based classification. In this paper, we propose a new type of graph Laplacian adapted to Semi-Supervised Learning (SSL) problems. It is based on both density and contrastive measures and allows the labeled data to be encoded directly in the operator. As a result, semi-supervised learning can be performed successfully using spectral clustering. The benefits of our approach are illustrated on several SSL problems.
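To make the two standard baselines mentioned in the abstract concrete, the following is a minimal sketch of (a) a spectral embedding from graph-Laplacian eigenvectors and (b) Dirichlet-energy label interpolation with hard constraints on the labeled nodes. It does not implement the operator proposed in the paper. The Gaussian affinity, the unnormalized Laplacian L = D - W, the bandwidth `sigma`, the toy two-cluster data, and all function names are assumptions chosen purely for illustration.

```python
import numpy as np


def gaussian_affinity(X, sigma=1.0):
    """Dense Gaussian affinity matrix with a zero diagonal (assumed kernel)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W


def spectral_embedding(L, k):
    """Eigenvectors of the k smallest eigenvalues after the trivial one.

    Assumes a connected graph, so only the first eigenvector is constant.
    """
    _, vecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    return vecs[:, 1:k + 1]


def dirichlet_labels(L, labeled_idx, labeled_values):
    """Minimize u^T L u subject to u = labeled_values on the labeled nodes.

    The minimizer is harmonic on the unlabeled nodes:
    L_uu u_u = -L_ul u_l.
    """
    n = L.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    values = np.asarray(labeled_values, dtype=float)
    unlabeled = np.setdiff1d(np.arange(n), labeled_idx)
    u = np.zeros(n)
    u[labeled_idx] = values
    L_uu = L[np.ix_(unlabeled, unlabeled)]
    L_ul = L[np.ix_(unlabeled, labeled_idx)]
    u[unlabeled] = np.linalg.solve(L_uu, -L_ul @ values)
    return u


# Toy data (hypothetical): two Gaussian blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(20, 2)),
    rng.normal(loc=(4.0, 0.0), scale=0.3, size=(20, 2)),
])
L = graph_laplacian(gaussian_affinity(X, sigma=1.0))

# Unsupervised: the sign of the first nontrivial eigenvector splits the blobs.
fiedler = spectral_embedding(L, k=1)[:, 0]

# Semi-supervised: one labeled point per blob (+1 and -1), the rest inferred.
u = dirichlet_labels(L, labeled_idx=[0, 20], labeled_values=[1.0, -1.0])
predicted = np.sign(u)
```

In this sketch the labels enter only as constraints on the optimization, while the operator `L` is built from the unlabeled geometry alone. The abstract's proposal differs on exactly this point: it encodes the labeled data in the operator itself, so that the spectral route can be used in the semi-supervised setting as well.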