Laplace learning is a popular machine learning algorithm for finding missing labels from a small number of labelled feature vectors using the geometry of a graph. More precisely, Laplace learning is based on minimising a graph-Dirichlet energy, equivalently a discrete Sobolev $\mathrm{H}^1$ semi-norm, constrained to take the values of known labels on a given subset. The variational problem is asymptotically ill-posed as the number of unlabelled feature vectors goes to infinity while the number of given labels stays finite, due to a lack of regularity in minimisers of the continuum Dirichlet energy in any dimension higher than one. In particular, continuum minimisers are not continuous. One solution is to consider higher-order regularisation, which is the analogue of minimising Sobolev $\mathrm{H}^s$ semi-norms. In this paper we consider the asymptotics of minimising a graph variant of the Sobolev $\mathrm{H}^s$ semi-norm with pointwise constraints. We show that, as expected, one needs $s>d/2$ where $d$ is the dimension of the data manifold. We also show that there must be an upper bound on the connectivity of the graph; that is, highly connected graphs lead to degenerate behaviour of the minimiser even when $s>d/2$.
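The constrained minimisation described in the abstract can be sketched on a small graph. The following is a minimal illustration, not the paper's construction: it assumes the unnormalised graph Laplacian $L = D - W$, an integer power $s$, and the energy $u^\top L^s u$ as a stand-in for the graph $\mathrm{H}^s$ semi-norm (the paper's graph variant may be normalised or defined differently). The function name `fit_labels` and its parameters are hypothetical.

```python
import numpy as np

def fit_labels(W, labelled, y, s=1):
    """Minimise u^T L^s u subject to u[labelled] = y.

    Assumptions (not taken from the paper): W is a symmetric weight matrix
    of a connected graph, L = D - W is the unnormalised Laplacian, and s is
    a positive integer. s = 1 corresponds to Laplace learning.
    """
    W = np.asarray(W, dtype=float)
    y = np.asarray(y, dtype=float)
    labelled = np.asarray(labelled)
    n = W.shape[0]

    L = np.diag(W.sum(axis=1)) - W        # unnormalised graph Laplacian
    A = np.linalg.matrix_power(L, s)      # higher-order regulariser L^s

    free = np.setdiff1d(np.arange(n), labelled)   # unlabelled node indices
    u = np.zeros(n)
    u[labelled] = y                       # pointwise constraints

    # First-order optimality on the unlabelled nodes:
    #   A[free, free] u[free] = -A[free, labelled] y
    rhs = -A[np.ix_(free, labelled)] @ y
    u[free] = np.linalg.solve(A[np.ix_(free, free)], rhs)
    return u

# Path graph on 5 nodes with labels 0 and 1 at the two ends.
n = 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

u1 = fit_labels(W, [0, n - 1], [0.0, 1.0], s=1)
u2 = fit_labels(W, [0, n - 1], [0.0, 1.0], s=2)
print(u1)
print(u2)
```

On a path graph the case `s=1` reduces to linear interpolation between the two labelled endpoints, which makes it a convenient sanity check. The dense solve is only suitable for small graphs; a sparse solver would be the natural choice at scale.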