We propose a theoretical framework to analyze semi-supervised classification under the low density separation assumption in a high-dimensional regime. In particular, we introduce QLDS, a linear classification model in which the low density separation assumption is implemented via quadratic margin maximization. The algorithm has an explicit solution with rich theoretical properties, and we show that its particular cases include the least-squares support vector machine in the supervised case, spectral clustering in the fully unsupervised regime, and a class of semi-supervised graph-based approaches. As such, QLDS establishes a smooth bridge between these supervised and unsupervised learning methods. Using recent advances in random matrix theory, we formally derive a theoretical evaluation of the classification error in the asymptotic regime. As an application, we derive a hyperparameter selection policy that finds the best balance between the supervised and the unsupervised terms of our learning criterion. Finally, we provide extensive illustrations of our framework, as well as an experimental study on several benchmarks demonstrating that QLDS, while computationally more efficient, improves over cross-validation for hyperparameter selection, which suggests that random matrix theory is a promising tool for semi-supervised model selection.
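To make the idea of a linear classifier with an explicit solution concrete, the following is a minimal sketch, not the exact QLDS formulation. It assumes an objective of the form (lam/2)·||w||² − (alpha_l/n_l)·Σ y_i wᵀx_i − (alpha_u/(2·n_u))·Σ (wᵀx_j)², where the first sum runs over labeled points and the second over unlabeled ones. The objective, the function name `qlds_sketch`, and the parameters `alpha_l`, `alpha_u`, and `lam` are all assumptions made for illustration. Under this assumption, setting the gradient to zero gives a single linear system.

```python
import numpy as np


def qlds_sketch(X_l, y_l, X_u, alpha_l=1.0, alpha_u=1.0, lam=1.0):
    """Closed-form minimizer of an ASSUMED quadratic semi-supervised objective.

    Hypothetical objective (not taken from the abstract):
        (lam / 2) * ||w||^2
        - (alpha_l / n_l) * sum_i y_i * (w @ x_i)          # supervised term
        - (alpha_u / (2 * n_u)) * sum_j (w @ x_j) ** 2     # quadratic margin term
    """
    n_l, d = X_l.shape
    n_u = X_u.shape[0]

    # Second-moment matrix of the unlabeled points (no centering).
    C_u = X_u.T @ X_u / n_u

    # The assumed objective is bounded below only if lam exceeds
    # alpha_u times the largest eigenvalue of C_u.
    top = np.linalg.eigvalsh(C_u)[-1]
    if lam <= alpha_u * top:
        raise ValueError("lam must exceed alpha_u * largest eigenvalue of C_u")

    # Stationarity condition: (lam * I - alpha_u * C_u) w = (alpha_l / n_l) * X_l^T y_l
    A = lam * np.eye(d) - alpha_u * C_u
    b = alpha_l * (X_l.T @ y_l) / n_l
    return np.linalg.solve(A, b)


def predict(w, X):
    """Assign labels in {-1, +1} from the sign of the linear score."""
    return np.where(X @ w >= 0, 1, -1)
```

In this sketch, `alpha_u = 0` leaves a purely supervised solution proportional to `X_l.T @ y_l`, while letting `lam` approach `alpha_u` times the top eigenvalue of the unlabeled second-moment matrix pushes `w` toward the corresponding eigenvector, which is the direction a spectral method would use. This mirrors, in a simplified way, the interpolation between supervised and unsupervised methods described above.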