Random Matrix Analysis to Balance between Supervised and Unsupervised Learning under the Low Density Separation Assumption

We propose a theoretical framework to analyze semi-supervised classification under the low density separation assumption in a high-dimensional regime. In particular, we introduce QLDS, a linear classification model in which the low density separation assumption is implemented via quadratic margin maximization. The algorithm has an explicit solution with rich theoretical properties, and we show that particular cases of our algorithm are the least-squares support vector machine in the supervised case, spectral clustering in the fully unsupervised regime, and a class of semi-supervised graph-based approaches. As such, QLDS establishes a smooth bridge between these supervised and unsupervised learning methods. Using recent advances in random matrix theory, we formally derive a theoretical evaluation of the classification error in the asymptotic regime. As an application, we derive a hyperparameter selection policy that finds the best balance between the supervised and the unsupervised terms of our learning criterion. Finally, we provide extensive illustrations of our framework, as well as an experimental study on several benchmarks demonstrating that QLDS, while being computationally more efficient, improves over cross-validation for hyperparameter selection. This indicates the promise of random matrix theory for semi-supervised model selection.
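
To make the idea of a closed-form linear classifier with a supervised and an unsupervised quadratic term concrete, here is a minimal sketch. It is not taken from the paper: the objective, the function name `qlds_sketch`, and the parameters `alpha_l`, `alpha_u` and `lam` are assumptions chosen for illustration. The sketch minimizes lam/2 ||w||^2 + alpha_l/(2 n_l) ||y_l - X_l w||^2 - alpha_u/(2 n_u) ||X_u w||^2, so the last term rewards large squared margins on unlabeled points.

```python
import numpy as np


def qlds_sketch(X_l, y_l, X_u, alpha_l=1.0, alpha_u=0.1, lam=None):
    """Illustrative closed-form solver (assumed objective, not the paper's exact one).

    X_l: (n_l, d) labeled samples, y_l: (n_l,) labels in {-1, +1},
    X_u: (n_u, d) unlabeled samples.
    """
    n_l, d = X_l.shape
    n_u = X_u.shape[0]
    S_l = X_l.T @ X_l / n_l  # labeled second-moment matrix
    S_u = X_u.T @ X_u / n_u if n_u > 0 else np.zeros((d, d))
    top_u = np.linalg.eigvalsh(S_u)[-1]  # largest eigenvalue of S_u
    if lam is None:
        # Assumption: pick lam large enough that the problem stays convex.
        lam = alpha_u * top_u + 1.0
    if lam <= alpha_u * top_u:
        raise ValueError("lam too small: the quadratic objective may be unbounded")
    # Setting the gradient to zero gives a single linear system.
    A = lam * np.eye(d) + alpha_l * S_l - alpha_u * S_u
    return np.linalg.solve(A, alpha_l * X_l.T @ y_l / n_l)


if __name__ == "__main__":
    # Synthetic two-class Gaussian data, purely for demonstration.
    rng = np.random.default_rng(0)
    d, n_l, n_u = 20, 10, 200
    mu = np.full(d, 0.5)
    y_l = rng.choice([-1.0, 1.0], size=n_l)
    y_u = rng.choice([-1.0, 1.0], size=n_u)
    X_l = y_l[:, None] * mu + rng.normal(size=(n_l, d))
    X_u = y_u[:, None] * mu + rng.normal(size=(n_u, d))
    w = qlds_sketch(X_l, y_l, X_u, alpha_l=1.0, alpha_u=0.5)
    print("accuracy on unlabeled points:", np.mean(np.sign(X_u @ w) == y_u))
```

In this sketch, setting `alpha_u=0` reduces the solution to a ridge-regularized least-squares classifier without a bias term, while increasing `alpha_u` gives more weight to the unlabeled data. How the paper parameterizes this trade-off and selects it with random matrix theory is not reproduced here.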
