Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics

We study the problem of learning multi-index models in high dimensions using a two-layer neural network trained with the mean-field Langevin algorithm. Under mild distributional assumptions on the data, we characterize the effective dimension $d_{\mathrm{eff}}$ that controls both sample and computational complexity by utilizing the adaptivity of neural networks to latent low-dimensional structures. When the data exhibit such a structure, $d_{\mathrm{eff}}$ can be significantly smaller than the ambient dimension. We prove that the sample complexity grows almost linearly with $d_{\mathrm{eff}}$, bypassing the limitations of the information and generative exponents that appeared in recent analyses of gradient-based feature learning. On the other hand, the computational complexity may inevitably grow exponentially with $d_{\mathrm{eff}}$ in the worst case. To improve the computational complexity, we take the first steps towards polynomial-time convergence of the mean-field Langevin algorithm by investigating a setting where the weights are constrained to lie on a compact manifold with positive Ricci curvature, such as the hypersphere. There, we study assumptions under which polynomial-time convergence is achievable, whereas similar assumptions in the Euclidean setting lead to exponential time complexity.
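The abstract does not spell out the update rule, so the following is only a minimal sketch of what a mean-field Langevin step commonly looks like in the Euclidean setting: gradient descent on each neuron with weight decay plus injected Gaussian noise. The `tanh` activation, the squared loss, and the names `network_output`, `mfla_step`, `eta` (step size), `lam` (weight decay), and `tau` (temperature) are all assumptions made for illustration, not the paper's notation. The hypersphere-constrained variant discussed in the abstract is not implemented here.

```python
import math
import random


def network_output(W, x):
    """Mean-field two-layer network: average of tanh(<w_i, x>) over neurons.

    The tanh activation is an illustrative assumption.
    """
    return sum(math.tanh(sum(wi * xi for wi, xi in zip(w, x))) for w in W) / len(W)


def mfla_step(W, X, y, eta, lam, tau, rng):
    """One noisy gradient step on every neuron (assumed squared loss).

    W: list of N weight vectors, X: list of n inputs, y: list of n targets.
    Each neuron moves along the gradient of the first variation of the
    empirical loss, is shrunk by weight decay lam, and receives Gaussian
    noise of scale sqrt(2 * eta * tau).
    """
    n, d = len(X), len(X[0])
    resid = [network_output(W, x) - t for x, t in zip(X, y)]
    noise_scale = math.sqrt(2.0 * eta * tau)
    new_W = []
    for w in W:
        grad = [0.0] * d
        for x, r in zip(X, resid):
            a = math.tanh(sum(wi * xi for wi, xi in zip(w, x)))
            coef = r * (1.0 - a * a) / n  # residual times tanh'(<w, x>)
            for j in range(d):
                grad[j] += coef * x[j]
        new_W.append([
            wj - eta * (gj + lam * wj) + noise_scale * rng.gauss(0.0, 1.0)
            for wj, gj in zip(w, grad)
        ])
    return new_W


# Toy usage: a hypothetical target that depends on only 2 of 5 coordinates,
# i.e. a multi-index model with a low-dimensional index subspace.
rng = random.Random(0)
d, n, N = 5, 40, 8
X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
y = [math.tanh(x[0]) * x[1] for x in X]
W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(N)]
for _ in range(50):
    W = mfla_step(W, X, y, eta=0.1, lam=0.01, tau=1e-3, rng=rng)
```

Setting `tau=0` removes the noise and reduces the step to plain gradient descent with weight decay, which is a convenient sanity check. The problem sizes and hyperparameters in the usage lines are arbitrary and chosen only so the loop runs quickly.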

