We study the problem of learning multi-index models in high dimensions using a two-layer neural network trained with the mean-field Langevin algorithm. Under mild distributional assumptions on the data, we characterize an effective dimension $d_{\mathrm{eff}}$ that controls both the sample and computational complexity, exploiting the adaptivity of neural networks to latent low-dimensional structure. When the data exhibit such structure, $d_{\mathrm{eff}}$ can be significantly smaller than the ambient dimension. We prove that the sample complexity grows almost linearly with $d_{\mathrm{eff}}$, bypassing the limitations imposed by the information and generative exponents in recent analyses of gradient-based feature learning. On the other hand, in the worst case the computational complexity can grow exponentially with $d_{\mathrm{eff}}$. Motivated by this gap, we take first steps toward polynomial-time convergence of the mean-field Langevin algorithm by investigating a setting in which the weights are constrained to lie on a compact manifold with positive Ricci curvature, such as the hypersphere. There, we identify assumptions under which polynomial-time convergence is achievable, whereas similar assumptions in the Euclidean setting lead to exponential time complexity.
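To make the object of study concrete, the following is a minimal finite-particle sketch of a mean-field Langevin update for a two-layer network, with an optional renormalization onto the unit sphere standing in for the spherical constraint discussed above. All concrete choices here (the tanh activation, the helper names `forward` and `mfl_step`, the step size `eta`, the regularization strength `lam`, and the toy single-index target) are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Finite-particle sketch of a mean-field Langevin step for the two-layer
# network f(x) = (1/N) * sum_j a_j * tanh(w_j . x).  Hyperparameters and the
# projection step are illustrative assumptions, not the paper's algorithm.

rng = np.random.default_rng(0)

d, N = 20, 512            # ambient dimension, number of neurons (particles)
eta, lam = 1e-2, 1e-3     # step size, entropic regularization strength

def forward(W, a, X):
    """Network output: average over neurons of a_j * tanh(w_j . x)."""
    return (np.tanh(X @ W.T) @ a) / N

def mfl_step(W, a, X, y, constrain_to_sphere=False):
    """One noisy gradient (Langevin) step on the first-layer weights W (N x d)."""
    pre = np.tanh(X @ W.T)                        # (n, N) activations
    resid = forward(W, a, X) - y                  # (n,) residuals, squared loss
    # Gradient of the empirical risk with respect to each particle w_j
    grad = ((resid[:, None] * (1.0 - pre**2)) * a).T @ X / (N * len(y))
    noise = rng.standard_normal(W.shape)
    W = W - eta * grad + np.sqrt(2.0 * eta * lam) * noise
    if constrain_to_sphere:
        # Crude stand-in for the spherical constraint: renormalize each particle
        W /= np.linalg.norm(W, axis=1, keepdims=True)
    return W

# Toy single-index target y = relu(x . theta), a special case of a multi-index model
n = 256
theta = np.zeros(d); theta[0] = 1.0
X = rng.standard_normal((n, d))
y = np.maximum(X @ theta, 0.0)

W = rng.standard_normal((N, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=N)               # fixed second layer, for simplicity
for _ in range(200):
    W = mfl_step(W, a, X, y, constrain_to_sphere=True)
```

The noise scale $\sqrt{2\eta\lambda}$ is the standard Euler discretization of Langevin dynamics with entropic regularization strength $\lambda$; an exact constrained dynamics on the sphere would use tangent-space noise and a Riemannian retraction rather than the naive renormalization shown here.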