Multi-index models -- functions that depend on the covariates only through a non-linear transformation of their projection onto a subspace -- are a useful benchmark for investigating feature learning with neural networks. This paper examines the theoretical boundaries of learnability in this hypothesis class, focusing on the minimum sample complexity required to weakly recover their low-dimensional structure with first-order iterative algorithms, in the high-dimensional regime where the number of samples $n=\alpha d$ is proportional to the covariate dimension $d$. Our findings unfold in three parts: (i) first, we identify the conditions under which a \textit{trivial subspace} can be learned with a single step of a first-order algorithm for any $\alpha\!>\!0$; (ii) second, in the case where the trivial subspace is empty, we provide necessary and sufficient conditions for the existence of an {\it easy subspace} consisting of directions that can be learned only above a certain sample complexity $\alpha\!>\!\alpha_c$. The critical threshold $\alpha_{c}$ marks the presence of a computational phase transition, in the sense that no efficient iterative algorithm can succeed for $\alpha\!<\!\alpha_c$. For a limited but interesting set of especially hard directions -- akin to the parity problem -- $\alpha_c$ is found to diverge. Finally, (iii) we demonstrate that interactions between different directions can result in an intricate hierarchical learning phenomenon, where some directions can be learned sequentially when coupled to easier ones. Our analytical approach builds on the optimality of approximate message-passing algorithms among first-order iterative methods, delineating the fundamental learnability limit across a broad spectrum of algorithms, including neural networks trained with gradient descent.