Multi-index models, i.e., functions that depend on the covariates only through a non-linear transformation of their projection onto a subspace, are a useful benchmark for investigating feature learning with neural nets. This paper examines the theoretical boundaries of efficient learnability in this hypothesis class, focusing on the minimum sample complexity required for weakly recovering their low-dimensional structure with first-order iterative algorithms, in the high-dimensional regime where the number of samples $n\!=\!\alpha d$ is proportional to the covariate dimension $d$. Our findings unfold in three parts: (i) we identify under which conditions a trivial subspace can be learned with a single step of a first-order algorithm for any $\alpha\!>\!0$; (ii) if the trivial subspace is empty, we provide necessary and sufficient conditions for the existence of an easy subspace consisting of directions that can be learned only above a certain sample complexity $\alpha\!>\!\alpha_c$, where $\alpha_{c}$ marks a computational phase transition. For a limited but interesting set of hard directions, akin to the parity problem, $\alpha_c$ is found to diverge. Finally, (iii) we show that interactions between different directions can result in an intricate hierarchical learning phenomenon, where directions can be learned sequentially when coupled to easier ones. We discuss in detail the grand staircase picture associated with these functions (and contrast it with the original staircase one). Our theory builds on the optimality of approximate message-passing among first-order iterative methods, delineating the fundamental learnability limit across a broad spectrum of algorithms, including neural networks trained with gradient descent, which we discuss in this context.
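To fix ideas, here is a minimal illustrative sketch (not taken from the paper) of the setting the abstract describes: synthetic data from a two-index model $y = g(W^* x)$ in the proportional regime $n = \alpha d$, with a single plain gradient step used as a stand-in for "one step of a first-order algorithm". The specific link function $g$, the dimensions, and the use of the square-loss gradient at zero initialization are assumptions made purely for illustration.

\begin{verbatim}
# Hypothetical sketch: one first-order step on a multi-index model.
import numpy as np

rng = np.random.default_rng(0)
d, alpha = 500, 4.0
n = int(alpha * d)                 # samples proportional to dimension
k = 2                              # hidden subspace dimension

# Orthonormal target directions spanning the hidden subspace.
W_star, _ = np.linalg.qr(rng.standard_normal((d, k)))
W_star = W_star.T                  # shape (k, d)

def g(z):
    # Illustrative link: direction 1 enters linearly ("trivial"),
    # direction 2 only through its square (not learnable in one step).
    return z[:, 0] + z[:, 1] ** 2

X = rng.standard_normal((n, d))    # Gaussian covariates
y = g(X @ W_star.T)                # labels depend on X only via W* x

# One first-order step: the square-loss gradient at zero initialization
# is proportional to the empirical correlation (1/n) sum_i y_i x_i.
w = (y @ X) / n
w /= np.linalg.norm(w)

# Overlap with each hidden direction measures weak recovery.
print("overlap with direction 1:", abs(W_star[0] @ w))
print("overlap with direction 2:", abs(W_star[1] @ w))
\end{verbatim}

In this toy example the single step acquires an order-one overlap with the linear direction for any $\alpha > 0$, while the quadratic direction remains at vanishing overlap, mirroring the distinction the abstract draws between the trivial subspace and directions requiring a larger sample complexity.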