We study the computational and sample complexity of learning a target function $f_*:\mathbb{R}^d\to\mathbb{R}$ with additive structure, that is, $f_*(x) = \frac{1}{\sqrt{M}}\sum_{m=1}^M f_m(\langle x, v_m\rangle)$, where $f_1,f_2,\dots,f_M:\mathbb{R}\to\mathbb{R}$ are nonlinear link functions of single-index models (ridge functions) with diverse and near-orthogonal index features $\{v_m\}_{m=1}^M$, and the number of additive tasks $M$ grows with the dimensionality, $M\asymp d^\gamma$ for $\gamma\ge 0$. This problem setting is motivated by the classical additive model literature, the recent representation learning theory of two-layer neural networks, and large-scale pretraining, where the model simultaneously acquires a large number of "skills" that are often localized in distinct parts of the trained network. We prove that a large subset of polynomial $f_*$ can be efficiently learned by gradient descent training of a two-layer neural network, with statistical and computational complexity that is polynomial and depends on the number of tasks $M$ and the information exponent of the $f_m$, despite the link functions being unknown and $M$ growing with the dimensionality. We complement this learnability guarantee with a computational hardness result by establishing statistical query (SQ) lower bounds for both correlational SQ and full SQ algorithms.
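As an informal illustration only (not part of the paper's analysis), the following minimal NumPy sketch shows one way to instantiate the data-generating process above: $M\asymp d^\gamma$ near-orthogonal index features drawn at random, a stand-in polynomial link (here the second Hermite polynomial, information exponent 2), and Gaussian inputs labeled by $f_*$. All concrete choices (dimension, $\gamma$, the specific link, sample size) are illustrative assumptions.

```python
import numpy as np

# Hypothetical instantiation of the additive single-index target
# f_*(x) = M^{-1/2} * sum_m f_m(<x, v_m>); choices below are illustrative.

rng = np.random.default_rng(0)
d, gamma = 64, 0.5
M = int(d ** gamma)                # number of additive tasks, M ≍ d^gamma

# Near-orthogonal index features: random Gaussian directions, normalized.
V = rng.standard_normal((M, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)

def link(z):
    """Stand-in link f_m: second Hermite polynomial He_2(z) = z^2 - 1."""
    return z ** 2 - 1.0

def f_star(X):
    """Additive target f_*(x) = M^{-1/2} * sum_m f_m(<x, v_m>)."""
    Z = X @ V.T                    # projections <x, v_m>, shape (n, M)
    return link(Z).sum(axis=1) / np.sqrt(M)

# Training data: isotropic Gaussian inputs labeled by f_*.
n = 4096
X = rng.standard_normal((n, d))
y = f_star(X)
print(X.shape, y.shape)            # (4096, 64) (4096,)
```

A two-layer network trained by gradient descent on such $(X, y)$ pairs is the learner considered in the paper; the sketch only fixes the data distribution, not the training procedure.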