We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss. Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy. Prior work gave an algorithm for this problem with complexity $(dk/\epsilon)^{h(k)}$, where the function $h(k)$ scales super-polynomially in $k$. Interestingly, the complexity of our algorithm is near-optimal within the class of Correlational Statistical Query algorithms. At a high level, our algorithm uses tensor decomposition to identify a subspace such that all moments of order $O(k)$ are small in the orthogonal directions. Its analysis makes essential use of the theory of Schur polynomials to show that the higher-order moment error tensors are small whenever the lower-order ones are.
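The subspace-identification idea can be illustrated with a much-simplified sketch. It assumes the target $f(x)=\sum_i a_i\,\mathrm{ReLU}(\langle w_i,x\rangle)$ with unit-norm $w_i$ and uses only the first- and second-order Hermite moments $\mathbb{E}[y\,x]$ and $\mathbb{E}[y\,(xx^\top - I)]$, whose column spaces lie in $\mathrm{span}\{w_i\}$ up to sampling error. It flattens them into one matrix and keeps the left singular vectors whose singular values exceed a threshold `tol`. The function names, the threshold, and the restriction to orders 1 and 2 are illustrative assumptions, not the authors' algorithm, which works with moments of order $O(k)$ because low-order moments can cancel when the coefficients $a_i$ have mixed signs.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sample_target(n, W, a, rng):
    """Draw x ~ N(0, I_d) and labels y = sum_i a_i * ReLU(<w_i, x>)."""
    d = W.shape[1]
    X = rng.standard_normal((n, d))
    y = relu(X @ W.T) @ a
    return X, y


def hermite_moments(X, y):
    """Empirical Hermite moments E[y He_1(x)] and E[y He_2(x)].

    He_1(x) = x and He_2(x) = x x^T - I. For a sum of ReLUs both moments
    have column spaces inside span{w_i} (in expectation).
    """
    n, d = X.shape
    m1 = (y @ X / n).reshape(d, 1)
    m2 = np.einsum("n,ni,nj->ij", y, X, X) / n - y.mean() * np.eye(d)
    return [m1, m2]


def estimate_subspace(X, y, tol):
    """Orthonormal basis for the directions carrying non-negligible moments.

    Hypothetical helper: concatenates the moment matrices column-wise and
    keeps left singular vectors with singular value above `tol`.
    """
    M = np.concatenate(hermite_moments(X, y), axis=1)
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, s > tol]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two orthonormal hidden directions in R^6 with mixed-sign weights.
    W = np.array([[1, 0, 0, 0, 0, 0], [0, 0.6, 0.8, 0, 0, 0]], dtype=float)
    a = np.array([1.0, -0.7])
    X, y = sample_target(400_000, W, a, rng)
    U = estimate_subspace(X, y, tol=0.1)
    print("estimated dimension:", U.shape[1])
    # Norm of each w_i's projection onto the estimated subspace (close to 1).
    print(np.linalg.norm(U.T @ W.T, axis=0))
```

On this example the two hidden directions are linearly independent and both $a_i$ are nonzero, so the second-order moment alone already has rank $k=2$; with many units whose contributions cancel, order 2 would not suffice, which is where higher-order moments become necessary.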