We revisit the well-studied problem of learning a linear combination of $k$ ReLU activations given labeled examples drawn from the standard $d$-dimensional Gaussian measure. Chen et al. [CDG+23] recently gave the first algorithm for this problem to run in $\text{poly}(d,1/\varepsilon)$ time when $k = O(1)$, where $\varepsilon$ is the target error. More precisely, their algorithm runs in time $(d/\varepsilon)^{\mathrm{quasipoly}(k)}$ and learns over multiple stages. Here we show that a much simpler one-stage version of their algorithm suffices, and moreover its runtime is only $(d/\varepsilon)^{O(k^2)}$.
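To make the learning setup concrete, the following is a minimal sketch of how labeled examples for this problem could be generated. It illustrates only the data model described above, not the algorithm of [CDG+23] or its one-stage variant. The names `target` and `sample_examples`, the noiseless labels, and the particular weights and coefficients are assumptions chosen for illustration.

```python
import random


def relu(z):
    """ReLU activation: max(0, z)."""
    return max(0.0, z)


def target(x, weights, coeffs):
    """Evaluate sum_i coeffs[i] * ReLU(<weights[i], x>).

    `weights` is a list of k vectors in R^d and `coeffs` a list of k reals.
    Both are hypothetical parameters of the unknown function to be learned.
    """
    return sum(
        c * relu(sum(wi * xi for wi, xi in zip(w, x)))
        for w, c in zip(weights, coeffs)
    )


def sample_examples(n, d, weights, coeffs, rng):
    """Draw n labeled examples (x, y) with x from the standard d-dimensional Gaussian.

    Labels are assumed noiseless here: y = target(x).
    """
    examples = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        examples.append((x, target(x, weights, coeffs)))
    return examples


# Illustrative instance with d = 5 and k = 3 (values are arbitrary).
rng = random.Random(0)
d, k, n = 5, 3, 4
weights = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
coeffs = [1.0, -0.5, 2.0]
data = sample_examples(n, d, weights, coeffs, rng)
```

A learner in this setting sees only `data` and must output a function whose error against `target` is at most $\varepsilon$; the runtime bounds quoted above describe how the cost of doing so scales with $d$, $1/\varepsilon$, and $k$.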