The goal of this paper is to investigate the complexity of gradient algorithms when learning sparse functions (juntas). We introduce a class of Statistical Queries ($\mathsf{SQ}$), which we call Differentiable Learning Queries ($\mathsf{DLQ}$), to model gradient queries on a specified loss with respect to an arbitrary model. We provide a tight characterization of the query complexity of $\mathsf{DLQ}$ for learning the support of a sparse function over generic product distributions. This complexity crucially depends on the loss function. For the squared loss, $\mathsf{DLQ}$ matches the complexity of Correlation Statistical Queries ($\mathsf{CSQ}$), which can be much worse than that of $\mathsf{SQ}$. For other simple loss functions, however, including the $\ell_1$ loss, $\mathsf{DLQ}$ always achieves the same complexity as $\mathsf{SQ}$. We also provide evidence that $\mathsf{DLQ}$ can indeed capture learning with (stochastic) gradient descent, by showing that it correctly describes the complexity of learning with a two-layer neural network in the mean-field regime with linear scaling.
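A minimal sketch of why the loss matters. It uses a hypothetical two-variable target $y = x_2(2 + x_1)$ on uniform $x \in \{-1,+1\}^2$ and a hypothetical one-parameter model $f(x; w, c) = w x_1 + c$ evaluated at $w = 0$, $c = 2$. Neither is taken from the paper. Here $\mathbb{E}[y \mid x_1] = 0$, so no correlation with a function of $x_1$ alone can reveal $x_1$, but $|y| = 2 + x_1$ does. The squared-loss gradient $\mathbb{E}[(f - y)\,\partial_w f]$ depends on $y$ only through $\mathbb{E}[y\, x_1]$, which is a correlation query. The $\ell_1$ (sub)gradient $\mathbb{E}[\operatorname{sign}(f - y)\,\partial_w f]$ is non-linear in $y$.

```python
import itertools

# Hypothetical toy junta (illustrative only, not the paper's construction):
# E[y | x1] = 0, but the magnitude |y| = 2 + x1 depends on x1.
def target(x1: int, x2: int) -> float:
    return x2 * (2 + x1)

# Hypothetical model f(x; w, c) = w * x1 + c, queried at w = 0, c = 2.
W, C = 0.0, 2.0

def sign(t: float) -> int:
    return (t > 0) - (t < 0)

def grad_w(loss_derivative) -> float:
    """Exact E[ dloss/df(f(x), y) * df/dw ] under uniform x in {-1,+1}^2."""
    points = list(itertools.product([-1, 1], repeat=2))
    total = 0.0
    for x1, x2 in points:
        y = target(x1, x2)
        f = W * x1 + C
        total += loss_derivative(f, y) * x1  # df/dw = x1
    return total / len(points)

# Squared loss (1/2)(f - y)^2: derivative (f - y) is linear in y,
# so this query only sees E[y * x1], which is 0 here.
sq = grad_w(lambda f, y: f - y)

# ell_1 loss |f - y|: derivative sign(f - y) is non-linear in y.
# f - y is never 0 here (y is in {-3, -1, 1, 3}), so the subgradient is unambiguous.
l1 = grad_w(lambda f, y: sign(f - y))

print(sq, l1)  # prints: 0.0 -0.5
```

In this toy case the squared-loss query returns zero while the $\ell_1$ query returns a nonzero value that depends on $x_1$. This is consistent with the abstract's claim that $\mathsf{DLQ}$ with the squared loss behaves like $\mathsf{CSQ}$, whereas the $\ell_1$ loss can access information that only general $\mathsf{SQ}$ sees.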