We provide a convergence analysis of gradient descent for the problem of agnostically learning a single ReLU function with moderate bias under Gaussian distributions. Unlike prior work that studies the zero-bias setting, we consider the more challenging scenario in which the bias of the ReLU function is non-zero. Our main result establishes that, starting from random initialization, gradient descent run for polynomially many iterations outputs, with high probability, a ReLU function whose error is within a constant factor of the error of the best ReLU function with moderate bias. We also provide finite-sample guarantees and show that these techniques generalize to a broader class of marginal distributions beyond Gaussians.
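To make the algorithmic setting concrete, the following is a minimal sketch of vanilla gradient descent on the empirical square loss for a biased ReLU, starting from random initialization. This is an illustration only, not the paper's analyzed procedure: the function name `gd_learn_relu`, the step size, the iteration count, and the initialization scale are all placeholder assumptions rather than the parameters studied in the analysis.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def gd_learn_relu(X, y, step=0.1, iters=1000, rng=None):
    """Gradient descent on the empirical square loss
    L(w, b) = mean((relu(X @ w + b) - y)**2),
    starting from a random initialization (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    w = rng.normal(size=d) / np.sqrt(d)    # random initialization (assumed scale)
    b = 0.0                                # bias parameter, learned jointly with w
    for _ in range(iters):
        z = X @ w + b
        r = relu(z) - y                    # residuals
        g = (z > 0).astype(float) * r      # chain rule through the ReLU
        w -= step * (2.0 / n) * (X.T @ g)  # gradient of L with respect to w
        b -= step * 2.0 * g.mean()         # gradient of L with respect to b
    return w, b
```

In the agnostic model sketched above, the labels y need not come from any ReLU; the guarantee in the abstract is relative to the error of the best ReLU with moderate bias under the Gaussian marginal.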