Wide neural networks are biased towards learning certain functions, which influences both the rate of convergence of gradient descent (GD) and the functions that are reachable with GD in finite training time. Methods that can modify this bias according to the task at hand are therefore needed. To that end, we introduce Modified Spectrum Kernels (MSKs), a novel family of constructed kernels that can be used to approximate kernels with desired eigenvalues for which no closed form is known. We leverage the duality between wide neural networks and Neural Tangent Kernels to propose a preconditioned gradient descent method that alters the trajectory of GD. This yields a polynomial and, in some cases, exponential training speedup without changing the final solution. Our method is both computationally efficient and simple to implement.
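To make the idea of altering the GD trajectory without changing the final solution concrete, the following is a minimal toy sketch and not the MSK construction itself. It assumes the linearized (kernel) regime, in which the training outputs evolve as f ← f − η K S (f − y), where K is a Gram matrix standing in for the Neural Tangent Kernel and S is a preconditioner. The Laplace kernel, the square-root target spectrum, and the helper names `laplace_gram`, `spectral_preconditioner`, and `residual_norms` are all illustrative assumptions.

```python
import numpy as np

def laplace_gram(x, lengthscale=1.0):
    """Gram matrix of the Laplace kernel exp(-|x - x'| / lengthscale)."""
    return np.exp(-np.abs(x[:, None] - x[None, :]) / lengthscale)

def spectral_preconditioner(K, target_eigs):
    """Build S so that K @ S keeps K's eigenvectors but has eigenvalues
    target_eigs (given in the ascending order returned by eigh)."""
    lam, V = np.linalg.eigh(K)
    return (V * (target_eigs / lam)) @ V.T

def residual_norms(K, y, S, lr, steps):
    """Run f <- f - lr * K S (f - y) from f = 0 and record ||f - y||."""
    f = np.zeros_like(y)
    norms = []
    for _ in range(steps):
        f = f - lr * K @ S @ (f - y)
        norms.append(np.linalg.norm(f - y))
    return np.array(norms)

# Toy 1-D regression problem on equispaced points (an assumption).
x = np.linspace(0.0, 1.0, 20)
y = np.sin(6 * np.pi * x)
K = laplace_gram(x)
lam = np.linalg.eigvalsh(K)  # ascending eigenvalues of K

# Plain kernel GD: each eigen-direction contracts by (1 - lam_i / lam_max).
plain = residual_norms(K, y, np.eye(len(x)), lr=1.0 / lam.max(), steps=50)

# Preconditioned GD: replace the spectrum by sqrt(lam), which compresses
# the condition number, so slow directions are fitted sooner.
S = spectral_preconditioner(K, np.sqrt(lam))
precond = residual_norms(K, y, S, lr=1.0 / np.sqrt(lam.max()), steps=50)

print("plain residual after 50 steps:         ", plain[-1])
print("preconditioned residual after 50 steps:", precond[-1])
```

Both runs share the same fixed point (zero training residual, f = y), so the preconditioner changes only how quickly each eigen-direction is fitted, not where the iteration ends up. In this sketch S is built from an explicit eigendecomposition of K, which is practical only for small training sets.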