This paper investigates the impact of multiscale data on machine learning algorithms, particularly in the context of deep learning. A dataset is multiscale if its distribution shows large variations in scale across different directions. This paper reveals multiscale structures in the loss landscape, including its gradients and Hessians inherited from the data. Correspondingly, it introduces a novel gradient descent approach, drawing inspiration from multiscale algorithms used in scientific computing. This approach seeks to transcend empirical learning rate selection, offering a more systematic, data-informed strategy to enhance training efficiency, especially in the later stages.
翻译:本文研究了多尺度数据对机器学习算法的影响,尤其是在深度学习领域的应用。若数据分布在不同方向上呈现显著尺度变化,则该数据集具有多尺度特性。本文揭示了损失景观中由数据继承而来的多尺度结构(包括其梯度与海森矩阵)。相应地,本文提出了一种新颖的梯度下降方法,该方法借鉴了科学计算中使用的多尺度算法,旨在超越经验性的学习率选择,提供一种更系统、基于数据信息的策略,以提升训练效率,尤其在训练的后期阶段。