The success of neural networks is inseparable from gradient-descent (GD) algorithms, and many GD variants have been proposed to improve the optimization process. The back-propagated gradient is arguably the most crucial component of neural network training, yet its quality can be degraded by several factors, e.g., noisy data, calculation errors, and limitations of the algorithm itself. To exploit gradient information beyond what gradient descent provides, we introduce a gradient correction framework (\textbf{GCGD}). GCGD consists of two plug-in modules: 1) inspired by the idea of gradient prediction, a \textbf{GC-W} module for weight gradient correction; and 2) based on Neural ODE, a \textbf{GC-ODE} module for hidden-state gradient correction. Experimental results show that our gradient correction framework effectively improves gradient quality, reducing the number of training epochs by $\sim$20\% while also improving network performance.