Using backward error analysis, we compute the implicit training biases of neural networks trained with stochastic gradient descent in multitask and continual learning settings. In particular, we derive the modified losses that are implicitly minimized during training. Each has three terms: the original loss, which accounts for convergence; an implicit flatness regularization term proportional to the learning rate; and a conflict term, which can in theory be detrimental to both convergence and implicit regularization. In multitask learning, the conflict term is a well-known quantity that measures the gradient alignment between the tasks. In continual learning, it is a quantity new to deep learning optimization, although a basic tool in differential geometry: the Lie bracket between the task gradients.
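To make the two conflict quantities concrete, the following is a minimal sketch on a toy problem, not the derivation summarized above. It assumes two hypothetical quadratic task losses L_i(theta) = 0.5 * theta^T A_i theta, so that each gradient is A_i theta and each Hessian is A_i. The bracket uses the common convention [X, Y] = (DY) X - (DX) Y; the sign and scaling that appear in the modified losses may differ. All names (`A1`, `A2`, `theta`, the helper functions) are illustrative.

```python
# Toy assumption: L_i(theta) = 0.5 * theta^T A_i theta with symmetric A_i,
# hence grad L_i(theta) = A_i theta and Hessian L_i = A_i.

def matvec(A, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def gradient_alignment(A1, A2, theta):
    """Inner product of the two task gradients at theta (multitask conflict)."""
    return dot(matvec(A1, theta), matvec(A2, theta))

def lie_bracket(A1, A2, theta):
    """[grad L1, grad L2](theta) = H2 grad L1 - H1 grad L2 (assumed convention)."""
    g1, g2 = matvec(A1, theta), matvec(A2, theta)
    h2_g1 = matvec(A2, g1)  # Hessian of task 2 applied to gradient of task 1
    h1_g2 = matvec(A1, g2)  # Hessian of task 1 applied to gradient of task 2
    return [a - b for a, b in zip(h2_g1, h1_g2)]

A1 = [[2.0, 0.0], [0.0, 1.0]]
A2 = [[1.0, 1.0], [1.0, 3.0]]
theta = [1.0, -1.0]

alignment = gradient_alignment(A1, A2, theta)  # positive: gradients are aligned here
bracket = lie_bracket(A1, A2, theta)           # nonzero: A1 and A2 do not commute
commuting = lie_bracket(A1, A1, theta)         # a vector field's bracket with itself is zero
```

In this toy case the bracket reduces to (A2 A1 - A1 A2) theta, so it vanishes exactly when the two Hessians commute, which is the situation in which the two gradient flows commute.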