Deep neural networks that learn by error back-propagation can suffer from exploding and vanishing gradients. Numerous solutions have been proposed, such as normalisation techniques or restricting activation functions to rectified linear units. In this work we follow a different approach, one particularly applicable to closed-loop learning of forward models: back-propagation makes exclusive use of the sign of the error signal to prime the learning, whilst a global relevance signal modulates the rate of learning. This is inspired by the interaction between local plasticity and global neuromodulation. For example, whilst driving on an empty road, one can allow slow step-wise optimisation of actions, whereas at a busy junction an error must be corrected at once. Hence, the error is the priming signal and the intensity of the experience is a modulating factor in the weight change. The advantages of this Prime and Modulate paradigm are twofold: it is free from normalisation, and it makes use of relevant cues from the environment to enrich the learning. We present a mathematical derivation of the learning rule in z-space and demonstrate its real-time performance on a robotic platform. The results show a significant improvement in the speed of convergence compared with conventional back-propagation.
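The prime-and-modulate idea described in the abstract can be illustrated with a minimal sketch for a single linear unit. This is not the derivation from the paper, which works in z-space and propagates the sign through a deep network. The function names, the `relevance` and `lr` parameters, and the convention that the error is target minus output are all assumptions made for illustration.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def prime_and_modulate_step(weights, inputs, error, relevance, lr=0.1):
    """One illustrative weight update for a single linear unit.

    Only the sign of the error primes the direction of the change;
    the non-negative global relevance signal scales its size.
    Assumes error = target - output (hypothetical convention).
    """
    primed = sign(error)
    return [w + lr * relevance * primed * x for w, x in zip(weights, inputs)]


# Same error, different relevance: "empty road" versus "busy junction".
weights = [0.0, 0.0]
inputs = [1.0, 2.0]
slow = prime_and_modulate_step(weights, inputs, error=0.5, relevance=0.1)
fast = prime_and_modulate_step(weights, inputs, error=0.5, relevance=1.0)
```

In this sketch the magnitude of the error never enters the update, so the step size is controlled by `relevance` alone. That is the sense in which the error primes the change and the relevance signal modulates it.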