Backpropagation (BP) remains the dominant and most successful method for training the parameters of deep neural network models. However, BP relies on two computationally distinct phases, does not provide a satisfactory explanation of biological learning, and can be challenging to apply when training networks with discontinuities or noisy node dynamics. By comparison, node perturbation (NP) proposes learning by injecting noise into network activations and measuring the induced change in loss. NP relies on two forward (inference) passes, makes no use of network derivatives, and has been proposed as a model for learning in biological systems. However, standard NP is highly data inefficient and unstable due to its unguided, noise-based search process. In this work, we investigate different formulations of NP, relate them to the concept of directional derivatives, and combine NP with a decorrelating mechanism for layer-wise inputs. We find that closer alignment with directional derivatives, together with input decorrelation at every layer, strongly enhances NP learning, yielding large improvements in parameter convergence and much higher performance on test data, approaching that of BP. Furthermore, our novel formulation allows for application to noisy systems in which the noise process itself is inaccessible.
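The core NP procedure described above (two forward passes, noise injection into activations, an update proportional to the induced loss change times the injected noise) can be sketched on a toy problem. The snippet below is a minimal illustration of vanilla NP on a single linear layer, not the paper's full method; the task, the perturbation scale `sigma`, and the learning rate `lr` are illustrative assumptions. Note that `(L_noisy - L_clean) / sigma` approximates a directional derivative of the loss along the noise direction, which is the connection the abstract draws.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task (an assumed example): recover W_true from (x, y) samples.
n_in, n_out = 5, 3
W_true = rng.standard_normal((n_out, n_in))
W = np.zeros((n_out, n_in))

sigma = 1e-3  # perturbation scale (illustrative hyperparameter)
lr = 0.02     # learning rate (illustrative hyperparameter)

losses = []
for step in range(5000):
    x = rng.standard_normal(n_in)
    y = W_true @ x

    # First (clean) forward pass.
    a = W @ x
    L_clean = 0.5 * np.sum((a - y) ** 2)

    # Second (noisy) forward pass: inject Gaussian noise into the activations.
    eps = sigma * rng.standard_normal(n_out)
    L_noisy = 0.5 * np.sum((a + eps - y) ** 2)

    # NP update: correlate the induced loss change with the injected noise.
    # No derivatives of the network are used anywhere in this loop.
    W -= lr * ((L_noisy - L_clean) / sigma**2) * np.outer(eps, x)
    losses.append(L_clean)
```

In expectation this update follows the true gradient, but each individual step is driven by a single random direction, which is the source of the data inefficiency and instability of standard NP noted above.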