While backpropagation (BP) is the mainstream approach for gradient computation in neural network training, its heavy reliance on the chain rule of differentiation constrains the design flexibility of network architectures and training pipelines. We avoid the recursive computation in BP and develop a unified likelihood ratio (ULR) method that estimates gradients with just one forward propagation. ULR extends to a wide variety of neural network architectures, and it also allows the computation flow of BP to be rearranged for better device adaptation. Moreover, we propose several variance reduction techniques to further accelerate the training process. Our experiments provide numerical results on several fronts, including various neural network training scenarios, computation flow rearrangement, and fine-tuning of pre-trained models. The results indicate that ULR makes neural network training more flexible by permitting localized module training without compromising the global objective, and that it significantly improves network robustness.
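To make the idea of gradient estimation from forward passes alone more concrete, the following is a minimal sketch of a generic likelihood ratio (score function) estimator for a single linear layer. It is not the ULR method itself, whose details the abstract does not give. The sketch assumes Gaussian noise is injected into the layer output, x = W h + sigma * eps, and uses a squared-error loss so that the estimate can be compared with the exact gradient. The function names, the loss, the constant baseline used for variance reduction, and all numerical values are illustrative assumptions.

```python
import numpy as np


def lr_gradient(W, h, y, sigma=0.1, n_samples=50_000, use_baseline=True, seed=0):
    """Likelihood ratio estimate of d E[loss] / dW for x = W h + sigma * eps.

    Hypothetical helper for illustration only; no backward pass is used.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_samples, W.shape[0]))  # injected noise
    mean = W @ h                                        # noise-free layer output
    x = mean + sigma * eps                              # noisy forward passes
    loss = np.sum((x - y) ** 2, axis=1)                 # one scalar loss per sample

    if use_baseline:
        # Assumed variance reduction: subtract a constant baseline. The score
        # has zero mean, so a constant baseline leaves the estimate unbiased.
        loss = loss - np.sum((mean - y) ** 2)

    # Score of N(W h, sigma^2 I) with respect to its mean is eps / sigma.
    score = eps / sigma
    grad_mean = (loss[:, None] * score).mean(axis=0)    # d E[loss] / d(W h)
    return np.outer(grad_mean, h)                       # chain through mean = W h


def exact_gradient(W, h, y):
    """Closed-form gradient of E||W h + sigma * eps - y||^2 with respect to W."""
    return 2.0 * np.outer(W @ h - y, h)


# Illustrative values (assumptions, not taken from the paper).
W = np.array([[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]])
h = np.array([1.0, 2.0])
y = np.array([0.0, 1.0, 0.0])

estimate = lr_gradient(W, h, y)
exact = exact_gradient(W, h, y)
```

The estimator only needs loss values from noisy forward passes and the injected noise, which is what allows a layer to be updated without propagating derivatives back through later layers. Setting `use_baseline=False` gives the plain estimator, which is also unbiased but typically much noisier for small `sigma`.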