Noise-based reward-modulated learning

The pursuit of energy-efficient and adaptive artificial intelligence (AI) has positioned neuromorphic computing as a promising alternative to conventional computing. However, learning on these platforms requires techniques that rely on local information while still enabling effective credit assignment. Here, we propose noise-based reward-modulated learning (NRL), a novel synaptic plasticity rule that mathematically unifies reinforcement learning and gradient-based optimization with biologically inspired local updates. NRL addresses the computational bottleneck of exact gradients by approximating them through stochastic neural activity, turning the inherent noise of biological and neuromorphic substrates into a functional resource. Drawing inspiration from biological learning, our method uses reward prediction errors as its optimization target to generate increasingly advantageous behavior, and eligibility traces to facilitate retrospective credit assignment. Experimental validation on reinforcement learning tasks with immediate and delayed rewards shows that NRL achieves performance comparable to baselines optimized using backpropagation, albeit with slower convergence, while significantly outperforming and scaling better than reward-modulated Hebbian learning (RMHL), the most prominent similar approach, in multi-layer networks. Although tested on simple architectures, the results highlight the potential of noise-driven, brain-inspired learning for low-power adaptive systems, particularly in computing substrates with locality constraints. NRL offers a theoretically grounded paradigm well suited to the event-driven characteristics of next-generation neuromorphic AI.
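To illustrate the general idea of approximating gradients through injected noise, the following is a minimal sketch of a generic node-perturbation-style update on a toy linear task with immediate rewards. It is not the NRL rule itself. Every name and setting here is an assumption made for illustration: the function `train_noise_perturbation`, the target mapping `w_target`, the quadratic reward, and the use of a noise-free forward pass as a stand-in for a reward prediction. The weight change uses only the reward difference, the injected noise, and the presynaptic input, accumulated in a simple trace.

```python
import numpy as np


def train_noise_perturbation(steps=2000, n_in=4, n_out=2, sigma=0.1,
                             lr=0.01, trace_decay=0.0, seed=0):
    """Toy node-perturbation learner (illustrative sketch, not the paper's rule)."""
    rng = np.random.default_rng(seed)
    w_target = rng.normal(size=(n_out, n_in))  # hypothetical mapping to be learned
    w = np.zeros((n_out, n_in))
    trace = np.zeros_like(w)                   # eligibility trace, one entry per synapse

    def reward(y, x):
        # Assumed reward: negative squared distance to the target output.
        return -np.sum((y - w_target @ x) ** 2)

    for _ in range(steps):
        x = rng.normal(size=n_in)    # presynaptic input
        xi = rng.normal(size=n_out)  # unit-variance noise injected at the outputs

        r_clean = reward(w @ x, x)               # noise-free pass, used as baseline
        r_noisy = reward(w @ x + sigma * xi, x)  # pass with noisy activity
        delta = r_noisy - r_clean                # reward difference caused by the noise

        # With trace_decay = 0 the trace is just the current noise-input product;
        # a value in (0, 1) would retain a decaying memory of earlier products.
        trace = trace_decay * trace + np.outer(xi, x)

        # In expectation, delta * xi / sigma points along the reward gradient
        # with respect to the outputs, so this step ascends the reward on average.
        w += lr * delta * trace / sigma

    return w, w_target


if __name__ == "__main__":
    w, w_target = train_noise_perturbation()
    print("initial error:", np.linalg.norm(w_target))
    print("final error:  ", np.linalg.norm(w - w_target))
```

The noise-free baseline keeps the variance of the update low in this toy setting. A learned reward prediction could take its place, at the cost of noisier updates and a smaller usable learning rate.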
