Neuromorphic computing systems promise orders-of-magnitude efficiency gains for energy-constrained robotics while enabling native temporal processing. Spiking Neural Networks (SNNs) are a promising algorithmic approach for these systems, yet their application to complex control tasks faces two critical challenges: (1) the non-differentiable nature of spiking neurons necessitates surrogate gradients with unclear optimization properties, and (2) the stateful dynamics of SNNs require training on sequences, which in reinforcement learning (RL) is hindered by the limited sequence lengths available during early training, preventing the network from bridging its warm-up period. We address these challenges by systematically analyzing surrogate gradient slope settings, showing that shallower slopes increase gradient magnitude in deeper layers but reduce alignment with the true gradient. In supervised learning, we find no clear preference between fixed and scheduled slopes; the effect is much more pronounced in RL, where shallower or scheduled slopes yield a 2.1x improvement in both training and final deployed performance. Next, we propose a novel training approach that leverages a privileged guiding policy to bootstrap the learning process while still exploiting online environment interactions with the spiking policy. Combining our method with an adaptive slope schedule on a real-world drone position control task, we achieve an average return of 400 points, substantially outperforming prior techniques, including Behavioral Cloning and TD3BC, which reach at most -200 points under the same conditions. This work advances both the theoretical understanding of surrogate gradient learning in SNNs and practical training methodologies for neuromorphic controllers, demonstrated on real-world robotic systems.
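To make the slope discussion concrete, the sketch below shows one common way to implement a spiking threshold with a slope-parameterised surrogate gradient in PyTorch: the forward pass is a Heaviside step, while the backward pass substitutes the derivative of a fast-sigmoid whose slope controls how sharp or shallow the surrogate is. The fast-sigmoid form, the slope parameter, and the linear_slope_schedule helper are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
# Minimal sketch (assumed, not the paper's code) of a spiking threshold with a
# slope-parameterised surrogate gradient in PyTorch. Forward: Heaviside step.
# Backward: fast-sigmoid surrogate derivative 1 / (slope * |v| + 1)^2.

import torch


class SpikeFunction(torch.autograd.Function):
    """Heaviside spike with a fast-sigmoid surrogate gradient."""

    @staticmethod
    def forward(ctx, membrane_potential, slope):
        ctx.save_for_backward(membrane_potential)
        ctx.slope = slope
        # Emit a spike wherever the membrane potential crosses the (zero) threshold.
        return (membrane_potential > 0).to(membrane_potential.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        (membrane_potential,) = ctx.saved_tensors
        # Shallower (smaller) slope values spread the surrogate derivative over a
        # wider band of membrane potentials, passing larger gradients to deeper
        # layers at the cost of fidelity to the true, almost-everywhere-zero gradient.
        surrogate = 1.0 / (ctx.slope * membrane_potential.abs() + 1.0) ** 2
        return grad_output * surrogate, None


def linear_slope_schedule(step, total_steps, start=5.0, end=25.0):
    """Illustrative scheduled slope: anneal from shallow to steep over training."""
    frac = min(step / max(total_steps, 1), 1.0)
    return start + frac * (end - start)


if __name__ == "__main__":
    v = torch.randn(4, requires_grad=True)  # toy membrane potentials
    slope = linear_slope_schedule(step=100, total_steps=1000)
    spikes = SpikeFunction.apply(v, slope)
    spikes.sum().backward()
    print("spikes:", spikes.tolist())
    print("surrogate grads:", v.grad.tolist())
```

In this sketch, a fixed slope corresponds to passing a constant value to SpikeFunction.apply, whereas a scheduled slope updates that value over training steps; the actual surrogate shape and schedule used in the paper may differ.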