Spiking Neural Networks (SNNs) offer low-latency, energy-efficient decision making on neuromorphic hardware, making them attractive for Reinforcement Learning (RL) on resource-constrained edge devices. However, most RL algorithms for continuous control are designed for Artificial Neural Networks (ANNs); in particular, their target-network soft-update mechanism conflicts with the discrete, non-differentiable dynamics of spiking neurons. We show that this mismatch destabilizes SNN training and degrades performance. To bridge the gap between discrete SNNs and continuous-control algorithms, we propose a novel proxy target framework. The proxy network introduces continuous and differentiable dynamics that enable smooth target updates, stabilizing the learning process. Since the proxy operates only during training, the deployed SNN remains fully energy-efficient, with no additional inference overhead. Extensive experiments on continuous control benchmarks demonstrate that our framework consistently improves stability and achieves up to $32\%$ higher performance across various spiking neuron models. Notably, to the best of our knowledge, this is the first approach that enables SNNs with simple Leaky Integrate-and-Fire (LIF) neurons to surpass their ANN counterparts in continuous control. This work highlights the importance of SNN-tailored RL algorithms and paves the way for neuromorphic agents that combine high performance with low power consumption. Code is available at https://github.com/xuzijie32/Proxy-Target.
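For context, the target-network soft update the abstract refers to is the standard Polyak-averaging rule used in continuous-control algorithms such as DDPG and TD3: $\theta' \leftarrow \tau\theta + (1-\tau)\theta'$. The sketch below illustrates only this standard ANN-oriented mechanism (with plain floats standing in for network parameters), not the authors' proxy framework; it makes concrete why the rule presumes continuously varying parameters, an assumption that the paper argues breaks down for spiking dynamics.

```python
def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'.

    Each target parameter drifts a small, continuous step toward the
    online parameter every update -- a smoothness assumption that
    discrete spiking dynamics do not satisfy.
    """
    return [tau * o + (1.0 - tau) * t
            for t, o in zip(target_params, online_params)]

# With tau = 0.5, a target parameter moves halfway toward the online one.
updated = soft_update([0.0, 2.0], [1.0, 4.0], tau=0.5)
```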