Spiking Neural Networks (SNNs) can achieve competitive performance by converting existing well-trained Artificial Neural Networks (ANNs), avoiding further costly training. This property is particularly attractive in Reinforcement Learning (RL), where training through environment interaction is expensive and potentially unsafe. However, existing conversion methods perform poorly in continuous control, where suitable baselines are largely absent. We identify error amplification as the key cause: small action approximation errors become temporally correlated across decision steps, inducing cumulative state distribution shift and severe performance degradation. To address this issue, we propose Cross-Step Residual Potential Initialization (CRPI), a lightweight training-free mechanism that carries residual membrane potentials over across decision steps to suppress temporally correlated errors. Experiments on continuous control benchmarks with both vector and visual observations demonstrate that CRPI integrates into existing conversion pipelines and substantially recovers the lost performance. Our results highlight continuous control as a critical and challenging benchmark for ANN-to-SNN conversion, where small errors can be strongly amplified and severely impact performance.
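To make the mechanism concrete, the following is a minimal sketch of the idea behind CRPI, not the authors' implementation: in a converted SNN, each decision step runs the integrate-and-fire neurons for a fixed number of simulation timesteps, and the action is read out as a firing rate. Standard conversion pipelines reset membrane potentials between decision steps, discarding the sub-threshold residual that encodes the quantization error; CRPI instead carries that residual over, so the error left at one step offsets the error at the next. All class and parameter names below are illustrative assumptions.

```python
import numpy as np

class IFNeuronLayer:
    """Illustrative integrate-and-fire layer for ANN-to-SNN inference.

    Each decision step simulates T timesteps under a constant input
    current and returns the firing rate as the activation estimate.
    With carry_residual=True (the CRPI-style behavior sketched here),
    the sub-threshold membrane potential left over from the previous
    decision step is kept instead of being reset, so the rounding
    error of one step is compensated at the next.
    """

    def __init__(self, size, threshold=1.0):
        self.threshold = threshold
        # Half-threshold initialization, a common conversion heuristic.
        self.v = np.full(size, 0.5 * threshold)

    def run(self, current, T, carry_residual=True):
        """Simulate T timesteps; return the estimated rate (spikes / T)."""
        if not carry_residual:
            # Baseline behavior: discard the residual between decision steps.
            self.v = np.full_like(self.v, 0.5 * self.threshold)
        spikes = np.zeros_like(self.v)
        for _ in range(T):
            self.v += current
            fired = (self.v >= self.threshold).astype(self.v.dtype)
            spikes += fired
            # Soft reset (reset-by-subtraction) preserves the residual.
            self.v -= fired * self.threshold
        return spikes / T
```

For example, with `threshold=1.0` and `T=10`, an input current of `0.3` yields exactly 3 spikes, a rate of `0.3` with zero error once the residual is tracked; resetting the potential every decision step would instead let such rounding errors accumulate across the episode.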