Spiking neural networks (SNNs) are widely applied across many fields because of their energy-efficient, fast-inference capabilities. Applying SNNs to reinforcement learning (RL) can significantly reduce the computational resources that agents require and improve algorithm performance under resource-constrained conditions. However, in current spiking reinforcement learning (SRL) algorithms, the simulation results of multiple time steps correspond to only a single decision step in RL. This differs markedly from the real temporal dynamics of the brain and fails to fully exploit the capacity of SNNs to process temporal data. To address this temporal mismatch and take further advantage of the inherent temporal dynamics of spiking neurons, we propose a novel temporal alignment paradigm (TAP) that leverages the single-step update of spiking neurons to accumulate historical state information in RL and introduces gated units to enhance the memory capacity of spiking neurons. Experimental results show that our method can solve partially observable Markov decision processes (POMDPs) and multi-agent cooperation problems with performance similar to that of recurrent neural networks (RNNs) while consuming only about 50% of the power.
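The idea of aligning one spiking-neuron update with one RL decision step, so that the membrane potential carries history across steps, can be illustrated with a minimal sketch. This is not the TAP implementation: the class `GatedLIFNeuron`, the input-dependent sigmoid gate, the hard reset, and all parameter values are hypothetical choices made only for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class GatedLIFNeuron:
    """A leaky integrate-and-fire neuron advanced once per RL step.

    Hypothetical sketch: the gate form and constants are assumptions,
    not taken from the paper.
    """
    threshold: float = 1.0     # firing threshold (assumed value)
    gate_bias: float = 2.0     # bias of the retention gate (assumed value)
    gate_weight: float = -1.0  # input weight of the retention gate (assumed value)
    v: float = 0.0             # membrane potential, kept across RL steps

    def reset(self) -> None:
        """Clear the membrane potential, e.g. at the start of an episode."""
        self.v = 0.0

    def step(self, current: float) -> int:
        """Advance the neuron by exactly one time step and return 0 or 1."""
        # The gate in (0, 1) decides how much past potential is retained.
        g = sigmoid(self.gate_bias + self.gate_weight * current)
        self.v = g * self.v + current
        spike = 1 if self.v >= self.threshold else 0
        if spike:
            self.v = 0.0  # hard reset after firing
        return spike


def rollout(inputs: List[float]) -> List[int]:
    """One neuron update per RL step; state persists between steps."""
    neuron = GatedLIFNeuron()
    neuron.reset()
    return [neuron.step(x) for x in inputs]


def rollout_memoryless(inputs: List[float]) -> List[int]:
    """Contrast case: the potential is cleared before every decision."""
    neuron = GatedLIFNeuron()
    spikes = []
    for x in inputs:
        neuron.reset()
        spikes.append(neuron.step(x))
    return spikes
```

With a constant input of 0.5, no single step reaches the threshold on its own, so `rollout_memoryless([0.5, 0.5, 0.5])` never fires, whereas `rollout([0.5, 0.5, 0.5])` fires on the third step because the retained potential accumulates evidence from earlier observations.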