With the help of dedicated neuromorphic hardware, spiking neural networks (SNNs) are expected to realize artificial intelligence (AI) with lower energy consumption. Combining SNNs with deep reinforcement learning (DRL) offers a promising, energy-efficient approach to realistic control tasks. In this paper, we focus on tasks in which the agent must learn multi-dimensional deterministic policies for control, a setting that is very common in real scenarios. Recently, the surrogate gradient method has been used to train multi-layer SNNs, allowing them to achieve performance comparable to that of the corresponding deep networks on such tasks. Most existing spike-based RL methods take the firing rate as the output of the SNN and convert it into a representation of the continuous action space (i.e., the deterministic policy) through a fully-connected (FC) layer. However, because the firing rate is a decimal value, the FC layer requires floating-point matrix operations, so the whole SNN cannot be deployed directly on neuromorphic hardware. To develop a fully spiking actor network without any floating-point matrix operations, we draw inspiration from the non-spiking interneurons found in insects and use the membrane voltage of non-spiking neurons to represent the action. Before the non-spiking neurons, multiple neuron populations are introduced, each decoding a different dimension of the action. Since each population decodes one dimension of the action, we argue that the neurons within each population should be connected in both the time domain and the space domain. Hence, intra-layer connections are used in the output populations to enhance their representation capacity. Finally, we propose a fully spiking actor network with intra-layer connections (ILC-SAN).
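The decoding idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the leaky integrate-and-fire dynamics, the hard reset, the decay and threshold values, and the choice of reading the membrane voltage at the last time step are all assumptions made for illustration. The sketch shows one spiking population per action dimension, lateral (intra-layer) weights that feed each population's previous-step spikes back into itself, and a non-spiking neuron that integrates the spikes without firing, so that its membrane voltage serves as the action.

```python
import numpy as np


def population_step(v, prev_spikes, ff_current, w_lateral,
                    decay=0.75, threshold=0.5):
    """One time step of a spiking population with intra-layer connections.

    All dynamics and constants here are illustrative assumptions.
    """
    # Lateral input: spikes emitted by the same population at the previous step.
    current = ff_current + w_lateral @ prev_spikes
    v = decay * v + current
    spikes = (v >= threshold).astype(np.float64)
    v = v * (1.0 - spikes)  # hard reset for neurons that fired (assumed)
    return v, spikes


def non_spiking_readout(spike_train, w_out, decay=0.9):
    """Leaky integrator that never fires or resets.

    Returns the membrane voltage after the last time step (assumed readout).
    """
    v = 0.0
    for spikes in spike_train:
        # Inputs are binary spikes, so this product only accumulates weights.
        v = decay * v + w_out @ spikes
    return v


def decode_actions(ff_currents, w_lateral, w_out):
    """Decode one action value per population.

    ff_currents: (T, D, N) feed-forward input currents
    w_lateral:   (D, N, N) intra-layer weights, one matrix per population
    w_out:       (D, N)    weights into the non-spiking neurons
    """
    n_steps, n_dims, n_neurons = ff_currents.shape
    actions = np.zeros(n_dims)
    for d in range(n_dims):
        v = np.zeros(n_neurons)
        spikes = np.zeros(n_neurons)
        train = []
        for t in range(n_steps):
            v, spikes = population_step(v, spikes, ff_currents[t, d], w_lateral[d])
            train.append(spikes)
        actions[d] = non_spiking_readout(train, w_out[d])
    return actions
```

Because the non-spiking neuron receives only binary spikes, its update reduces to summing the weights of the neurons that fired, which is the property that lets the output stage avoid floating-point matrix operations on firing rates.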