Fully Spiking Actor Network with Intra-layer Connections for Reinforcement Learning

With the help of specialized neuromorphic hardware, spiking neural networks (SNNs) are expected to realize artificial intelligence (AI) with lower energy consumption. Combining SNNs with deep reinforcement learning (DRL) offers a promising energy-efficient approach to realistic control tasks. In this paper, we focus on tasks in which the agent must learn multi-dimensional deterministic policies for control, a setting that is very common in real-world scenarios. Recently, the surrogate gradient method has been used to train multi-layer SNNs, allowing them to achieve performance comparable to that of the corresponding deep networks on such tasks. Most existing spike-based RL methods take the firing rate as the output of the SNN and convert it into a continuous action (i.e., the deterministic policy) through a fully-connected (FC) layer. However, because the firing rate is a fractional value, the FC layer requires floating-point matrix operations, so the whole SNN cannot be deployed directly on neuromorphic hardware. To develop a fully spiking actor network without any floating-point matrix operations, we draw inspiration from the non-spiking interneurons found in insects and use the membrane voltage of non-spiking neurons to represent the action. Before the non-spiking neurons, multiple neuron populations are introduced, each decoding one dimension of the action. Since each population decodes a single action dimension, we argue that the neurons within a population should be connected in both the time domain and the space domain. Hence, intra-layer connections are used in the output populations to enhance their representation capacity. Finally, we propose a fully spiking actor network with intra-layer connections (ILC-SAN).
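The abstract gives no equations, so the following is only a minimal sketch of the decoding idea it describes, not the authors' implementation. It assumes leaky integrate-and-fire (LIF) neurons with a hard reset, lateral weights that feed the previous step's spikes back into the same population (one possible reading of "intra-layer connections"), and one non-spiking leaky integrator per action dimension whose final membrane voltage is taken as the action. All function names, parameters, and constants here are hypothetical.

```python
def lif_population_step(v, input_current, prev_spikes, lateral_w,
                        decay=0.5, threshold=0.5):
    """One time step of an LIF population with assumed lateral connections.

    v             -- membrane voltages from the previous step
    input_current -- feed-forward input to each neuron at this step
    prev_spikes   -- binary spikes this population emitted at the previous step
    lateral_w     -- lateral_w[i][j] is the weight from neuron j to neuron i
    """
    n = len(v)
    new_v, spikes = [], []
    for i in range(n):
        # Spikes are binary, so the lateral input is a sum of selected weights.
        lateral = sum(lateral_w[i][j] for j in range(n) if prev_spikes[j])
        vi = decay * v[i] + input_current[i] + lateral
        fired = 1 if vi >= threshold else 0
        new_v.append(0.0 if fired else vi)  # hard reset after a spike (assumption)
        spikes.append(fired)
    return new_v, spikes


def decode_action(spike_train, readout_w, decay=0.5):
    """Non-spiking leaky integrator: it never fires or resets.

    Its membrane voltage after the last time step is read out as one
    action dimension, instead of passing a firing rate through an FC layer.
    """
    u = 0.0
    for spikes in spike_train:
        # Accumulate the weights of the neurons that spiked at this step.
        u = decay * u + sum(w for w, s in zip(readout_w, spikes) if s)
    return u


def run_actor(currents, lateral_w, readout_w, steps=4):
    """Decode one action value per population (one population per dimension).

    currents -- one constant input-current list per action dimension
    """
    actions = []
    for dim_current in currents:
        n = len(dim_current)
        v, spikes, train = [0.0] * n, [0] * n, []
        for _ in range(steps):
            v, spikes = lif_population_step(v, dim_current, spikes, lateral_w)
            train.append(spikes)
        actions.append(decode_action(train, readout_w))
    return actions


# Two action dimensions, two neurons per population (toy sizes).
lateral = [[0.0, 0.0], [0.5, 0.0]]  # neuron 0 excites neuron 1
actions = run_actor([[1.0, 0.2], [0.1, 0.1]], lateral, readout_w=[0.5, -0.25])
print(actions)
```

In this sketch the readout only ever adds weights selected by binary spikes, which is the property that lets the output stage avoid multiplying a weight matrix by fractional firing rates. The scalar leak factor and the specific reset rule are illustrative choices rather than details taken from the paper.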

