At present, the implementation of learning mechanisms in spiking neural networks (SNNs) cannot be considered a solved scientific problem, despite the many SNN learning algorithms that have been proposed. The same is true of SNN implementations of reinforcement learning (RL), even though RL is especially important for SNNs because of its close relationship to the most promising SNN application domains, such as robotics. In the present paper, I describe an SNN structure that appears to be usable in a wide range of RL tasks. The distinctive feature of my approach is that all signals involved are represented only as spikes: sensory input streams, output signals sent to actuators, and reward/punishment signals. In addition, in selecting the neuron and plasticity models, I was guided by the requirement that they be easy to implement on modern neurochips. The SNN structure considered in the paper includes spiking neurons described by a generalization of the LIFAT (leaky integrate-and-fire neuron with adaptive threshold) model and a simple spike-timing-dependent synaptic plasticity model (a generalization of dopamine-modulated plasticity). My concept is based on very general assumptions about RL task characteristics and has no apparent limitations on its applicability. To test it, I selected a simple but non-trivial task: training the network to keep a chaotically moving light spot in the field of view of an emulated DVS camera. The successful solution of this RL problem by the described SNN can be considered evidence in favor of the effectiveness of my approach.