State-of-the-art reinforcement learning (RL) methods excel in single-task settings, but they still struggle to generalize across multiple tasks because of catastrophic forgetting, in which previously learned tasks are forgotten as new tasks are introduced. Multi-task learning is essential for generalist agents that must adapt to new situations (e.g., autonomous robots). Meanwhile, Spiking Neural Networks (SNNs) have emerged as an energy-efficient alternative to conventional neural networks owing to their sparse, spike-based operations. Building on these observations, we propose MTSpark, a novel methodology that enables multi-task RL with spiking networks. MTSpark develops a Deep Spiking Q-Network (DSQN) with active dendrites and a dueling structure that leverages task-specific context signals: each neuron computes task-dependent activations that dynamically modulate its inputs, forming a specialized sub-network for each task. This bio-plausible network model also inherits the benefits of SNNs, improving energy efficiency and making the model suitable for hardware implementation. Experimental results show that MTSpark effectively learns multiple tasks and outperforms the state-of-the-art. In three Atari games, MTSpark achieves scores of -5.4 in Pong, 0.6 in Breakout, and 371.2 in Enduro, compared with human-level scores of -3, 31, and 368, respectively; it exceeds the human-level score in Enduro and approaches it in Pong, which state-of-the-art methods struggle to achieve. MTSpark also achieves higher accuracy than the state-of-the-art in image classification tasks. These results highlight the potential of MTSpark for developing generalist agents that learn multiple tasks by combining RL and SNN concepts.
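The abstract does not give MTSpark's equations, so the following is only a minimal sketch of how a context-gated spiking Q-network of this kind could work, not the authors' implementation. It assumes (1) the active-dendrite formulation common in prior work, where each neuron owns several dendritic segments, the segment responding most strongly to a one-hot task context is selected, and its sigmoid gates the neuron's feedforward drive; (2) leaky integrate-and-fire (LIF) neurons with a hard reset, decoded by firing rate over a fixed number of timesteps; and (3) the standard dueling aggregation Q = V + A - mean(A). All function names, sizes, and constants (`active_dendrite_layer`, `lif_spikes`, `dueling_q`, `beta=0.9`, `threshold=1.0`) are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def active_dendrite_layer(x, W, b, segments, context):
    """Hypothetical active-dendrite layer (assumed formulation, not MTSpark's code).

    x: (n_in,) input, W: (n_out, n_in), b: (n_out,)
    segments: (n_out, n_seg, n_ctx) dendritic weights, context: (n_ctx,) task signal
    """
    ff = W @ x + b                                  # feedforward drive per neuron
    seg_act = segments @ context                    # (n_out, n_seg) segment responses to the task context
    idx = np.argmax(np.abs(seg_act), axis=1)        # strongest segment for each neuron
    d = seg_act[np.arange(len(idx)), idx]           # selected dendritic activation
    return ff * sigmoid(d)                          # task-dependent modulation of the input drive

def lif_spikes(current, steps=10, beta=0.9, threshold=1.0):
    """Leaky integrate-and-fire neurons driven by a constant current; returns (steps, n) spikes."""
    v = np.zeros_like(current)
    spikes = np.zeros((steps, current.size))
    for t in range(steps):
        v = beta * v + current                      # leaky integration of the input current
        spikes[t] = v >= threshold                  # emit a spike where the threshold is crossed
        v = np.where(spikes[t] > 0, 0.0, v)         # hard reset of neurons that spiked
    return spikes

def dueling_q(features, w_value, W_adv):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    value = w_value @ features                      # scalar state value
    adv = W_adv @ features                          # one advantage per action
    return value + adv - adv.mean()

rng = np.random.default_rng(0)
n_in, n_hidden, n_seg, n_tasks, n_actions = 8, 16, 3, 3, 4  # e.g. 3 tasks = Pong, Breakout, Enduro
W = rng.normal(0.0, 0.5, (n_hidden, n_in))
b = np.zeros(n_hidden)
segments = rng.normal(0.0, 1.0, (n_hidden, n_seg, n_tasks))
w_value = rng.normal(0.0, 0.5, n_hidden)
W_adv = rng.normal(0.0, 0.5, (n_actions, n_hidden))

x = rng.random(n_in)                                # stand-in for an encoded observation
for task_id in range(n_tasks):
    context = np.eye(n_tasks)[task_id]              # one-hot task context (assumption)
    current = active_dendrite_layer(x, W, b, segments, context)
    rates = lif_spikes(current).mean(axis=0)        # rate-decoded spiking features
    q = dueling_q(rates, w_value, W_adv)
    print(task_id, q.round(3), "greedy action:", int(q.argmax()))
```

Because the same observation `x` passes through the same feedforward weights `W` but a different context selects different dendritic segments, each task sees a differently gated set of spiking hidden units, which is one way to read the abstract's "specialized sub-networks for each task". A learned context embedding could replace the one-hot vector; that choice is also an assumption here.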