Training resource-constrained autonomous agents on multiple tasks simultaneously is crucial for adapting to diverse real-world environments. Recent works employ reinforcement learning (RL) approaches, but they still suffer from sub-optimal multi-task performance due to task interference. State-of-the-art works employ Spiking Neural Networks (SNNs) to improve RL-based multi-task learning and to enable low-power/energy operation through network enhancements and spike-driven data stream processing. However, they rely on fixed task-switching intervals during training, which limits their performance and scalability. To address this, we propose SwitchMT, a novel methodology that employs adaptive task-switching for effective, scalable, and simultaneous multi-task learning. SwitchMT employs the following key ideas: (1) leveraging a Deep Spiking Q-Network with active dendrites and a dueling structure that utilizes task-specific context signals to create specialized sub-networks; and (2) devising an adaptive task-switching policy that leverages both rewards and the internal dynamics of the network parameters. Experimental results demonstrate that SwitchMT achieves competitive scores in multiple Atari games (i.e., Pong: -8.8, Breakout: 5.6, and Enduro: 355.2) and longer game episodes compared to the state-of-the-art. These results also highlight the effectiveness of the SwitchMT methodology in addressing task interference without increasing the network complexity, enabling intelligent autonomous agents with scalable multi-task learning capabilities.
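To make the contrast with fixed task-switching intervals concrete, the following is a minimal sketch of what an adaptive switching rule could look like. It is not the SwitchMT policy itself: the abstract only states that the policy uses rewards and the internal dynamics of the network parameters. The class name `AdaptiveTaskSwitcher`, the plateau test, the use of a parameter-update norm as the "internal dynamics" signal, the round-robin task order, and all thresholds are assumptions chosen for illustration.

```python
from collections import deque
from statistics import fmean


class AdaptiveTaskSwitcher:
    """Hypothetical switching rule; signals and thresholds are illustrative assumptions."""

    def __init__(self, num_tasks, window=4, reward_tol=0.05,
                 update_tol=1e-3, max_episodes=50):
        self.num_tasks = num_tasks
        self.window = window              # episodes per comparison window
        self.reward_tol = reward_tol      # minimum reward gain that counts as progress
        self.update_tol = update_tol      # update norm below which parameters look settled
        self.max_episodes = max_episodes  # safety cap so no task is trained forever
        self.task = 0
        self._reset()

    def _reset(self):
        # Start fresh statistics for the task that is about to be trained.
        self.rewards = deque(maxlen=2 * self.window)
        self.updates = deque(maxlen=self.window)
        self.episodes = 0

    def _plateaued(self):
        # Compare the mean reward of the newest window with the window before it.
        if len(self.rewards) < 2 * self.window:
            return False
        history = list(self.rewards)
        older = fmean(history[:self.window])
        recent = fmean(history[self.window:])
        return recent - older <= self.reward_tol

    def _settled(self):
        # Assumed proxy for "internal dynamics": mean size of recent parameter updates.
        return (len(self.updates) == self.window
                and fmean(self.updates) <= self.update_tol)

    def step(self, episode_reward, update_norm):
        """Record one finished episode and return the task index to train next."""
        self.rewards.append(episode_reward)
        self.updates.append(update_norm)
        self.episodes += 1
        if (self._plateaued() and self._settled()) or self.episodes >= self.max_episodes:
            self.task = (self.task + 1) % self.num_tasks  # assumed round-robin order
            self._reset()
        return self.task
```

In a training loop, `step` would be called once per episode with that episode's total reward and some measure of how much the network parameters changed (for example, the norm of the weight update). A fixed-interval scheme corresponds to relying on `max_episodes` alone; the adaptive rule switches earlier once both signals indicate that the current task has stopped improving.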