Training resource-constrained autonomous agents on multiple tasks simultaneously is crucial for adapting to diverse real-world environments. Recent works employ reinforcement learning (RL) approaches, but they still suffer from sub-optimal multi-task performance due to task interference. State-of-the-art works employ Spiking Neural Networks (SNNs) to improve RL-based multi-task learning and enable low-power/energy operations through network enhancements and spike-driven data stream processing. However, they rely on fixed task-switching intervals during their training, thus limiting their performance and scalability. To address this, we propose SwitchMT, a novel methodology that employs adaptive task-switching for effective, scalable, and simultaneous multi-task learning. SwitchMT employs the following key ideas: (1) leveraging a Deep Spiking Q-Network with active dendrites and a dueling structure that utilizes task-specific context signals to create specialized sub-networks; and (2) devising an adaptive task-switching policy that leverages both rewards and the internal dynamics of the network parameters. Experimental results demonstrate that SwitchMT achieves competitive scores in multiple Atari games (i.e., Pong: -8.8, Breakout: 5.6, and Enduro: 355.2) and longer game episodes as compared to the state-of-the-art. These results also highlight the effectiveness of the SwitchMT methodology in addressing task interference without increasing network complexity, enabling intelligent autonomous agents with scalable multi-task learning capabilities.
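The adaptive task-switching idea above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the class name `AdaptiveTaskSwitcher`, the plateau criterion, and all thresholds (`window`, `reward_eps`, `param_eps`) are assumptions chosen for clarity. It captures the stated principle that a switch is triggered by both the reward signal and the internal dynamics of the network parameters, rather than by a fixed interval.

```python
class AdaptiveTaskSwitcher:
    """Illustrative sketch: switch tasks when learning on the current task
    stalls, judged by (1) a plateau in recent episode rewards and (2) a small
    norm of recent parameter updates. All names and thresholds are
    hypothetical, not taken from the SwitchMT paper."""

    def __init__(self, num_tasks, window=10, reward_eps=0.05, param_eps=1e-3):
        self.num_tasks = num_tasks
        self.window = window          # episodes considered for the plateau test
        self.reward_eps = reward_eps  # max reward spread counted as a plateau
        self.param_eps = param_eps    # max parameter-update norm counted as stable
        self.current_task = 0
        self.rewards = []             # recent episode rewards for the current task

    def step(self, episode_reward, param_update_norm):
        """Record one episode; return True if the agent should switch tasks."""
        self.rewards.append(episode_reward)
        if len(self.rewards) < self.window:
            return False
        recent = self.rewards[-self.window:]
        reward_plateaued = (max(recent) - min(recent)) <= self.reward_eps
        params_stable = param_update_norm <= self.param_eps
        if reward_plateaued and params_stable:
            # Advance round-robin to the next task and reset the reward history.
            self.current_task = (self.current_task + 1) % self.num_tasks
            self.rewards.clear()
            return True
        return False


# Usage: five near-identical rewards with tiny parameter updates trigger a switch.
switcher = AdaptiveTaskSwitcher(num_tasks=3, window=5)
flags = [switcher.step(1.0, 1e-4) for _ in range(5)]
```

In this sketch, `flags` is `False` for the first four episodes (too little history) and `True` on the fifth, after which `current_task` advances from 0 to 1; in a fixed-interval baseline the switch would instead happen unconditionally after a preset number of episodes.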