Transfer learning in Reinforcement Learning (RL) has been widely studied as a way to overcome the training issues of deep RL, namely exploration cost, data availability and convergence time, by enhancing the training phase with external knowledge. Generally, knowledge is transferred from expert agents to novices. While this fixes the issue for the novice agent, the transfer is effective only if the expert agent has a good understanding of the task. As an alternative, in this paper we propose Expert-Free Online Transfer Learning (EF-OnTL), an algorithm that enables expert-free, real-time, dynamic transfer learning in multi-agent systems. No dedicated expert exists. Instead, the transfer source agent and the knowledge to be transferred are dynamically selected at each transfer step based on the agents' performance and uncertainty. To improve uncertainty estimation, we also propose State Action Reward Next-State Random Network Distillation (sars-RND), an extension of RND that estimates uncertainty from RL agent-environment interactions. We demonstrate EF-OnTL's effectiveness against a no-transfer scenario and advice-based baselines, with and without expert agents, on three benchmark tasks: Cart-Pole, a grid-based Multi-Team Predator-Prey (mt-pp) and Half Field Offense (HFO). Our results show that EF-OnTL achieves overall performance comparable to the advice-based baselines while requiring neither external input nor threshold tuning. EF-OnTL outperforms the no-transfer scenario, with the degree of improvement related to the complexity of the task addressed.
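To make the sars-RND idea more concrete, the following is a minimal sketch of Random Network Distillation applied to concatenated (s, a, r, s') tuples, assuming a plain NumPy implementation. The class name `SarsRND`, the layer sizes, the tanh activations and the flat concatenation of the tuple are illustrative assumptions, not the authors' implementation. As in RND, a fixed, randomly initialised target network embeds each input and a predictor is trained to reproduce that embedding. The predictor's squared error on a tuple then serves as the uncertainty estimate: it stays high for rarely seen transitions and shrinks for familiar ones.

```python
import numpy as np


class SarsRND:
    """Hypothetical sketch: RND-style uncertainty over (s, a, r, s') tuples."""

    def __init__(self, input_dim, feat_dim=16, hidden=32, lr=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed, randomly initialised target network (never trained).
        self.W_t = rng.normal(0.0, 1.0 / np.sqrt(input_dim), (input_dim, feat_dim))
        # Trainable predictor: one tanh hidden layer, linear output.
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(input_dim), (input_dim, hidden))
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, feat_dim))
        self.lr = lr

    @staticmethod
    def encode(s, a, r, s_next):
        # Assumption: the tuple is flattened into a single input vector.
        return np.concatenate(
            [np.ravel(s), np.ravel(a), [r], np.ravel(s_next)]
        ).astype(float)

    def target(self, x):
        return np.tanh(x @ self.W_t)

    def _forward(self, x):
        h = np.tanh(x @ self.W1)
        return h, h @ self.W2

    def uncertainty(self, x):
        # Mean squared distance between predictor output and random target.
        _, pred = self._forward(x)
        return float(np.mean((pred - self.target(x)) ** 2))

    def update(self, x):
        # One gradient-descent step on the predictor's mean squared error.
        h, pred = self._forward(x)
        err = pred - self.target(x)
        d_pred = 2.0 * err / err.size
        grad_W2 = np.outer(h, d_pred)
        d_hidden = (self.W2 @ d_pred) * (1.0 - h ** 2)  # backprop through tanh
        grad_W1 = np.outer(x, d_hidden)
        self.W2 -= self.lr * grad_W2
        self.W1 -= self.lr * grad_W1
        return float(np.mean(err ** 2))  # loss before this step
```

For a Cart-Pole-like transition (4-dimensional state, one-hot action over 2 actions), the input has 4 + 2 + 1 + 4 = 11 entries. After calling `update` repeatedly on one transition, `uncertainty` for that transition should fall well below its initial value, while an unseen transition keeps a comparatively high score.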