Transfer learning in Reinforcement Learning (RL) has been widely studied as a way to overcome the training issues of deep RL, i.e., exploration cost, data availability and convergence time, by enhancing the training phase with external knowledge. Generally, knowledge is transferred from expert agents to novices. While this fixes the issue for the novice agent, the transfer is only effective if the expert agent already has a good understanding of the task. As an alternative, in this paper we propose Expert-Free Online Transfer Learning (EF-OnTL), an algorithm that enables expert-free, real-time, dynamic transfer learning in multi-agent systems. No dedicated expert exists; instead, the transfer source agent and the knowledge to be transferred are dynamically selected at each transfer step based on the agents' performance and uncertainty. To improve uncertainty estimation, we also propose State Action Reward Next-State Random Network Distillation (sars-RND), an extension of RND that estimates uncertainty from RL agent-environment interactions. We demonstrate the effectiveness of EF-OnTL against a no-transfer scenario and against advice-based baselines, with and without expert agents, in three benchmark tasks: Cart-Pole, a grid-based Multi-Team Predator-Prey (mt-pp) and Half Field Offense (HFO). Our results show that EF-OnTL achieves overall performance comparable to that of the advice-based baselines while requiring neither external input nor threshold tuning. EF-OnTL also outperforms no-transfer, with an improvement that depends on the complexity of the task addressed.
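To make the sars-RND idea more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard RND recipe (a fixed, randomly initialised target network plus a trained predictor whose prediction error serves as the uncertainty signal) applied to a concatenated (s, a, r, s') vector. The `encode` helper, the single-layer networks, the one-hot action encoding, the output size and the learning rate are all hypothetical choices made for illustration.

```python
import math
import random


def encode(state, action, reward, next_state, n_actions):
    """Hypothetical encoding: concatenate (s, one-hot a, r, s') into one vector."""
    one_hot = [1.0 if i == action else 0.0 for i in range(n_actions)]
    return list(state) + one_hot + [float(reward)] + list(next_state)


class SarsRND:
    """Illustrative sars-RND sketch (assumed design, not the paper's code):
    uncertainty = error of a trained predictor imitating a fixed random target."""

    def __init__(self, in_dim, out_dim=8, lr=0.05, seed=0):
        rng = random.Random(seed)
        # Fixed random target network: one tanh layer that is never trained.
        self.target_w = [[rng.gauss(0, 1) for _ in range(in_dim)] for _ in range(out_dim)]
        # Trainable linear predictor with its own small random initialisation.
        self.pred_w = [[rng.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
        self.lr = lr

    def _target(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.target_w]

    def _predict(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.pred_w]

    def uncertainty(self, x):
        """Mean squared prediction error; stays high for unfamiliar tuples."""
        t, p = self._target(x), self._predict(x)
        return sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / len(t)

    def update(self, x):
        """One SGD step on the mean squared error for a visited tuple."""
        t, p = self._target(x), self._predict(x)
        for j, row in enumerate(self.pred_w):
            # d(loss)/d(w_ji) = (2 / K) * (p_j - t_j) * x_i, with K outputs.
            grad = 2.0 * (p[j] - t[j]) / len(t)
            for i, xi in enumerate(x):
                row[i] -= self.lr * grad * xi


# Usage: train on one repeatedly visited tuple, then compare with a novel one.
seen = encode([0.1, 0.0, -0.2, 0.3], 1, 1.0, [0.2, 0.1, -0.1, 0.2], n_actions=2)
novel = encode([0.9, -0.8, 0.7, -0.9], 0, 0.0, [1.0, -1.0, 0.8, -1.0], n_actions=2)

rnd = SarsRND(in_dim=len(seen))
before = rnd.uncertainty(seen)
for _ in range(500):
    rnd.update(seen)

print("seen tuple uncertainty:", rnd.uncertainty(seen))
print("novel tuple uncertainty:", rnd.uncertainty(novel))
```

Under these assumptions, interactions the agent has experienced often end up with a low prediction error while unfamiliar ones keep a high error, which is the kind of per-interaction signal that could help decide which experiences are worth transferring.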