Multi-agent reinforcement learning typically suffers from sample inefficiency: learning suitable policies requires many data samples. Learning from external demonstrators is one way to mitigate this problem. However, most prior approaches in this area assume a single demonstrator. Leveraging multiple knowledge sources (i.e., advisors) with expertise in distinct aspects of the environment could substantially speed up learning in complex environments. This paper considers the problem of simultaneously learning from multiple independent advisors in multi-agent reinforcement learning. The approach leverages a two-level Q-learning architecture and extends this framework from single-agent to multi-agent settings. We provide principled algorithms that incorporate a set of advisors by evaluating the advisors at each state and then using them to guide action selection. We also provide theoretical convergence and sample complexity guarantees. Experimentally, we validate our approach in three different test-beds and show that our algorithms outperform baselines, effectively integrate the combined expertise of different advisors, and learn to ignore bad advice.
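To make the two-level idea concrete, the following is a minimal tabular sketch, not the paper's algorithm. It assumes a single learner, discrete states and actions, advisors given as plain functions from state to action, and an externally supplied probability of consulting advice; the class name `TwoLevelAdvisorQ`, its methods, and all hyperparameters are hypothetical names introduced only for illustration. A high-level table scores each advisor per state (evaluation), and a low-level table scores each action per state (action selection).

```python
import random
from collections import defaultdict


class TwoLevelAdvisorQ:
    """Illustrative sketch: high-level Q over (state, advisor),
    low-level Q over (state, action). All names are hypothetical."""

    def __init__(self, actions, advisors, alpha=0.1, gamma=0.95,
                 epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.advisors = list(advisors)  # each advisor: callable state -> action
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        self.q_high = defaultdict(float)  # (state, advisor_index) -> value
        self.q_low = defaultdict(float)   # (state, action) -> value

    def best_advisor(self, state):
        # Advisor index with the highest estimated value in this state.
        return max(range(len(self.advisors)),
                   key=lambda i: self.q_high[(state, i)])

    def best_action(self, state):
        return max(self.actions, key=lambda a: self.q_low[(state, a)])

    def act(self, state, advice_prob):
        """Return (action, advisor_index), with advisor_index None
        when the learner acts from its own low-level table."""
        if self.rng.random() < advice_prob:
            # Epsilon-greedy over advisors (assumed exploration scheme).
            if self.rng.random() < self.epsilon:
                i = self.rng.randrange(len(self.advisors))
            else:
                i = self.best_advisor(state)
            return self.advisors[i](state), i
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions), None
        return self.best_action(state), None

    def update(self, state, action, reward, next_state, done, advisor=None):
        # Standard one-step Q-learning target from the low-level table.
        target = reward
        if not done:
            target += self.gamma * max(
                self.q_low[(next_state, a)] for a in self.actions)
        key = (state, action)
        self.q_low[key] += self.alpha * (target - self.q_low[key])
        if advisor is not None:
            # Credit the consulted advisor with the same target.
            hkey = (state, advisor)
            self.q_high[hkey] += self.alpha * (target - self.q_high[hkey])
```

In this sketch an advisor that consistently recommends low-reward actions accumulates a low high-level value in the affected states and is consulted less often, which is one simple way the "learn to ignore bad advice" behaviour could arise. Extending it to several simultaneously learning agents, and choosing how `advice_prob` changes over training, are left open here.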