Multi-agent reinforcement learning often suffers from sample inefficiency: learning suitable policies requires a large number of data samples. Learning from external demonstrators is one way to mitigate this problem. However, most prior approaches in this area assume a single demonstrator. Leveraging multiple knowledge sources (i.e., advisors), each with expertise in different aspects of the environment, could substantially speed up learning in complex environments. This paper considers the problem of simultaneously learning from multiple independent advisors in multi-agent reinforcement learning. Our approach builds on a two-level Q-learning architecture and extends it from single-agent to multi-agent settings. We provide principled algorithms that incorporate a set of advisors by evaluating the advisors at each state and then using them to guide action selection. We also provide theoretical convergence and sample complexity guarantees. Experimentally, we validate our approach on three different test-beds and show that our algorithms outperform baselines, effectively integrate the combined expertise of different advisors, and learn to ignore bad advice.
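The abstract does not spell out the update rules, so the following is only a minimal single-agent, tabular sketch of the general idea of "evaluate advisors per state, then let them guide action selection". It is not the paper's algorithm. Everything in it is our own illustrative assumption: the class `TwoLevelAdvisorQ`, the parameters `eps_advisor` and `eps_random`, the advisor-level update rule, and the toy chain environment in `run_chain`. Advisors are assumed to be fixed functions from state to recommended action. An advisor-level table `q_adv` rates each advisor in each state and is used only for guided exploration, while greedy actions come from the agent's own `q` table. As a result, advice that turns out to be poor stops influencing exploitation.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical advisor type: a fixed mapping from state to recommended action.
Advisor = Callable[[int], int]


class TwoLevelAdvisorQ:
    """Illustrative two-level tabular learner (an assumption, not the paper's method).

    Advisor level (q_adv): value of following advisor k's recommendation in state s.
    Action level (q): ordinary Q-values over primitive actions.
    """

    def __init__(self, n_actions: int, advisors: List[Advisor],
                 alpha: float = 0.5, gamma: float = 0.9,
                 eps_advisor: float = 0.3, eps_random: float = 0.1) -> None:
        self.n_actions = n_actions
        self.advisors = advisors
        self.alpha, self.gamma = alpha, gamma
        self.eps_advisor, self.eps_random = eps_advisor, eps_random
        self.q: Dict[int, List[float]] = defaultdict(lambda: [0.0] * n_actions)
        self.q_adv: Dict[int, List[float]] = defaultdict(lambda: [0.0] * len(advisors))

    @staticmethod
    def _argmax(values: List[float]) -> int:
        # Break ties uniformly at random so untrained states are not biased.
        best = max(values)
        return random.choice([i for i, v in enumerate(values) if v == best])

    def act(self, s: int) -> int:
        u = random.random()
        if u < self.eps_random:
            return random.randrange(self.n_actions)
        if u < self.eps_random + self.eps_advisor:
            # Advisor-guided exploration: follow the advisor currently rated best in s.
            k = self._argmax(self.q_adv[s])
            return self.advisors[k](s)
        # Exploitation uses only the agent's own action values.
        return self._argmax(self.q[s])

    def update(self, s: int, a: int, r: float, s_next: int, done: bool) -> None:
        # Standard one-step Q-learning on primitive actions.
        bootstrap = 0.0 if done else self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (r + bootstrap - self.q[s][a])
        # Every advisor that recommended a in s would have produced this transition,
        # so all of them are updated, bootstrapping on the best-rated advisor in s_next.
        adv_bootstrap = 0.0 if done else self.gamma * max(self.q_adv[s_next])
        for k, advisor in enumerate(self.advisors):
            if advisor(s) == a:
                self.q_adv[s][k] += self.alpha * (r + adv_bootstrap - self.q_adv[s][k])


def run_chain(episodes: int = 500, seed: int = 0) -> TwoLevelAdvisorQ:
    """Hypothetical 5-state chain: action 1 moves right, 0 moves left; reward 1 on reaching state 4."""
    random.seed(seed)
    good: Advisor = lambda s: 1  # always recommends moving right (toward the reward)
    bad: Advisor = lambda s: 0   # always recommends moving left (away from the reward)
    agent = TwoLevelAdvisorQ(n_actions=2, advisors=[good, bad])
    for _ in range(episodes):
        s = 0
        for _ in range(200):  # step limit per episode
            a = agent.act(s)
            s_next = min(s + 1, 4) if a == 1 else max(s - 1, 0)
            done = s_next == 4
            agent.update(s, a, 1.0 if done else 0.0, s_next, done)
            s = s_next
            if done:
                break
    return agent


agent = run_chain()
print(agent.q_adv[0])  # the good advisor (index 0) is expected to be rated above the bad one
```

Because the bad advisor's recommendations lead away from the reward, its advisor-level value in the start state stays below the good advisor's. Guided exploration therefore gradually stops following it, which mirrors, in a much simpler setting, the "learn to ignore bad advice" behaviour described above.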