In many multi-agent and high-dimensional robotic tasks, the controller can be designed in either a centralized or a decentralized way. Correspondingly, such controllers can be learned with either single-agent reinforcement learning (SARL) or multi-agent reinforcement learning (MARL) methods. However, the relationship between these two paradigms remains under-studied in the literature. This work compares the robustness and performance of SARL and MARL approaches on the same task, in order to gain insight into which methods are most suitable. We start by analytically showing the equivalence of the two paradigms under the full-state observation assumption. Then, we identify a broad subclass of \textit{Dec-POMDP} tasks in which the agents are weakly or partially interacting. In these tasks, we show that the partial observations of each agent are sufficient for near-optimal decision-making. Furthermore, we propose to exploit such partially observable MARL to improve the robustness of robots when joint or agent failures occur. Our experiments on both simulated multi-agent tasks and a real robot task with a mobile manipulator validate the presented insights and the effectiveness of the proposed robust robot learning method via partially observable MARL.