Steering cooperative multi-agent reinforcement learning (MARL) towards desired outcomes is challenging, particularly when global guidance from a human on the whole multi-agent system is impractical in large-scale MARL. On the other hand, designing mechanisms to coordinate agents mostly relies on empirical studies and lacks an easy-to-use research tool. In this work, we employ multi-agent influence diagrams (MAIDs) as a graphical framework to address these issues. First, we introduce interaction paradigms that leverage MAIDs to analyze and visualize existing approaches in MARL. Then, we design a new MAID-based interaction paradigm, referred to as targeted intervention, which is applied only to a single targeted agent and thus mitigates the problem of global guidance. In our implementation, we introduce a causal inference technique, referred to as Pre-Strategy Intervention (PSI), to realize the targeted intervention paradigm. Since MAIDs can be regarded as a special class of causal diagrams, a composite desired outcome that integrates the primary task goal and an additional desired outcome can be achieved by maximizing the corresponding causal effect through PSI. Moreover, the bundled relevance graph analysis of MAIDs provides a tool to identify whether a MARL learning paradigm is workable under the design of an interaction paradigm. In experiments, we demonstrate the effectiveness of the proposed targeted intervention and verify the results of the relevance graph analysis.
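To make the relevance graph analysis more concrete, here is a minimal sketch that assumes the standard (unbundled) relevance graph for MAIDs. In that graph, decision D relies on decision D' if D' is s-reachable from D. The test for this works as follows: attach a fresh dummy parent to D', then check whether it has an active (d-connected) trail to one of the utility nodes of D's owner, given D and D's parents. The dictionary encoding of the MAID, the node names `D1`, `D2`, `U1`, `U2`, and every helper function below are hypothetical illustrations. The sketch does not reproduce the paper's bundled relevance graph or PSI.

```python
from collections import deque


def children_of(parents):
    """Invert a {node: [parents]} map into a {node: [children]} map."""
    kids = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            kids.setdefault(p, []).append(n)
    return kids


def reachable(parents, source, observed):
    """Nodes with an active trail from `source` given `observed` (d-separation test)."""
    kids = children_of(parents)
    # Phase 1: observed nodes plus all their ancestors (these unblock v-structures).
    anc, stack = set(), list(observed)
    while stack:
        n = stack.pop()
        if n not in anc:
            anc.add(n)
            stack.extend(parents.get(n, []))
    # Phase 2: search over (node, direction); "up" = entered from a child, "down" = from a parent.
    todo, seen, found = deque([(source, "up")]), set(), set()
    while todo:
        node, d = todo.popleft()
        if (node, d) in seen:
            continue
        seen.add((node, d))
        if node not in observed:
            found.add(node)
        if d == "up" and node not in observed:
            todo.extend((p, "up") for p in parents.get(node, []))
            todo.extend((c, "down") for c in kids.get(node, []))
        elif d == "down":
            if node not in observed:
                todo.extend((c, "down") for c in kids.get(node, []))
            if node in anc:  # collider with an observed descendant: trail stays open
                todo.extend((p, "up") for p in parents.get(node, []))
    return found


def s_reachable(parents, target, decision, utilities):
    """True if decision `target` is s-reachable from `decision`.

    `utilities` are the utility nodes of the agent that owns `decision`.
    """
    g = {n: list(ps) for n, ps in parents.items()}
    hat = ("hat", target)  # fresh dummy parent attached to the target decision
    g[hat] = []
    g[target].append(hat)
    observed = {decision, *parents[decision]}
    return bool(reachable(g, hat, observed) & set(utilities))


def relevance_graph(parents, decisions, utilities_of):
    """Edge D -> D' whenever D relies on D' (D' is s-reachable from D)."""
    return {
        d: [e for e in decisions if e != d and s_reachable(parents, e, d, utilities_of[d])]
        for d in decisions
    }


def is_acyclic(graph):
    """Kahn's algorithm: True if the relevance graph has no directed cycle."""
    indeg = {n: 0 for n in graph}
    for n in graph:
        for m in graph[n]:
            indeg[m] += 1
    queue = deque(n for n, k in indeg.items() if k == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for m in graph[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return visited == len(graph)


# Hypothetical two-agent MAIDs: agent i owns decision Di and utility Ui.
utils = {"D1": ["U1"], "D2": ["U2"]}
sequential = {"D1": [], "D2": ["D1"], "U1": ["D1", "D2"], "U2": ["D1", "D2"]}
simultaneous = {"D1": [], "D2": [], "U1": ["D1", "D2"], "U2": ["D1", "D2"]}
print(relevance_graph(sequential, ["D1", "D2"], utils))    # {'D1': ['D2'], 'D2': []}
print(relevance_graph(simultaneous, ["D1", "D2"], utils))  # {'D1': ['D2'], 'D2': ['D1']}
```

The two toy games behave differently:

- **Sequential game** (agent 2 observes D1): the relevance graph is acyclic. The decisions can therefore be optimized one at a time in reverse order.
- **Simultaneous game**: each decision relies on the other. The resulting cycle signals that neither decision can be optimized in isolation.

This acyclicity check is one plausible reading of what it means for a learning paradigm to be "workable." The paper's bundled analysis may use a different criterion.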