A Principle of Targeted Intervention for Multi-Agent Reinforcement Learning

Steering cooperative multi-agent reinforcement learning (MARL) towards desired outcomes is challenging, particularly in large-scale MARL, where global guidance from a human over the whole multi-agent system is impractical. On the other hand, designing external mechanisms (e.g., intrinsic rewards and human feedback) to coordinate agents mostly relies on empirical studies and lacks an easy-to-use research tool. In this work, we employ multi-agent influence diagrams (MAIDs) as a graphical framework to address these issues. First, we introduce the concept of MARL interaction paradigms (orthogonal to MARL learning paradigms), using MAIDs to analyze and visualize both unguided self-organization and global guidance mechanisms in MARL. Then, we design a new MARL interaction paradigm, referred to as the targeted intervention paradigm, which is applied only to a single targeted agent so that the problem of global guidance is mitigated. To implement this paradigm, we introduce a causal inference technique referred to as Pre-Strategy Intervention (PSI). Since MAIDs can be regarded as a special class of causal diagrams, a composite desired outcome that integrates the primary task goal with an additional desired outcome can be achieved by maximizing the corresponding causal effect through PSI. Moreover, the bundled relevance graph analysis of MAIDs provides a tool to identify whether an MARL learning paradigm is workable under a given MARL interaction paradigm design. In experiments, we demonstrate the effectiveness of the proposed targeted intervention and verify the results of the relevance graph analysis.
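The relevance graph analysis mentioned above builds on the standard MAID relevance graph of Koller and Milch (2003). In that construction, a decision D relies on another decision D' when D' is s-reachable from D. This means that adding a fresh parent to D' leaves it d-connected to D's agent's utilities, given D and D's parents. The sketch below implements only that standard construction. It does not implement the paper's bundled variant, and the node names (`DA`, `DB`, `UA`, `UB`) and helper functions are hypothetical, chosen purely for illustration. Here an edge D -> D' means "D relies on D'"; acyclicity does not depend on that orientation choice.

```python
from collections import deque


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents.get(n, []))
    return seen


def d_separated(parents, xs, ys, zs):
    """Test X _|_ Y | Z with the moralized ancestral graph criterion."""
    keep = ancestors(parents, set(xs) | set(ys) | set(zs))
    adj = {n: set() for n in keep}
    for child in keep:
        pas = list(parents.get(child, []))
        for p in pas:  # parent-child edges become undirected
            adj[child].add(p)
            adj[p].add(child)
        for i, a in enumerate(pas):  # "marry" co-parents of the same child
            for b in pas[i + 1:]:
                adj[a].add(b)
                adj[b].add(a)
    # Delete the conditioning set, then search for any X-Y path.
    blocked, targets = set(zs), set(ys)
    frontier = deque(x for x in xs if x not in blocked)
    seen = set(frontier)
    while frontier:
        n = frontier.popleft()
        if n in targets:
            return False
        for m in adj[n] - blocked - seen:
            seen.add(m)
            frontier.append(m)
    return True


def s_reachable(parents, target, decision, utilities):
    """True if `decision` relies on `target` (Koller and Milch's s-reachability)."""
    hat = ("hat", target)  # fresh, hypothetical parent node attached to `target`
    aug = dict(parents)
    aug[target] = list(parents.get(target, [])) + [hat]
    given = {decision} | set(parents.get(decision, []))
    return not d_separated(aug, [hat], utilities, given)


def relevance_graph(parents, agent_of, agent_utils):
    """Edge D -> D' whenever D relies on D'."""
    decisions = list(agent_of)
    return {
        d: [d2 for d2 in decisions
            if d2 != d and s_reachable(parents, d2, d, agent_utils[agent_of[d]])]
        for d in decisions
    }


def is_acyclic(graph):
    """Kahn's algorithm: True iff the directed graph has no cycle."""
    indeg = {n: 0 for n in graph}
    for n in graph:
        for m in graph[n]:
            indeg[m] += 1
    queue = deque(n for n, k in indeg.items() if k == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for m in graph[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return visited == len(graph)


# Hypothetical two-agent MAIDs, written as node -> list of parents.
seq = {"DB": ["DA"], "UA": ["DA", "DB"], "UB": ["DA", "DB"]}  # B observes A
sim = {"UA": ["DA", "DB"], "UB": ["DA", "DB"]}                # simultaneous moves
agent_of = {"DA": "A", "DB": "B"}
agent_utils = {"A": ["UA"], "B": ["UB"]}

if __name__ == "__main__":
    for name, g in [("sequential", seq), ("simultaneous", sim)]:
        rg = relevance_graph(g, agent_of, agent_utils)
        print(name, rg, "acyclic:", is_acyclic(rg))
```

In this toy setting, the sequential MAID yields `{'DA': ['DB'], 'DB': []}`, which is acyclic, so the decisions can be optimized one at a time in a fixed order. The simultaneous MAID yields a two-cycle, so each agent's optimal decision depends on the other's. One plausible reading of the abstract's claim is that checks of this kind, applied to MAIDs that encode an interaction paradigm, indicate whether a given learning paradigm can be decomposed in this way.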
