This work presents MAC-Flow, a simple yet expressive framework for multi-agent coordination. We argue that the requirements for effective coordination are twofold: (i) a rich representation of the diverse joint behaviors present in offline data and (ii) the ability to act efficiently in real time. Prior approaches, however, often sacrifice one for the other: denoising diffusion-based solutions capture complex coordination but are computationally slow, while Gaussian policy-based solutions are fast but brittle in handling multi-agent interactions. MAC-Flow addresses this trade-off by first learning a flow-based representation of joint behaviors and then distilling it into decentralized one-step policies that preserve coordination while enabling fast execution. Across four benchmarks spanning $12$ environments and $34$ datasets, MAC-Flow alleviates the trade-off between performance and computational cost, achieving roughly $\boldsymbol{14.5\times}$ faster inference than diffusion-based multi-agent reinforcement learning (MARL) methods while maintaining good performance. At the same time, its inference speed is comparable to that of prior Gaussian policy-based offline MARL methods.
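To make the two-stage recipe in the abstract concrete, the following is a minimal, illustrative sketch of one plausible instantiation: a velocity field trained with conditional flow matching on joint actions (the expressive but slow teacher), and decentralized one-step student policies regressed onto the teacher's multi-step samples. All module names, dimensions, and the squared-error distillation objective are assumptions made for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of flow-matching training plus one-step distillation.
# Names (VelocityField, OneStepPolicy) and dimensions are illustrative only.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, N_AGENTS, HIDDEN = 16, 4, 3, 256

class VelocityField(nn.Module):
    """v_theta(a_t, t, obs): predicts the flow velocity for the joint action."""
    def __init__(self):
        super().__init__()
        joint = N_AGENTS * ACT_DIM
        self.net = nn.Sequential(
            nn.Linear(joint + 1 + N_AGENTS * OBS_DIM, HIDDEN), nn.SiLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.SiLU(),
            nn.Linear(HIDDEN, joint),
        )

    def forward(self, a_t, t, obs):
        return self.net(torch.cat([a_t, t, obs], dim=-1))

def flow_matching_loss(v, obs, actions):
    """Conditional flow matching: interpolate noise -> data, regress velocity."""
    noise = torch.randn_like(actions)
    t = torch.rand(actions.shape[0], 1)
    a_t = (1 - t) * noise + t * actions          # straight-line interpolant
    target = actions - noise                     # its constant velocity
    return ((v(a_t, t, obs) - target) ** 2).mean()

@torch.no_grad()
def sample_joint_action(v, obs, steps=10):
    """Multi-step Euler integration of the learned flow (the slow teacher)."""
    a = torch.randn(obs.shape[0], N_AGENTS * ACT_DIM)
    dt = 1.0 / steps
    for k in range(steps):
        t = torch.full((obs.shape[0], 1), k * dt)
        a = a + dt * v(a, t, obs)
    return a

class OneStepPolicy(nn.Module):
    """Decentralized student: maps one agent's observation to its action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, HIDDEN), nn.SiLU(), nn.Linear(HIDDEN, ACT_DIM))

    def forward(self, obs_i):
        return self.net(obs_i)

def distill_step(policies, v, obs, opt):
    """Regress each agent's one-step action onto the teacher's joint sample."""
    teacher = sample_joint_action(v, obs).view(-1, N_AGENTS, ACT_DIM)
    obs_per_agent = obs.view(-1, N_AGENTS, OBS_DIM)
    loss = sum(
        ((pi(obs_per_agent[:, i]) - teacher[:, i]) ** 2).mean()
        for i, pi in enumerate(policies))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

One design point this sketch highlights: the straight-line interpolant gives a constant target velocity along each path, so the teacher needs only a handful of Euler steps, and the student collapses sampling to a single forward pass per agent, which is the source of the inference speedup claimed in the abstract.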