Robust coordination skills enable agents to operate cohesively in shared environments: together towards a common goal and, ideally, individually without hindering each other's progress. To this end, this paper presents Coordinated QMIX (CoMIX), a novel training framework for decentralized agents that enables emergent coordination through flexible policies while still allowing independent decision-making at the individual level. CoMIX models selfish and collaborative behavior as incremental steps in each agent's decision process. This allows agents to dynamically adapt their behavior to different situations, balancing independence and collaboration. Experiments in a variety of simulation environments demonstrate that CoMIX outperforms baselines on collaborative tasks. The results validate our incremental policy approach as an effective technique for improving coordination in multi-agent systems.