WQMIX, QMIX, QTRAN, and VDN are state-of-the-art algorithms for decentralized partially observable Markov decision processes (Dec-POMDPs). However, none of them can solve domains that require complex cooperation among agents. We propose SA2MA, a two-stage algorithm for such problems. In the first stage, we solve a single-agent version of the problem and obtain a policy. In the second stage, we use that single-agent policy to solve the multi-agent problem. SA2MA shows a clear advantage over all of these competitors in complex cooperative multi-agent domains.
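To make the two-stage idea concrete, the following is a minimal sketch and not the SA2MA implementation. It assumes a hypothetical 5-cell corridor task, uses value iteration as a stand-in for the single-agent solver, and assumes that "using the single-agent policy" means each agent starts from that policy as its default behaviour. All names (`solve_single_agent`, `joint_action`, `rollout`) are illustrative and do not come from the paper.

```python
from typing import Dict, List, Tuple

N_STATES = 5          # hypothetical corridor with cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # move left or move right
GAMMA = 0.9           # discount factor (assumed value)


def step(state: int, action: int) -> Tuple[int, float]:
    """Deterministic single-agent transition; reward 1.0 on entering the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 and state != N_STATES - 1 else 0.0
    return nxt, reward


def solve_single_agent(sweeps: int = 50) -> Dict[int, int]:
    """Stage 1 (sketch): value iteration, then extract a greedy policy."""
    values = [0.0] * N_STATES  # the goal is terminal, so its value stays 0
    for _ in range(sweeps):
        for s in range(N_STATES - 1):
            values[s] = max(
                r + GAMMA * values[n] for n, r in (step(s, a) for a in ACTIONS)
            )
    policy: Dict[int, int] = {}
    for s in range(N_STATES - 1):
        policy[s] = max(
            ACTIONS, key=lambda a: step(s, a)[1] + GAMMA * values[step(s, a)[0]]
        )
    return policy


def joint_action(policy: Dict[int, int], positions: List[int]) -> List[int]:
    """Stage 2 (assumed): every agent reuses the single-agent policy.

    An agent already at the goal has no policy entry and stays put (action 0).
    """
    return [policy.get(p, 0) for p in positions]


def rollout(policy: Dict[int, int], positions: List[int],
            max_steps: int = 20) -> Tuple[List[int], int]:
    """Run all agents with the shared policy until each one reaches the goal."""
    steps = 0
    while any(p != N_STATES - 1 for p in positions) and steps < max_steps:
        actions = joint_action(policy, positions)
        positions = [
            min(max(p + a, 0), N_STATES - 1) for p, a in zip(positions, actions)
        ]
        steps += 1
    return positions, steps


if __name__ == "__main__":
    single_agent_policy = solve_single_agent()
    print(single_agent_policy)
    print(rollout(single_agent_policy, [0, 2]))
```

In this toy the agents do not interact, so the second stage only shows the policy being reused. In a genuinely cooperative domain the multi-agent stage would presumably continue learning from this starting point, but the abstract does not specify how.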