Cooperative multi-agent reinforcement learning is a powerful tool for solving many real-world cooperative tasks, but the constraints of real-world applications may require the agents to be trained in a fully decentralized manner. Because each agent lacks information about the others, it is challenging to derive algorithms that converge to the optimal joint policy in a fully decentralized setting, and this research area has consequently not been thoroughly studied. In this paper, we systematically review fully decentralized methods in two settings: maximizing a reward shared by all agents, and maximizing the sum of all agents' individual rewards. We also discuss open questions and future research directions.