This work extends an existing virtual multi-agent platform, RoboSumo, to create TripleSumo -- a platform for investigating multi-agent cooperative behaviors in continuous action spaces with physical contact in an adversarial environment. We investigate a scenario in which two agents, `Bug' and `Ant', must team up to push a third agent, `Spider', out of the arena. To this end, the newly added agent `Bug' is trained during an ongoing match between `Ant' and `Spider'. `Bug' must develop awareness of the other agents' actions, infer the strategies of both sides, and eventually learn an action policy that cooperates with `Ant'. Training uses the reinforcement learning algorithm Deep Deterministic Policy Gradient (DDPG) with a hybrid reward structure that combines dense and sparse rewards. Cooperative behavior is evaluated quantitatively by the mean probability of winning a match and the mean number of steps needed to win.
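As a rough illustration of the hybrid reward and the two evaluation metrics named above, the following minimal sketch may help. It is not the implementation used in this work: the function names, the `EpisodeResult` record, and the bonus and penalty magnitudes are all hypothetical, and the dense term is left as an input because the abstract does not specify how it is computed.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence, Tuple


@dataclass
class EpisodeResult:
    """Outcome of one evaluation match (hypothetical record type)."""
    won: bool    # True if the Bug/Ant team pushed Spider out of the arena
    steps: int   # number of simulation steps the match lasted


def hybrid_reward(dense_term: float, won: bool, lost: bool,
                  win_bonus: float = 100.0,
                  loss_penalty: float = 100.0) -> float:
    """Combine a per-step dense shaping term with a sparse terminal term.

    The bonus and penalty values are assumed placeholders, not values
    reported in the paper.
    """
    reward = dense_term
    if won:
        reward += win_bonus      # sparse reward, paid only when the match is won
    if lost:
        reward -= loss_penalty   # sparse penalty, paid only when the match is lost
    return reward


def evaluate(episodes: Sequence[EpisodeResult]) -> Tuple[float, Optional[float]]:
    """Return (empirical win rate, mean steps over won matches)."""
    if not episodes:
        raise ValueError("at least one episode is required")
    win_rate = sum(e.won for e in episodes) / len(episodes)
    winning_steps = [e.steps for e in episodes if e.won]
    # Mean steps-to-win is undefined when no match was won.
    mean_steps = mean(winning_steps) if winning_steps else None
    return win_rate, mean_steps
```

In this sketch the win rate is an empirical estimate of the winning probability over a batch of evaluation matches, and the step count is averaged over won matches only, since "steps needed to win" has no value for a lost match.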