Simultaneously generating competitive strategies and performing continuous motion planning in an adversarial setting is a challenging problem. In addition, understanding the intent of other agents is crucial to deploying autonomous systems in adversarial multi-agent environments. Existing approaches either discretize agent actions by grouping similar control inputs, sacrificing motion planning performance, or plan in uninterpretable latent spaces, producing agent behaviors that are hard to understand. Furthermore, the most popular policy optimization frameworks do not account for the long-term effects of actions and therefore become myopic. This paper proposes an agent action discretization method via abstraction that makes the intent of agent actions clear, an efficient offline pipeline for agent population synthesis, and a planning strategy using counterfactual regret minimization with function approximation. Finally, we experimentally validate our findings on scaled autonomous vehicles in a head-to-head racing setting. We demonstrate that the proposed framework significantly improves learning and the win rate against different opponents, and that these improvements transfer to unseen opponents in an unseen environment.