We present a multi-agent reinforcement learning approach to solve a pursuit-evasion game between two players with car-like dynamics and sensing limitations. We develop a curriculum for an existing multi-agent deterministic policy gradient algorithm to simultaneously obtain strategies for both players, and deploy the learned strategies on real robots moving at up to 2 m/s in indoor environments. Through experiments, we show that the learned strategies improve over existing baselines by up to 30% in terms of capture rate for the pursuer. The learned evader model achieves an escape rate up to 5% higher than the baselines, even against our competitive pursuer model. We also present experimental results that show how the pursuit-evasion game and its outcomes change as the player dynamics and sensor constraints are varied. Finally, we deploy the learned policies on physical robots for a game between the F1TENTH and JetRacer platforms and show that the learned strategies can be executed on real robots. Our code and supplementary material, including videos from experiments, are available at https://gonultasbu.github.io/pursuit-evasion/.