The objective of this work is to evaluate multi-agent artificial intelligence methods deployed on teams of unmanned surface vehicles (USVs) in an adversarial environment. Autonomous agents were evaluated in real-world scenarios using the Aquaticus test bed, a Capture-the-Flag (CTF) style competition between teams of USVs. Cooperative teaming algorithms founded on behavior-based optimization and on deep reinforcement learning (RL) were deployed on these USVs in two-versus-two teams and tested against each other during a competition period in the fall of 2023. Deep RL for the USV agents was achieved via the Pyquaticus test bed, a lightweight gymnasium environment that supports simulated CTF training at a low level of abstraction. The results of the experiment demonstrate that rule-based cooperation among behavior-based agents outperformed agents trained under deep RL paradigms as implemented in these competitions. Tighter integration of the Pyquaticus gymnasium environment with MOOS-IvP, in both configuration and control schema, will allow more competitive CTF games in future studies. As experimental deep RL methods continue to develop, the authors expect the competitive gap between behavior-based autonomy and deep RL to narrow. Accordingly, this report outlines the overall competition, methods, and results, with an emphasis on future work: reward shaping, sim-to-real methodologies, and extending rule-based cooperation among agents to react to safety and security events in accordance with human experts' intent and rules for executing safety and security processes.
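Because Pyquaticus exposes a gymnasium-style reset/step interface, an RL agent interacts with it through the standard episode loop. The sketch below is a minimal illustration of that loop using a self-contained toy stand-in environment; the class, its 1-D state, the sparse capture reward, and the always-advance policy are all illustrative assumptions, not the actual Pyquaticus API or reward structure.

```python
class ToyCaptureEnv:
    """Illustrative stand-in for a gymnasium-style CTF environment.

    This is NOT the Pyquaticus API; it only mimics the reset/step
    contract (observation, reward, terminated, truncated, info) that
    gymnasium-style environments share.
    """

    def __init__(self, max_steps=50):
        self.max_steps = max_steps
        self.steps = 0
        self.position = 0  # 1-D stand-in for a USV's distance toward the flag

    def reset(self):
        self.steps = 0
        self.position = 0
        return self.position, {}  # (observation, info)

    def step(self, action):
        # action: -1 (retreat), 0 (hold), +1 (advance toward the flag)
        self.position += action
        self.steps += 1
        reward = 1.0 if self.position >= 10 else -0.01  # sparse capture bonus
        terminated = self.position >= 10                # flag reached
        truncated = self.steps >= self.max_steps        # episode time limit
        return self.position, reward, terminated, truncated, {}


# Standard gymnasium-style episode loop with a trivial fixed policy.
env = ToyCaptureEnv()
obs, info = env.reset()
total_reward = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = 1  # placeholder policy: always advance
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
```

In a real training setup, the fixed policy above would be replaced by a learned policy network, and the loop would be driven by an RL library; the reset/step contract stays the same.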