Separation Assurance between Heterogeneous Fleets of Small Unmanned Aerial Systems via Multi-Agent Reinforcement Learning

In the dense urban airspace envisioned for the future, multiple companies will operate heterogeneous fleets of small unmanned aerial systems (sUASs). Each fleet comprises several homogeneous aircraft that share an identical policy and configuration (e.g., equipage, sensing, and communication ranges), which makes tactical deconfliction highly complex for the aircraft. This paper addresses two core questions: (1) Can tactical deconfliction policies converge or reach an equilibrium that ensures a conflict-free airspace when companies operate heterogeneous fleets of homogeneous aircraft? (2) If so, will the converged policies discriminate against companies operating sUASs with weaker configurations? We investigate a multi-agent reinforcement learning paradigm in which homogeneous aircraft within heterogeneous fleets operate concurrently to perform package delivery missions over Dallas, Texas, USA. An attention-enhanced Proximal Policy Optimization-based Advantage Actor-Critic (PPOA2C) framework is employed to resolve intra- and inter-fleet conflicts, with each fleet independently training its own policy while preserving privacy. Experimental results show that two fleets, each with its own distinct PPOA2C policy shared among its aircraft, can reach an equilibrium that maintains safe separation. While two PPOA2C policies outperform two strong rule-based baselines in conflict resolution, a PPOA2C policy exhibits safer interaction with a rule-based policy, indicating the adaptive capabilities of PPOA2C policies. Furthermore, extensive policy-configuration evaluations reveal that equilibria between similar policy types tend to favor fleets with stronger configurations. Even under similar configurations but different policy types, the equilibrium favors one of the heterogeneous policies, underscoring the need for fairness-aware conflict management in heterogeneous sUAS operations.
