Conventional optimization-based metering depends on strict adherence to precomputed schedules, which limits the flexibility required for the stochastic operations of Advanced Air Mobility (AAM). In contrast, multi-agent reinforcement learning (MARL) offers a decentralized, adaptive framework that better handles uncertainty, a capability required for safe aircraft separation assurance. Despite this advantage, current MARL approaches often overfit to specific airspace structures, which limits their adaptability to new configurations. To improve generalization, we recast the MARL problem in a relative polar state space and train a transformer encoder model across diverse traffic patterns and intersection angles. The learned model issues speed advisories that resolve conflicts while keeping aircraft near their desired cruising speeds. In our experiments, we evaluated encoder depths of 1, 2, and 3 layers in both structured and unstructured airspaces and found that the single-layer encoder outperformed the deeper variants, yielding near-zero rates of near mid-air collisions and shorter loss-of-separation infringements. We also showed that the same single-layer configuration outperforms a baseline model built purely on attention. Together, our results suggest that the newly formulated state representation, the novel neural network architecture, and the proposed training strategy provide an adaptable and scalable decentralized solution for aircraft separation assurance in both structured and unstructured airspaces.
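To illustrate why a relative polar state space can help generalization across intersection angles, the following is a minimal sketch of one possible encoding, not the representation used in this work. The function name `relative_polar_state`, the choice of features (range, bearing, relative heading, speed difference), and the heading convention (radians, counterclockwise from the x-axis) are all assumptions made for illustration.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def relative_polar_state(own_xy, own_heading, own_speed,
                         intr_xy, intr_heading, intr_speed):
    """Describe an intruder relative to the ownship in polar form.

    Hypothetical feature set (an assumption, not taken from the paper):
      range         distance between the two aircraft
      bearing       direction to the intruder, measured from the ownship heading
      rel_heading   intruder heading minus ownship heading
      speed_diff    intruder speed minus ownship speed
    """
    dx = intr_xy[0] - own_xy[0]
    dy = intr_xy[1] - own_xy[1]
    rng = math.hypot(dx, dy)
    bearing = wrap_angle(math.atan2(dy, dx) - own_heading)
    rel_heading = wrap_angle(intr_heading - own_heading)
    speed_diff = intr_speed - own_speed
    return rng, bearing, rel_heading, speed_diff


def rotate_translate(xy, theta, shift):
    """Rotate a point by theta about the origin, then translate by shift."""
    x, y = xy
    return (x * math.cos(theta) - y * math.sin(theta) + shift[0],
            x * math.sin(theta) + y * math.cos(theta) + shift[1])


# One encounter geometry, and the same geometry rotated and shifted as a whole.
theta, shift = 1.0, (1000.0, -2000.0)
state_a = relative_polar_state((0.0, 0.0), 0.0, 50.0,
                               (300.0, 400.0), math.pi / 2, 45.0)
state_b = relative_polar_state(rotate_translate((0.0, 0.0), theta, shift),
                               0.0 + theta, 50.0,
                               rotate_translate((300.0, 400.0), theta, shift),
                               math.pi / 2 + theta, 45.0)
```

Under these assumptions, `state_a` and `state_b` agree up to floating-point error, because every feature depends only on the geometry between the two aircraft and not on where the encounter sits or how it is oriented in the global frame. That invariance is one plausible reason a policy trained on such features could transfer to airspace layouts it has not seen.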