We introduce a Transformer-based reinforcement learning framework for autonomous orbital collision avoidance that explicitly models the effects of partial observability and imperfect monitoring in space operations. The framework combines a configurable encounter simulator, a distance-dependent observation model, and a sequential state estimator to represent uncertainty in relative motion. A central contribution of this work is a Transformer-based Partially Observable Markov Decision Process (POMDP) architecture, which leverages long-range temporal attention to interpret noisy and intermittent observations more effectively than traditional architectures. This integration provides a foundation for training collision avoidance agents that can operate more reliably in imperfectly monitored environments.
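To make the idea of a distance-dependent observation model concrete, the following is a minimal sketch, not the framework's actual implementation. It assumes that measurement noise grows linearly with range and that the probability of receiving an observation at all decays with range; the function name `observe` and the parameters `sigma0`, `k`, and `d_half` are hypothetical and chosen only for illustration.

```python
import numpy as np

def observe(rel_pos, rng, sigma0=10.0, k=0.01, d_half=50_000.0):
    """Return a noisy measurement of the relative position (metres),
    or None when the observation is dropped.

    Assumed model (illustrative only):
      - detection probability p = 1 / (1 + (d / d_half)^2)
      - noise standard deviation sigma = sigma0 + k * d
    where d is the range to the other object.
    """
    rel_pos = np.asarray(rel_pos, dtype=float)
    d = float(np.linalg.norm(rel_pos))

    # Intermittent monitoring: distant objects are observed less often.
    p_detect = 1.0 / (1.0 + (d / d_half) ** 2)
    if rng.random() > p_detect:
        return None

    # Noisy monitoring: measurement error grows with range.
    sigma = sigma0 + k * d
    return rel_pos + rng.normal(0.0, sigma, size=rel_pos.shape)
```

Under these assumptions, a sequence of such outputs (including the dropped steps) is the kind of noisy, intermittent history that a sequential state estimator or an attention-based policy would have to interpret.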