As space becomes more congested, on-orbit inspection is an increasingly relevant activity, whether the goal is to observe a defunct satellite in order to plan repairs or to de-orbit it. However, the inspection task itself is challenging and typically requires the careful coordination of multiple observer satellites. It is further complicated by a highly nonlinear environment in which the target may be unknown or moving unpredictably, with no time for continuous command and control from the ground. There is therefore a need for autonomous, robust, decentralized solutions to the inspection task. To this end, we consider a hierarchical, learned approach for the decentralized planning of multi-agent inspection of a tumbling target. Our solution consists of two components: a viewpoint (high-level) planner trained using deep reinforcement learning, and a navigation planner that handles point-to-point navigation between pre-specified viewpoints. We present a novel problem formulation and methodology that is not only suited to deriving robust policies via reinforcement learning, but also extendable to unknown target geometries and to higher-fidelity information-theoretic objectives received directly from sensor inputs. Operating under limited information, our trained multi-agent high-level policies successfully contextualize information within the global hierarchical environment and are correspondingly able to inspect over 90% of non-convex tumbling targets, even in the absence of additional agent attitude control.