Autonomous spacecraft rendezvous and proximity operations (RPO) require controllers that guarantee safety under thrust constraints while minimizing fuel expenditure. Input-constrained control barrier functions (ICCBFs) provide a control method for nonlinear systems with actuation constraints by constructing a forward-invariant safe set. Previous work has shown that learning the class-$\mathcal{K}$ functions that define the ICCBF recursion via meta-reinforcement learning (meta-RL) yields a robust, non-greedy approach to safety-critical control in RPO. This paper extends that framework by comparing three recurrent network architectures (Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and the Selective State Space Model Mamba) and two training algorithms (Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC)) to identify the most effective configuration for tuning ICCBF class-$\mathcal{K}$ functions via meta-RL. In addition to cooperative test cases, performance is evaluated under adversarial behavior, in which the target spacecraft maneuvers in a way that degrades the safety of the chaser spacecraft. Results indicate that state space models such as Mamba, when trained with PPO, achieve superior task completion, safety, and fuel savings compared to the other architectures across all cooperative and uncooperative scenarios tested.
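For reference, the ICCBF recursion mentioned above can be written as follows. This sketch follows the standard ICCBF construction and assumes control-affine dynamics $\dot{x} = f(x) + g(x)u$ with an admissible input set $\mathcal{U}$ and a safety function $h$; the symbols $b_i$, $\alpha_i$, $N$, and $\mathcal{C}_i$ are assumed notation, not necessarily the paper's own.

$$
\begin{aligned}
b_0(x) &= h(x),\\
b_i(x) &= \inf_{u \in \mathcal{U}} \Big[ L_f b_{i-1}(x) + L_g b_{i-1}(x)\,u + \alpha_{i-1}\big(b_{i-1}(x)\big) \Big], \qquad i = 1, \dots, N,\\
\mathcal{C}_i &= \{\, x : b_i(x) \ge 0 \,\}, \qquad \mathcal{C}^* = \bigcap_{i=0}^{N} \mathcal{C}_i .
\end{aligned}
$$

Here $\alpha_0, \dots, \alpha_{N-1}$ are class-$\mathcal{K}$ functions, and $b_N$ is an ICCBF if there exists a class-$\mathcal{K}$ function $\alpha$ such that $\sup_{u \in \mathcal{U}} \big[ L_f b_N(x) + L_g b_N(x)\,u + \alpha\big(b_N(x)\big) \big] \ge 0$ for all $x \in \mathcal{C}^*$. In the meta-RL framework described above, these class-$\mathcal{K}$ functions are the quantities the learned policy tunes.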