Memory-Efficient Meta-Reinforcement Learning for Adaptive Safety-Critical Control in Adversarial Spacecraft Proximity Operations

Autonomous spacecraft rendezvous and proximity operations (RPO) require controllers that guarantee safety under thrust constraints while minimizing fuel expenditure. Input-constrained control barrier functions (ICCBFs) are a control method for nonlinear systems with actuation constraints that constructs a forward-invariant safe set. Previous work has shown that learning the class-$\mathcal{K}$ functions that define the ICCBF recursion via meta-reinforcement learning (meta-RL) yields a robust, non-greedy approach to safety-critical control in RPO. This paper extends that framework by comparing three recurrent network architectures (Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and the selective state space model Mamba) and two training algorithms (Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC)) to identify the best configuration for tuning ICCBF class-$\mathcal{K}$ functions via meta-RL. In addition to cooperative test cases, performance is evaluated under adversarial behavior, in which the target spacecraft acts to degrade the safety of the chaser spacecraft. Results indicate that state space models such as Mamba, when trained with PPO, achieve better task completion, safety, and fuel savings than the other architectures across all cooperative and uncooperative scenarios tested.
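To make the "ICCBF recursion" concrete, the following is a minimal sketch on a toy system, not the spacecraft dynamics or the implementation used in the paper. In the usual ICCBF formulation, one starts from a safety function $b_0 = h$ and defines $b_{i+1}(x) = \inf_{u \in \mathcal{U}} [L_f b_i(x) + L_g b_i(x) u + \alpha_i(b_i(x))]$, where each $\alpha_i$ is a class-$\mathcal{K}$ function. The sketch assumes a 1-D double integrator ($\dot p = v$, $\dot v = u$, $|u| \le u_{\max}$), the constraint $h = p \ge 0$, and linear class-$\mathcal{K}$ functions $\alpha_i(r) = k_i r$; the function names and the gains `k0`, `k1` are hypothetical. In the framework described above, such gains would be the quantities tuned by the meta-RL policy, whereas here they are fixed by hand.

```python
def iccbf_chain(p, v, u_max, k0, k1):
    """Evaluate b0, b1, b2 for the toy double integrator (assumed system).

    Dynamics: p' = v, v' = u with |u| <= u_max. Safety function: h = p.
    Class-K functions are assumed linear: alpha_i(r) = k_i * r.
    """
    # b0 is the original safety function h(x) = p.
    b0 = p
    # L_f b0 = v and L_g b0 = 0, so the infimum over u has no effect here.
    b1 = v + k0 * b0
    # b1 = v + k0*p gives L_f b1 = k0*v and L_g b1 = 1;
    # the infimum of u over |u| <= u_max is -u_max.
    b2 = k0 * v - u_max + k1 * b1
    return b0, b1, b2


def in_inner_safe_set(p, v, u_max, k0, k1):
    """True when the state lies in the intersection of {b_i >= 0}."""
    return all(b >= 0.0 for b in iccbf_chain(p, v, u_max, k0, k1))


if __name__ == "__main__":
    # The same approach speed is accepted far from the constraint
    # boundary and rejected close to it.
    for p in (2.0, 0.1):
        chain = iccbf_chain(p, -0.5, u_max=0.5, k0=1.0, k1=2.0)
        ok = in_inner_safe_set(p, -0.5, u_max=0.5, k0=1.0, k1=2.0)
        print(f"p={p}: b=({chain[0]}, {chain[1]}, {chain[2]}), inside={ok}")
```

Changing `k0` and `k1` reshapes the inner safe set, which is why the choice of class-$\mathcal{K}$ functions matters for both conservatism and fuel use. The sketch only evaluates the recursion; it does not check the final ICCBF condition on $b_2$, and linear gains are not guaranteed to satisfy it everywhere for this toy system.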


