We investigate how formal temporal logic specifications can enhance the safety and robustness of reinforcement learning (RL) control in aerospace applications. Using the open-source AeroBench F-16 simulation benchmark, we train a Proximal Policy Optimization (PPO) agent to regulate engine throttle and track commanded airspeed. The control objective is encoded as a Signal Temporal Logic (STL) requirement: airspeed must remain within a prescribed band during the final seconds of each maneuver. To enforce this specification at run time, we introduce a conformal STL shield that filters the RL agent's actions using online conformal prediction. We compare three settings: (i) a PPO baseline, (ii) PPO with a classical rule-based STL shield, and (iii) PPO with the proposed conformal shield, under both nominal conditions and a severe stress scenario involving aerodynamic model mismatch, actuator rate limits, measurement noise, and mid-episode setpoint jumps. Experiments show that the conformal shield preserves STL satisfaction while maintaining near-baseline performance and providing stronger robustness guarantees than the classical shield. These results demonstrate that combining formal specification monitoring with data-driven RL control can substantially improve the reliability of autonomous flight control in challenging environments.
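The abstract's two key ingredients can be illustrated with a minimal sketch. The code below is a hypothetical simplification, not the paper's implementation: `stl_band_robustness` computes the quantitative robustness of a "stay within an airspeed band over the final samples" STL requirement, `conformal_threshold` computes a standard split-conformal quantile from calibration scores, and `shield` accepts the RL action only when the calibrated robustness margin stays positive. All function and variable names (`v_lo`, `v_hi`, `tail`, `tau`, `fallback`) are illustrative assumptions.

```python
import numpy as np

def stl_band_robustness(airspeed, v_lo, v_hi, tail):
    # Robustness of G_[T-tail, T] (v_lo <= v <= v_hi): the minimum,
    # over the final `tail` samples, of the distance to the band edges.
    # Positive robustness means the STL requirement is satisfied.
    window = np.asarray(airspeed)[-tail:]
    return float(np.min(np.minimum(window - v_lo, v_hi - window)))

def conformal_threshold(scores, alpha):
    # Split-conformal quantile: with n calibration scores, the
    # ceil((n+1)(1-alpha))/n empirical quantile yields marginal
    # coverage of at least 1 - alpha on an exchangeable test point.
    n = len(scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return float(np.quantile(scores, min(q, 1.0)))

def shield(action, predicted_robustness, tau, fallback):
    # Accept the RL agent's action only if the conformally calibrated
    # lower bound on predicted robustness stays positive; otherwise
    # substitute a conservative fallback action.
    if predicted_robustness - tau > 0:
        return action
    return fallback
```

Here `tau` would be obtained by calibrating `conformal_threshold` on held-out prediction errors of the robustness forecaster, so that the shield's "safe" decisions carry a distribution-free coverage guarantee under exchangeability.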