We investigate how formal temporal logic specifications can enhance the safety and robustness of reinforcement learning (RL) control in aerospace applications. Using the open-source AeroBench F-16 simulation benchmark, we train a Proximal Policy Optimization (PPO) agent to regulate engine throttle and track commanded airspeed. The control objective is encoded as a Signal Temporal Logic (STL) requirement: airspeed must remain within a prescribed band during the final seconds of each maneuver. To enforce this specification at runtime, we introduce a conformal STL shield that filters the RL agent's actions using online conformal prediction. We compare three settings: (i) a PPO baseline, (ii) PPO with a classical rule-based STL shield, and (iii) PPO with the proposed conformal shield, under both nominal conditions and a severe stress scenario involving aerodynamic model mismatch, actuator rate limits, measurement noise, and mid-episode setpoint jumps. Experiments show that the conformal shield preserves STL satisfaction while maintaining near-baseline performance and providing stronger robustness guarantees than the classical shield. These results demonstrate that combining formal specification monitoring with data-driven RL control can substantially improve the reliability of autonomous flight control in challenging environments.
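The conformal shielding idea summarized above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the robustness predictor, fallback action, and calibration buffer are hypothetical names, and the split-conformal quantile shown is one standard way to calibrate a lower bound on predicted STL robustness.

```python
import numpy as np

def conformal_quantile(scores, alpha=0.1):
    """(1 - alpha) quantile of calibration scores with the standard
    finite-sample correction from split conformal prediction."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return float(np.sort(np.asarray(scores))[min(k, n) - 1])

def conformal_stl_shield(action, fallback, predicted_robustness,
                         calibration_scores, alpha=0.1):
    """Pass the RL action through only if the conformally corrected
    lower bound on STL robustness is nonnegative; otherwise fall back.

    calibration_scores: past errors |true robustness - predicted robustness|,
    collected online (hypothetical interface, for illustration only).
    """
    q = conformal_quantile(calibration_scores, alpha)
    if predicted_robustness - q >= 0.0:
        return action      # predicted to satisfy the STL band with coverage 1 - alpha
    return fallback        # conservative throttle command
```

Under this sketch, the shield defers to the learned policy whenever the calibrated lower bound on robustness stays nonnegative, which is what lets it track near-baseline performance in nominal conditions while intervening under model mismatch.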