Verified Compositional Neuro-Symbolic Control for Stochastic Systems with Temporal Logic Tasks

Several methods have been proposed recently to learn neural network (NN) controllers for autonomous agents, with unknown and stochastic dynamics, tasked with complex missions captured by Linear Temporal Logic (LTL). Due to the sample-inefficiency of the majority of these works, compositional learning methods have been proposed decomposing the LTL specification into smaller sub-tasks. Then, separate controllers are learned and composed to satisfy the original task. A key challenge within these approaches is that they often lack safety guarantees or the provided guarantees are impractical. This paper aims to address this challenge. Particularly, we consider autonomous systems with unknown and stochastic dynamics and LTL-encoded tasks. We assume that the system is equipped with a finite set of base skills modeled by trained NN feedback controllers. Our goal is to check if there exists a temporal composition of the trained NN controllers - and if so, to compute it - that will yield a composite system behavior that satisfies the assigned LTL task with probability one. We propose a new approach that relies on a novel integration of automata theory and data-driven reachability analysis tools for NN-controlled stochastic systems. The resulting neuro-symbolic controller allows the agent to generate safe behaviors for unseen complex temporal logic tasks in a zero-shot fashion by leveraging its base skills. We show correctness of the proposed method and we provide conditions under which it is complete. To the best of our knowledge, this is the first work that designs verified temporal compositions of NN controllers for unknown and stochastic systems. Finally, we provide extensive numerical simulations and hardware experiments on robot navigation tasks to demonstrate the proposed method.

翻译：近期，已有多种方法被提出用于学习自主智能体的神经网络控制器，这些智能体具有未知和随机动态特性，且需完成由线性时序逻辑（LTL）捕获的复杂任务。由于这些方法大多存在样本效率低下的问题，研究者提出了组合学习方法，将LTL规范分解为较小的子任务，随后分别学习控制器并将其组合以满足原始任务需求。这类方法面临的关键挑战在于往往缺乏安全性保证，或所提供的保证不具实用性。本文旨在解决这一挑战。具体而言，我们考虑具有未知和随机动态特性的自主系统及其LTL编码任务。假设系统配备有一组由训练好的神经网络反馈控制器建模的基础技能。我们的目标是检验是否存在已训练神经网络控制器的时间组合（若存在则计算该组合），使得组合后的系统行为能够以概率1满足所分配的LTL任务。我们提出一种新方法，该方法基于自动机理论与面向神经网络控制随机系统的数据驱动可达性分析工具的创新融合。由此得到的神经符号控制器使智能体能够通过利用其基础技能，以零样本方式为未见过的复杂时序逻辑任务生成安全行为。我们证明了所提出方法的正确性，并给出了该方法完备性的条件。据我们所知，这是首个为未知和随机系统设计神经网络控制器可验证时间组合的研究。最后，我们通过机器人导航任务上的大量数值仿真和硬件实验对所提方法进行了验证。