Several methods have been proposed recently to learn neural network (NN) controllers for autonomous agents, with unknown and stochastic dynamics, tasked with complex missions captured by Linear Temporal Logic (LTL). Due to the sample-inefficiency of the majority of these works, compositional learning methods have been proposed decomposing the LTL specification into smaller sub-tasks. Then, separate controllers are learned and composed to satisfy the original task. A key challenge within these approaches is that they often lack safety guarantees or the provided guarantees are impractical. This paper aims to address this challenge. Particularly, we consider autonomous systems with unknown and stochastic dynamics and LTL-encoded tasks. We assume that the system is equipped with a finite set of base skills modeled by trained NN feedback controllers. Our goal is to check if there exists a temporal composition of the trained NN controllers - and if so, to compute it - that will yield a composite system behavior that satisfies the assigned LTL task with probability one. We propose a new approach that relies on a novel integration of automata theory and data-driven reachability analysis tools for NN-controlled stochastic systems. The resulting neuro-symbolic controller allows the agent to generate safe behaviors for unseen complex temporal logic tasks in a zero-shot fashion by leveraging its base skills. We show correctness of the proposed method and we provide conditions under which it is complete. To the best of our knowledge, this is the first work that designs verified temporal compositions of NN controllers for unknown and stochastic systems. Finally, we provide extensive numerical simulations and hardware experiments on robot navigation tasks to demonstrate the proposed method.
翻译:近期,多项研究提出了针对自主智能体的神经网络控制器学习方法,这些智能体具有未知且随机的动态特性,需执行由线性时序逻辑(LTL)刻画的复杂任务。由于现有方法普遍存在样本效率低下的问题,研究者提出了组合式学习方法,将原始LTL规范分解为若干更小的子任务,分别学习各子任务的控制器后通过组合满足原始任务需求。这类方法面临的核心挑战在于:所获得的控制器往往缺乏安全性保障,或所提供的保障不具实际可行性。本文旨在解决这一挑战。具体而言,我们考虑具有未知随机动态特性且需执行LTL编码任务的自主系统。假设系统配备有限个基础技能,每个技能由经过训练的神经网络反馈控制器建模。我们的目标是:检验是否存在经训练的神经网络控制器的时间组合方案(若存在则计算该方案),使得组合后的系统行为能以概率1满足指定的LTL任务。我们提出一种新方法,该方法创新性地将自动机理论与面向神经网络控制随机系统的数据驱动可达性分析工具相结合。由此生成的神经符号控制器使智能体能够利用其基础技能,以零样本学习方式为未知复杂时序逻辑任务生成安全行为。我们证明了所提方法的正确性,并给出了保证其完备性的条件。据我们所知,这是首个针对未知随机系统设计神经网络控制器已验证时间组合方案的研究。最后,我们通过大量数值仿真与机器人导航硬件实验验证了所提方法的有效性。