Several methods have been proposed recently to learn neural network (NN) controllers for autonomous agents, with unknown and stochastic dynamics, tasked with complex missions captured by Linear Temporal Logic (LTL). Due to the sample-inefficiency of the majority of these works, compositional learning methods have been proposed decomposing the LTL specification into smaller sub-tasks. Then, separate controllers are learned and composed to satisfy the original task. A key challenge within these approaches is that they often lack safety guarantees or the provided guarantees are impractical. This paper aims to address this challenge. Particularly, we consider autonomous systems with unknown and stochastic dynamics and LTL-encoded tasks. We assume that the system is equipped with a finite set of base skills modeled by trained NN feedback controllers. Our goal is to check if there exists a temporal composition of the trained NN controllers - and if so, to compute it - that will yield a composite system behavior that satisfies the assigned LTL task with probability one. We propose a new approach that relies on a novel integration of automata theory and data-driven reachability analysis tools for NN-controlled stochastic systems. The resulting neuro-symbolic controller allows the agent to generate safe behaviors for unseen complex temporal logic tasks in a zero-shot fashion by leveraging its base skills. We show correctness of the proposed method and we provide conditions under which it is complete. To the best of our knowledge, this is the first work that designs verified temporal compositions of NN controllers for unknown and stochastic systems. Finally, we provide extensive numerical simulations and hardware experiments on robot navigation tasks to demonstrate the proposed method.
翻译:近期已有多种方法被提出,用于学习自主智能体的神经网络控制器,这些智能体具有未知且随机的动力学特性,并执行由线性时间逻辑(LTL)刻画的复杂任务。由于此类工作大多存在样本效率低下的问题,研究者提出了组合式学习方法,将LTL规范分解为更小的子任务,随后分别学习各控制器并将其组合,以满足原始任务需求。这些方法面临的关键挑战在于:它们往往缺乏安全保障机制,或所提出的保证在实际中难以实现。本文旨在解决这一挑战。具体而言,我们考虑具有未知随机动力学特性的自主系统及其LTL编码任务。假设系统配备一组由训练后的神经网络反馈控制器建模的基础技能。我们的目标是:判断是否存在训练后神经网络控制器的时序组合(若存在则计算该组合),使得组合后的系统行为能以概率1满足所分配的LTL任务。我们提出了一种新方法,该方法创新性地融合了自动机理论与数据驱动的可达性分析工具,专门用于神经网络控制的随机系统。由此产生的神经符号控制器使智能体能够通过利用其基础技能,以零样本学习方式为未见过的复杂时间逻辑任务生成安全行为。我们证明了所提方法的正确性,并给出了其完备性的充分条件。据我们所知,这是首个为未知随机系统设计可验证神经网络控制器时序组合的工作。最后,我们通过大量数值仿真和机器人导航硬件实验验证了所提方法的有效性。