In this paper we present a layered approach to multi-agent control problems, decomposed into three stages, each building upon the results of the previous one. First, a high-level plan for a coarse abstraction of the system is computed using parametric timed automata augmented with stopwatches, which can efficiently model the simplified dynamics of such systems. The second stage, based on an SMT formulation, takes this high-level plan, mainly handles the combinatorial aspects of the problem, and provides a more dynamically accurate solution. These two stages are collectively referred to as the SWA-SMT solver. They are correct by construction but lack a crucial feature: they cannot be executed in real time. To overcome this, we use SWA-SMT solutions as the initial training dataset for the last stage, which aims to obtain a neural network control policy. We train the policy with reinforcement learning and show that the initial dataset is crucial to the overall success of the method.
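The role of the initial dataset in the last stage can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration and not the method of the paper: `expert_action` is a hypothetical stand-in for an offline SWA-SMT solution, the one-dimensional dynamics are a toy, the policy is a single linear gain rather than a neural network, and a simple hill-climbing policy search stands in for the reinforcement learning algorithm.

```python
import random


def expert_action(x):
    """Hypothetical stand-in for an offline solver solution: steer x toward 0."""
    return max(-1.0, min(1.0, -x))


def rollout_return(w, x0, steps=20):
    """Negative accumulated squared error of the linear policy a = w * x."""
    x, total = x0, 0.0
    for _ in range(steps):
        a = max(-1.0, min(1.0, w * x))  # bounded control input
        x = x + 0.5 * a                 # toy dynamics (an assumption)
        total -= x * x
    return total


def behavior_clone(dataset):
    """Least-squares fit of w in a = w * x on (state, action) pairs."""
    num = sum(x * a for x, a in dataset)
    den = sum(x * x for x, _ in dataset)
    return num / den


def fine_tune(w, starts, iters=200, sigma=0.1, seed=0):
    """Hill climbing: keep a random perturbation only if the average return improves."""
    rng = random.Random(seed)

    def score(w_):
        return sum(rollout_return(w_, x0) for x0 in starts) / len(starts)

    best = score(w)
    for _ in range(iters):
        candidate = w + rng.gauss(0.0, sigma)
        s = score(candidate)
        if s > best:
            w, best = candidate, s
    return w


# Build the initial dataset from the stand-in solver.
rng = random.Random(1)
states = [rng.uniform(-1.0, 1.0) for _ in range(50)]
dataset = [(x, expert_action(x)) for x in states]

# Warm start by imitation, then refine by interaction with the toy system.
w_init = behavior_clone(dataset)
w_final = fine_tune(w_init, starts=[-1.0, -0.5, 0.5, 1.0])
```

In this toy setting the imitation step already yields a stabilizing gain, so the subsequent search starts from a useful policy instead of an arbitrary one, and the refinement can only keep or improve the average return on the chosen start states.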