Provably Correct Training of Neural Network Controllers Using Reachability Analysis

In this paper, we consider the problem of training neural network (NN) controllers for nonlinear dynamical systems such that the controllers are guaranteed to satisfy safety and liveness (e.g., reach-avoid) properties. Our approach combines model-based design methodologies for dynamical systems with data-driven techniques. We confine our attention to NNs with Rectified Linear Unit (ReLU) nonlinearities, which are known to represent Continuous Piece-Wise Affine (CPWA) functions. Given a mathematical model of the dynamical system, we compute a finite-state abstract model that captures the closed-loop behavior under all possible CPWA controllers. Using this finite-state abstract model, our framework identifies a family of CPWA functions guaranteed to satisfy the safety requirements. We augment the learning algorithm with an NN weight projection operator, applied during training, that forces the resulting NN to represent a CPWA function from the provably safe family. Moreover, the proposed framework uses the finite-state abstract model to identify candidate CPWA functions that may satisfy the liveness properties, and uses these candidates to bias the NN training toward achieving the liveness specification. We show the efficacy of the proposed framework both in simulation and on an actual robotic vehicle.
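As a minimal illustration of two ideas mentioned above, the following Python sketch (not taken from the paper) shows that a one-hidden-layer ReLU network coincides with an affine map K x + c on the region sharing the activation pattern of x, and how a weight projection step might look. All names (`relu_net`, `local_affine`, `project_box`) and the example weights are hypothetical, and the box projection is only an assumed stand-in for the paper's projection operator, whose safe set is derived from the finite-state abstraction.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu_net(x: Vector, W1: Matrix, b1: Vector, W2: Matrix, b2: Vector) -> Vector:
    """One-hidden-layer ReLU network: W2 * relu(W1 x + b1) + b2."""
    hidden = [max(0.0, h + b) for h, b in zip(matvec(W1, x), b1)]
    return [y + b for y, b in zip(matvec(W2, hidden), b2)]


def local_affine(x: Vector, W1: Matrix, b1: Vector,
                 W2: Matrix, b2: Vector) -> Tuple[Matrix, Vector]:
    """Affine piece (K, c) that the network equals on the linear region of x.

    With D = diag(activation pattern at x):
        K = W2 D W1,   c = W2 D b1 + b2
    """
    pre = [h + b for h, b in zip(matvec(W1, x), b1)]
    active = [1.0 if p > 0 else 0.0 for p in pre]
    n_hidden, n_in, n_out = len(active), len(x), len(W2)
    K = [[sum(W2[i][j] * active[j] * W1[j][k] for j in range(n_hidden))
          for k in range(n_in)] for i in range(n_out)]
    c = [sum(W2[i][j] * active[j] * b1[j] for j in range(n_hidden)) + b2[i]
         for i in range(n_out)]
    return K, c


def project_box(W: Matrix, lo: float, hi: float) -> Matrix:
    """Euclidean projection of each weight onto [lo, hi].

    Assumption: a box is used here only as a simple example of a constraint
    set; the safe set in the paper is not claimed to be a box.
    """
    return [[min(max(w, lo), hi) for w in row] for row in W]


# Hypothetical example weights (2 inputs, 2 hidden units, 1 output).
W1 = [[1.0, -1.0], [0.5, 2.0]]
b1 = [0.0, -1.0]
W2 = [[2.0, -3.0]]
b2 = [0.5]

x = [1.0, 1.0]
K, c = local_affine(x, W1, b1, W2, b2)
affine_value = [k + ci for k, ci in zip(matvec(K, x), c)]
print(relu_net(x, W1, b1, W2, b2), affine_value)

# Projection step, as it might be applied after a gradient update.
W2_projected = project_box(W2, -1.0, 1.0)
print(W2_projected)
```

In a training loop, a projection of this kind would typically be applied after each gradient step so that the weights stay inside the constraint set throughout training.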
