Guaranteed Conformance of Neurosymbolic Models to Natural Constraints

Deep neural networks have emerged as the workhorse for a large class of robotics and control applications, especially as models of dynamical systems. Such data-driven models are in turn used for designing and verifying autonomous systems. They are particularly useful for modeling medical systems, where data can be leveraged to individualize treatment. In safety-critical applications, it is important that the data-driven model conforms to established knowledge from the natural sciences. Such knowledge is often available, or can be distilled, in the form of a (possibly black-box) model. For instance, an F1 racing car should conform to Newton's laws (which are encoded within a unicycle model). In this light, we consider the following problem: given a model $M$ and a state transition dataset, we wish to best approximate the system model while remaining within a bounded distance of $M$. We propose a method to guarantee this conformance. Our first step is to distill the dataset into a few representative samples called memories, using the idea of a growing neural gas. Next, using these memories, we partition the state space into disjoint subsets and compute bounds that the neural network should respect in each subset. This serves as a symbolic wrapper for guaranteed conformance. We argue theoretically that this leads to only a bounded increase in approximation error, which can be controlled by increasing the number of memories. We show experimentally that on three case studies (Car Model, Drones, and Artificial Pancreas), our constrained neurosymbolic models conform to the specified models (each encoding various constraints) with order-of-magnitude improvements over the augmented Lagrangian and vanilla training methods. Our code can be found at: https://github.com/kaustubhsridhar/Constrained_Models
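
The wrapper idea described above (memories, a partition of the state space, and per-subset bounds on the network output) can be illustrated with a minimal one-dimensional sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper or its repository: the functions `reference_model` and `unconstrained_net` are hypothetical stand-ins, the memories are fixed by hand rather than learned with a growing neural gas, the partition is taken to be nearest-memory, and the bounds are taken to be $M$ evaluated at the memory, widened by a margin `delta`.

```python
import math

def reference_model(x):
    """Hypothetical stand-in for the known model M (a simple linear map)."""
    return 0.9 * x

def unconstrained_net(x):
    """Hypothetical stand-in for a trained network that drifts away from M."""
    return 0.9 * x + 0.5 * math.sin(3.0 * x)

def nearest_memory(x, memories):
    """Index of the memory closest to x; assigns x to exactly one subset."""
    return min(range(len(memories)), key=lambda i: abs(x - memories[i]))

def make_bounds(memories, delta):
    """Assumed bound rule: M evaluated at each memory, widened by delta."""
    return [(reference_model(m) - delta, reference_model(m) + delta)
            for m in memories]

def constrained_predict(x, memories, bounds):
    """Symbolic wrapper: clamp the network output into the bounds of x's subset."""
    lo, hi = bounds[nearest_memory(x, memories)]
    return min(max(unconstrained_net(x), lo), hi)

# Memories are fixed by hand here (assumption); the paper distills them from data.
memories = [-1.0, -0.5, 0.0, 0.5, 1.0]
delta = 0.1
bounds = make_bounds(memories, delta)

# Largest deviation from M on a grid over [-1, 1], with and without the wrapper.
grid = [i / 100.0 for i in range(-100, 101)]
worst_raw = max(abs(unconstrained_net(x) - reference_model(x)) for x in grid)
worst_wrapped = max(abs(constrained_predict(x, memories, bounds) - reference_model(x))
                    for x in grid)
print(worst_raw, worst_wrapped)
```

In this toy setting the wrapped output can deviate from `reference_model` by at most `delta` plus the change of `reference_model` across one subset, so placing memories more densely tightens the guarantee. This mirrors, in a simplified form, the abstract's remark that the error can be controlled by increasing the number of memories.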
