We present a framework and algorithms to learn controlled dynamics models using neural stochastic differential equations (SDEs), that is, SDEs whose drift and diffusion terms are both parametrized by neural networks. We construct the drift term to leverage a priori physics knowledge as an inductive bias, and we design the diffusion term to represent a distance-aware estimate of the uncertainty in the learned model's predictions: it matches the system's underlying stochasticity when evaluated on states near those in the training dataset, and it predicts highly stochastic dynamics when evaluated on states beyond the training regime. The proposed neural SDEs can be evaluated quickly enough for use in model predictive control algorithms, or they can be used as simulators for model-based reinforcement learning. Furthermore, they make accurate predictions over long time horizons, even when trained on small datasets that cover limited regions of the state space. We demonstrate these capabilities through experiments on simulated robotic systems, as well as by using neural SDEs to model and control a hexacopter's flight dynamics: a neural SDE trained on only three minutes of manually collected flight data yields a model-based control policy that accurately tracks aggressive trajectories, pushing the hexacopter's velocity and Euler angles to nearly double the maximum values observed in the training dataset.
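The following minimal Python sketch illustrates the drift/diffusion split described above. It assumes NumPy and a hypothetical two-dimensional system (angle, angular velocity) with a scalar control. It is not the authors' implementation. The "neural" residual is an untrained random MLP, and the diffusion uses a hand-picked nearest-neighbor distance heuristic in place of a learned, distance-aware diffusion network. All names (`TRAIN_STATES`, `known_physics`, `residual_net`, `sigma_min`, `sigma_max`, `length_scale`) are hypothetical. Rollouts use the Euler-Maruyama scheme.

```python
import numpy as np

# Hypothetical setup: 2-D state x = (angle, angular velocity), scalar control u.
_rng = np.random.default_rng(0)

# Hypothetical training states that cover only a small region of the state space.
TRAIN_STATES = _rng.uniform(-0.5, 0.5, size=(200, 2))

# Hypothetical residual network: one tanh hidden layer with random, untrained weights.
W1 = _rng.normal(scale=0.1, size=(16, 3))
B1 = np.zeros(16)
W2 = _rng.normal(scale=0.1, size=(2, 16))


def known_physics(x, u):
    """Assumed prior model: angle' = velocity, velocity' = u (gravity, friction omitted)."""
    return np.array([x[1], u])


def residual_net(x, u):
    """Correction to the prior; in a real model this would be trained on data."""
    hidden = np.tanh(W1 @ np.append(x, u) + B1)
    return W2 @ hidden


def drift(x, u):
    """Drift = physics prior + neural residual (the prior acts as inductive bias)."""
    return known_physics(x, u) + residual_net(x, u)


def diffusion(x, sigma_min=0.01, sigma_max=1.0, length_scale=0.5):
    """Isotropic, distance-aware noise level (a heuristic, not a learned network).

    Stays close to sigma_min near the training states and approaches sigma_max
    as the distance to the nearest training state grows.
    """
    d = np.min(np.linalg.norm(TRAIN_STATES - x, axis=1))
    return sigma_min + (sigma_max - sigma_min) * (1.0 - np.exp(-d**2 / (2 * length_scale**2)))


def simulate(x0, controls, dt=0.01, rng=None):
    """Euler-Maruyama rollout: x <- x + f(x, u) dt + sigma(x) sqrt(dt) N(0, I)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    traj = [x]
    for u in controls:
        noise = rng.normal(size=x.shape)
        x = x + drift(x, u) * dt + diffusion(x) * np.sqrt(dt) * noise
        traj.append(x)
    return np.stack(traj)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    controls = np.zeros(50)
    # Compare the spread of final states for rollouts starting inside vs. outside the data.
    near = np.array([simulate([0.0, 0.0], controls, rng=rng)[-1] for _ in range(200)])
    far = np.array([simulate([3.0, 3.0], controls, rng=rng)[-1] for _ in range(200)])
    print("std near data:", near.std(axis=0))
    print("std far from data:", far.std(axis=0))
```

Because the noise level in this sketch grows with distance from the training states, rollouts that start outside the training region spread out much faster than rollouts that start inside it. A model predictive controller or reinforcement learning agent sampling from such a model can treat that spread as a signal that the prediction is unreliable.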