We present a framework and algorithms to learn controlled dynamics models using neural stochastic differential equations (SDEs), that is, SDEs whose drift and diffusion terms are both parametrized by neural networks. We construct the drift term to leverage a priori physics knowledge as inductive bias, and we design the diffusion term to represent a distance-aware estimate of the uncertainty in the learned model's predictions: it matches the system's underlying stochasticity when evaluated on states near those from the training dataset, and it predicts highly stochastic dynamics when evaluated on states beyond the training regime. The proposed neural SDEs can be evaluated quickly enough for use in model predictive control algorithms, or they can be used as simulators for model-based reinforcement learning. Furthermore, they make accurate predictions over long time horizons, even when trained on small datasets that cover limited regions of the state space. We demonstrate these capabilities through experiments on simulated robotic systems, as well as by using them to model and control a hexacopter's flight dynamics: a neural SDE trained using only three minutes of manually collected flight data results in a model-based control policy that accurately tracks aggressive trajectories that push the hexacopter's velocity and Euler angles to nearly double the maximum values observed in the training dataset.
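To make the structure described above concrete, the following is a minimal sketch of a controlled SDE rollout, dx = f(x, u) dt + sigma(x) dW, integrated with the Euler-Maruyama scheme. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the damped point-mass prior in `physics_drift`, the single tanh layer standing in for a neural network in `residual_drift`, and the nearest-neighbor distance rule in `distance_aware_diffusion`, which is only one simple way to get small noise near training states and large noise far from them.

```python
import numpy as np


def physics_drift(x, u):
    # Assumed physics prior (hypothetical): a damped point mass with
    # state x = [position, velocity] and scalar control input u.
    pos, vel = x
    return np.array([vel, u - 0.1 * vel])


def residual_drift(x, u, W):
    # Stand-in for a learned neural network correction to the prior:
    # a single tanh layer with weight matrix W of shape (2, 3).
    return np.tanh(W @ np.append(x, u))


def distance_aware_diffusion(x, train_states, sigma_near=0.01,
                             sigma_far=1.0, length=0.5):
    # Assumed rule (hypothetical): interpolate between a small noise level
    # near the training data and a large one far from it, based on the
    # Euclidean distance from x to the nearest training state.
    d = np.min(np.linalg.norm(train_states - x, axis=1))
    w = 1.0 - np.exp(-(d / length) ** 2)
    return sigma_near + (sigma_far - sigma_near) * w


def euler_maruyama(x0, controls, dt, train_states, W, rng):
    # Roll out one sample path of dx = f(x, u) dt + sigma(x) dW.
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for u in controls:
        drift = physics_drift(x, u) + residual_drift(x, u, W)
        sigma = distance_aware_diffusion(x, train_states)
        # Brownian increment with variance dt in each state dimension.
        dW = rng.normal(0.0, np.sqrt(dt), size=x.shape)
        x = x + drift * dt + sigma * dW
        traj.append(x.copy())
    return np.stack(traj)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_states = rng.uniform(-1.0, 1.0, size=(50, 2))
    W = np.zeros((2, 3))  # untrained residual: drift reduces to the prior
    traj = euler_maruyama([0.0, 0.0], np.ones(100), 0.01,
                          train_states, W, rng)
    print(traj.shape)
```

Sampling several such rollouts from the same initial state gives a spread of predicted trajectories. Under the assumed diffusion rule, that spread stays narrow while the state remains near the training data and widens once it leaves that region, which is the qualitative behavior the abstract attributes to the diffusion term.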