Implicit-layer deep learning techniques, such as neural differential equations, have become an important modeling framework because they adapt to new problems automatically. Training a neural differential equation is effectively a search over a space of plausible dynamical systems. However, controlling the computational cost of these models is difficult, since it depends on the number of steps the adaptive solver takes. Most prior work either uses higher-order methods to reduce prediction time at the cost of greatly increased training time, or reduces both training and prediction time by relying on specific training algorithms, which are harder to use as drop-in replacements because of their strict requirements on automatic differentiation. In this manuscript, we use internal cost heuristics of adaptive differential equation solvers at stochastic time points to guide the training toward learning a dynamical system that is easier to integrate. We "close the black-box" and allow our method to be used with any adjoint technique for computing gradients of the differential equation solution. Experiments comparing our method with global regularization show that it attains similar performance without compromising the flexibility of implementation for ordinary differential equations (ODEs) and stochastic differential equations (SDEs). We develop two sampling strategies to trade off between performance and training time. Our method reduces the number of function evaluations to 0.556-0.733x and accelerates predictions by 1.3-2x.
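To make the idea of regularizing with a solver's internal cost heuristic at a random time point more concrete, the following is a minimal sketch in plain Python. It is not the implementation used in this work. It assumes a scalar ODE, an embedded Heun-Euler pair as the source of the local error estimate, and a fixed-step integrator standing in for the adaptive solve. All function names and parameters (`heun_euler_step`, `integrate_fixed`, `local_error_regularizer`, `h_probe`) are hypothetical and chosen only for illustration.

```python
import random


def heun_euler_step(f, t, y, h):
    """One embedded Heun-Euler step.

    Returns the second-order (Heun) solution and the local error estimate,
    taken as the gap between the Heun and the first-order Euler solutions.
    """
    k1 = f(t, y)
    k2 = f(t + h, y + h * k1)
    y_low = y + h * k1                 # Euler, order 1
    y_high = y + 0.5 * h * (k1 + k2)   # Heun, order 2
    return y_high, abs(y_high - y_low)


def integrate_fixed(f, y0, t0, t1, n_steps=50):
    """Fixed-step Heun integration from t0 to t1.

    Assumption: this stands in for the adaptive solve to the sampled time.
    """
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y, _ = heun_euler_step(f, t, y, h)
        t += h
    return y


def local_error_regularizer(f, y0, t0, t1, h_probe=0.1, rng=random):
    """Error estimate of a single probe step at a random time in [t0, t1]."""
    t_s = rng.uniform(t0, t1)                  # stochastic time point
    y_s = integrate_fixed(f, y0, t0, t_s)      # state at the sampled time
    _, err = heun_euler_step(f, t_s, y_s, h_probe)
    return err


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy dynamics dy/dt = -theta * y; a larger theta is costlier to integrate.
    for theta in (0.5, 2.0):
        rhs = lambda t, y, theta=theta: -theta * y
        reg = local_error_regularizer(rhs, 1.0, 0.0, 1.0, rng=rng)
        print(f"theta={theta}: local error estimate = {reg:.5f}")
```

In a training loop, a term like this would be added to the data loss with a weighting coefficient, and the error estimate would come from the adaptive solver itself rather than from a hand-written step. Because only one solver step at the sampled time contributes to the penalty, such a term does not require differentiating through the solver's internals over the whole time span.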