Implicit-layer deep learning techniques, such as Neural Differential Equations, have become an important modeling framework due to their ability to adapt to new problems automatically. Training a neural differential equation is effectively a search over a space of plausible dynamical systems. However, controlling the computational cost of these models is difficult, since it depends on the number of steps the adaptive solver takes. Most prior work has taken one of two approaches. Some works use higher-order methods, which reduce prediction time but greatly increase training time. Others rely on specialized training algorithms, which reduce both training and prediction time but are harder to use as a drop-in replacement because they place strict requirements on automatic differentiation. In this manuscript, we use internal cost heuristics of adaptive differential equation solvers, evaluated at stochastic time points, to guide training toward a dynamical system that is easier to integrate. We "close the black box" and allow our method to be used with any adjoint technique for computing gradients of the differential equation solution. Through experimental studies on ordinary differential equations (ODEs) and stochastic differential equations (SDEs), we show that our method attains performance similar to global regularization without compromising implementation flexibility. We develop two sampling strategies to trade off performance against training time. Our method reduces the number of function evaluations to 0.556–0.733× and accelerates predictions by 1.3–2×.
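The core idea is to penalize the solver's own local error estimate at a randomly chosen step, rather than integrating a regularizer over the whole trajectory. It can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses an embedded Heun–Euler pair as the "internal cost heuristic", a fixed step size instead of true adaptive stepping, an RMS error norm, and a single uniformly sampled step. All names (`heun_euler_step`, `regularized_loss`, `lam`) are hypothetical. A real training setup would differentiate this loss with respect to the model parameters using an autodiff framework.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]
Dynamics = Callable[[float, Vector], Vector]  # f(t, y) -> dy/dt


def rms_norm(v: Vector) -> float:
    # Root-mean-square norm, a common choice for adaptive-solver error norms.
    return math.sqrt(sum(x * x for x in v) / len(v))


def heun_euler_step(f: Dynamics, t: float, y: Vector, h: float) -> Tuple[Vector, float]:
    """One Heun (order 2) step plus its embedded Euler (order 1) error estimate."""
    k1 = f(t, y)
    y_euler = [yi + h * a for yi, a in zip(y, k1)]
    k2 = f(t + h, y_euler)
    y_heun = [yi + 0.5 * h * (a + b) for yi, a, b in zip(y, k1, k2)]
    # The difference between the two solutions is the solver's local error estimate,
    # which an adaptive solver would use to accept or reject steps and pick h.
    err = rms_norm([hn - e for hn, e in zip(y_heun, y_euler)])
    return y_heun, err


def regularized_loss(f: Dynamics, y0: Vector, t0: float, t1: float, h: float,
                     target: Vector, lam: float, rng: random.Random) -> Tuple[float, float]:
    """Task loss plus lam * (local error estimate at one randomly sampled step)."""
    n_steps = max(1, round((t1 - t0) / h))
    h = (t1 - t0) / n_steps
    sampled = rng.randrange(n_steps)  # the stochastic time point
    t, y, reg = t0, list(y0), 0.0
    for i in range(n_steps):
        y, err = heun_euler_step(f, t, y, h)
        if i == sampled:
            reg = err  # reuse the estimate the solver computes anyway
        t += h
    task = sum((a - b) ** 2 for a, b in zip(y, target)) / len(y)
    return task + lam * reg, reg


if __name__ == "__main__":
    rng = random.Random(0)
    # Fast-decaying dynamics produce larger local error estimates than slow ones,
    # so the penalty pushes training toward dynamics that are easier to integrate.
    for rate in (-0.1, -10.0):
        loss, reg = regularized_loss(lambda t, y, r=rate: [r * y[0]], [1.0],
                                     0.0, 1.0, 0.01, [0.0], 0.5, rng)
        print(f"rate={rate}: loss={loss:.6f}, sampled local error={reg:.2e}")
```

Only one error estimate enters the loss per iteration, so the regularizer costs almost nothing beyond the forward solve. It also does not require differentiating through a quantity accumulated over the entire trajectory, which is in the spirit of the "drop-in" flexibility the abstract describes.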