Implicit-layer deep learning techniques, such as neural differential equations, have become an important modeling framework because they adapt to new problems automatically. Training a neural differential equation is effectively a search over a space of plausible dynamical systems. However, controlling the computational cost of these models is difficult because it depends on the number of steps the adaptive solver takes. Most prior work either uses higher-order methods to reduce prediction time at the cost of greatly increased training time, or reduces both training and prediction time by relying on specific training algorithms, which are harder to use as drop-in replacements because of their strict requirements on automatic differentiation. In this manuscript, we use internal cost heuristics of adaptive differential equation solvers at stochastic time points to guide training toward a dynamical system that is easier to integrate. We "close the black-box" and allow our method to be used with any adjoint technique for computing gradients of the differential equation solution. In experiments on ordinary differential equations (ODEs) and stochastic differential equations (SDEs), we compare our method with global regularization and show that it attains similar performance without compromising flexibility of implementation. We develop two sampling strategies to trade off between performance and training time. Our method reduces the number of function evaluations to 0.556-0.733x and accelerates predictions by 1.3-2x.
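The idea of penalizing a solver-internal cost heuristic at a random time point can be pictured with a small NumPy sketch. It is an illustration under stated assumptions, not the implementation used in this work: the embedded Heun-Euler pair, the step-size controller constants, the probe step `h_probe`, and the names `heun_euler_step`, `solve_adaptive`, `local_error_regularizer`, and `make_oscillator` are all hypothetical choices made for the example. The sketch integrates up to a uniformly sampled time, takes a single solver step there, and returns the embedded local error estimate, which a training loop could add to its loss with some weight.

```python
import numpy as np


def heun_euler_step(f, t, y, h):
    """One embedded Heun(2)/Euler(1) step.

    Returns the second-order solution and the norm of the difference
    between the two solutions, used as a local error estimate.
    """
    k1 = f(t, y)
    k2 = f(t + h, y + h * k1)
    y_high = y + 0.5 * h * (k1 + k2)  # Heun (order 2)
    y_low = y + h * k1                # explicit Euler (order 1)
    return y_high, np.linalg.norm(y_high - y_low)


def solve_adaptive(f, y0, t0, t1, h0=0.1, tol=1e-4):
    """Adaptive integration from t0 to t1.

    Returns the final state and the number of function evaluations (NFE),
    counting two evaluations per attempted step.
    """
    t, y, h = t0, np.asarray(y0, dtype=float), h0
    nfe = 0
    while t < t1 - 1e-12:
        h = min(h, t1 - t)
        y_new, err = heun_euler_step(f, t, y, h)
        nfe += 2
        if err <= tol:  # accept the step
            t, y = t + h, y_new
        # Controller for an O(h^2) error estimate, with illustrative
        # safety factor 0.9 and growth limits [0.2, 5].
        h *= min(5.0, max(0.2, 0.9 * np.sqrt(tol / max(err, 1e-16))))
    return y, nfe


def local_error_regularizer(f, y0, t0, t1, rng, h_probe=0.1):
    """Local penalty: error estimate of one probe step at a random time.

    Assumption for this sketch: the time is drawn uniformly from [t0, t1]
    and the probe step size is fixed.
    """
    t_s = rng.uniform(t0, t1)
    y_s, _ = solve_adaptive(f, y0, t0, t_s)
    _, err = heun_euler_step(f, t_s, y_s, h_probe)
    return err


def make_oscillator(omega):
    """Linear oscillator y' = A y with angular frequency omega."""
    A = np.array([[0.0, omega], [-omega, 0.0]])
    return lambda t, y: A @ y


if __name__ == "__main__":
    y0 = [1.0, 0.0]
    for omega in (1.0, 10.0):
        f = make_oscillator(omega)
        _, nfe = solve_adaptive(f, y0, 0.0, 1.0)
        penalty = local_error_regularizer(f, y0, 0.0, 1.0, np.random.default_rng(0))
        print(f"omega={omega}: NFE={nfe}, local penalty={penalty:.4g}")
```

For the two linear oscillators, the faster one needs more function evaluations and also yields a larger local penalty. That correlation is what makes a local error estimate a plausible proxy for integration cost in this toy setting.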