Neural ODEs are a recently developed model class that combines the strong model priors of differential equations with the high-capacity function approximation of neural networks. One advantage of Neural ODEs is the potential for memory-efficient training via the continuous adjoint method. However, memory-efficient training comes at the cost of approximate gradients. Therefore, in practice, gradients are often obtained by simply backpropagating through the internal operations of the forward ODE solve, incurring a high memory cost. Interestingly, it is possible to construct algebraically reversible ODE solvers that allow for both exact gradients and the memory efficiency of the continuous adjoint method. Unfortunately, current reversible solvers are low-order and suffer from poor numerical stability, so their use in practice is limited. In this work, we present a class of algebraically reversible solvers that are both high-order and numerically stable. Moreover, any explicit numerical scheme can be made reversible by our method. This construction naturally extends to numerical schemes for Neural CDEs and SDEs.
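To make "algebraically reversible" concrete: such a solver's update can be inverted exactly in closed form, so the backward pass can reconstruct every intermediate state instead of storing it. The sketch below illustrates this with the reversible Heun scheme (Kidger et al., 2021), one of the existing low-order reversible solvers the abstract alludes to, not the high-order construction introduced in this work. The vector field `f`, step size, and tolerances are illustrative choices.

```python
import numpy as np

def f(y):
    # Illustrative vector field: linear dynamics dy/dt = -y.
    return -y

def forward_step(y, yhat, h):
    # One step of the reversible Heun scheme. The state is a pair
    # (y, yhat); both are advanced at each step.
    yhat_next = 2.0 * y - yhat + h * f(yhat)
    y_next = y + 0.5 * h * (f(yhat) + f(yhat_next))
    return y_next, yhat_next

def backward_step(y_next, yhat_next, h):
    # Exact algebraic inverse of forward_step: (y, yhat) is recovered
    # from (y_next, yhat_next) alone, so no intermediate states need
    # to be stored during the forward solve.
    yhat = 2.0 * y_next - yhat_next - h * f(yhat_next)
    y = y_next - 0.5 * h * (f(yhat) + f(yhat_next))
    return y, yhat

# Integrate forward, then reverse, and check the start is recovered
# up to floating-point round-off.
h, n_steps = 0.01, 1000
y0 = np.array([1.0, -0.5])
y, yhat = y0.copy(), y0.copy()
for _ in range(n_steps):
    y, yhat = forward_step(y, yhat, h)
for _ in range(n_steps):
    y, yhat = backward_step(y, yhat, h)
print(np.max(np.abs(y - y0)))  # small reconstruction error, near machine precision
```

Because the reverse pass regenerates each state exactly (in exact arithmetic; up to round-off in floating point), gradients computed by backpropagating through the reconstructed trajectory are exact while memory cost stays O(1) in the number of steps, which is the combination of properties the abstract describes.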