A key appeal of the recently proposed Neural Ordinary Differential Equation (ODE) framework is that it seems to provide a continuous-time extension of discrete residual neural networks. As we show herein, though, trained Neural ODE models actually depend on the specific numerical method used during training. If the trained model is supposed to be a flow generated from an ODE, it should be possible to choose another numerical solver with equal or smaller numerical error without loss of performance. We observe that if training relies on a solver with an overly coarse discretization, then testing with another solver of equal or smaller numerical error results in a sharp drop in accuracy. In such cases, the combination of vector field and numerical method cannot be interpreted as a flow generated from an ODE, which arguably constitutes a fatal breakdown of the Neural ODE concept. We observe, however, that there exists a critical step size below which training yields a valid ODE vector field. We propose a method that monitors the behavior of the ODE solver during training and adapts its step size accordingly, aiming to ensure a valid ODE without unnecessarily increasing computational cost. We verify this adaptation algorithm on a common benchmark dataset as well as on a synthetic dataset.
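The consistency check described above can be illustrated with a minimal sketch. This is not the adaptation algorithm proposed in the paper; it is a toy illustration under stated assumptions. The function names (`euler_solve`, `midpoint_solve`, `solver_discrepancy`, `adapt_n_steps`), the tolerance `tol`, and the linear vector field standing in for a trained network are all hypothetical. The idea is to compare the output of the coarse "training" solver against a more accurate solver and to refine the step size until the two agree.

```python
# Minimal sketch (hypothetical names, not the paper's algorithm):
# check whether a fixed-step solver and a more accurate solver agree,
# and refine the step size until they do.

def euler_solve(f, y0, t0, t1, n_steps):
    """Integrate dy/dt = f(t, y) from t0 to t1 with n_steps explicit Euler steps."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = y + h * f(t, y)
        t = t + h
    return y

def midpoint_solve(f, y0, t0, t1, n_steps):
    """Same interface, using the explicit midpoint rule (second order)."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + 0.5 * h, y + 0.5 * h * k1)
        y = y + h * k2
        t = t + h
    return y

def solver_discrepancy(f, y0, t0, t1, n_steps):
    """Absolute gap between the 'training' solver (Euler) and a more
    accurate reference (midpoint rule with half the step size)."""
    y_train = euler_solve(f, y0, t0, t1, n_steps)
    y_ref = midpoint_solve(f, y0, t0, t1, 2 * n_steps)
    return abs(y_train - y_ref)

def adapt_n_steps(f, y0, t0, t1, n_steps, tol=1e-2, max_steps=4096):
    """Double the number of steps until both solvers agree within tol
    (assumed heuristic; tol and max_steps are illustrative choices)."""
    while solver_discrepancy(f, y0, t0, t1, n_steps) > tol and n_steps < max_steps:
        n_steps *= 2
    return n_steps

def f(t, y):
    """Toy scalar vector field standing in for a learned network."""
    return -4.0 * y

coarse_gap = solver_discrepancy(f, 1.0, 0.0, 1.0, 2)   # overly coarse step size
n_adapted = adapt_n_steps(f, 1.0, 0.0, 1.0, 2)          # refined step count
fine_gap = solver_discrepancy(f, 1.0, 0.0, 1.0, n_adapted)
```

With the coarse discretization the two solvers disagree strongly, so the Euler output cannot be read as the flow of the underlying ODE. After refinement the gap falls below the chosen tolerance. In an actual training loop one would presumably run such a check periodically on the current network rather than on a fixed vector field.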