Neural ODEs parameterize a differential equation with a continuous-depth neural network and solve it with a numerical ODE integrator. These models have a constant memory cost. By contrast, in models built from a discrete sequence of hidden layers, memory cost grows linearly with the number of layers. Beyond memory efficiency, Neural ODEs can adapt their evaluation strategy to each input, and they let the user choose between numerical precision and fast training. Despite these benefits, Neural ODEs still have limitations. We identify the ODE integrator (also called the ODE solver) as the weakest link in the chain. It may have consistency, convergence, and stability (CCS) issues, and it may converge slowly or not at all. We propose a first-order ODE solver based on Nesterov's accelerated gradient (NAG), which is proven to be tuned with respect to the CCS conditions. We evaluate our approach on three tasks: supervised classification, density estimation, and time-series modelling. It trains faster and achieves better or comparable performance. The comparison covers Neural ODEs that use other fixed-step explicit ODE solvers, as well as discrete-depth models such as ResNet.
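The following is a minimal Python sketch that compares a fixed-step explicit Euler solver with a Nesterov-style two-step update. Euler is one example of the fixed-step explicit solvers mentioned above. The abstract does not give the proposed solver's update rule, so `nag_style_solve` is a **hypothetical** illustration and not the authors' method. It shows one way a NAG look-ahead can be added to a fixed-step integrator without breaking consistency. The function names, the momentum parameter `mu`, and the toy vector field `decay` are all assumptions made for this example. In a real Neural ODE, `f` would be a trained network.

```python
import math
from typing import Callable, List

# f(t, y) -> dy/dt; the state y is a plain list of floats.
VectorField = Callable[[float, List[float]], List[float]]


def euler_solve(f: VectorField, y0: List[float], t0: float, t1: float,
                n_steps: int) -> List[float]:
    """Fixed-step explicit Euler: y_{k+1} = y_k + h * f(t_k, y_k)."""
    h = (t1 - t0) / n_steps
    y, t = list(y0), t0
    for _ in range(n_steps):
        dy = f(t, y)
        y = [yi + h * di for yi, di in zip(y, dy)]
        t += h
    return y


def nag_style_solve(f: VectorField, y0: List[float], t0: float, t1: float,
                    n_steps: int, mu: float = 0.5) -> List[float]:
    """HYPOTHETICAL Nesterov-style two-step scheme (not the paper's rule):

        look_k  = y_k + mu * (y_k - y_{k-1})              # NAG look-ahead
        y_{k+1} = look_k + (1 - mu) * h * f(t_k, look_k)

    The (1 - mu) factor keeps the scheme consistent with dy/dt = f.
    """
    if not 0.0 <= mu < 1.0:
        raise ValueError("mu must lie in [0, 1)")
    if n_steps < 1:
        raise ValueError("n_steps must be >= 1")
    h = (t1 - t0) / n_steps
    # Bootstrap the two-step recursion with a single explicit Euler step.
    y_prev = list(y0)
    y = [yi + h * di for yi, di in zip(y0, f(t0, y0))]
    t = t0 + h
    for _ in range(n_steps - 1):
        # Extrapolate along the previous displacement (the "momentum").
        look = [yi + mu * (yi - pi) for yi, pi in zip(y, y_prev)]
        dy = f(t, look)
        y_prev, y = y, [li + (1.0 - mu) * h * di for li, di in zip(look, dy)]
        t += h
    return y


def decay(t: float, y: List[float]) -> List[float]:
    """Toy vector field dy/dt = -y; the exact solution is y(t) = y0 * exp(-t)."""
    return [-yi for yi in y]


if __name__ == "__main__":
    exact = math.exp(-1.0)
    for n in (100, 200, 400):
        e_euler = abs(euler_solve(decay, [1.0], 0.0, 1.0, n)[0] - exact)
        e_nag = abs(nag_style_solve(decay, [1.0], 0.0, 1.0, n)[0] - exact)
        # Both errors roughly halve as n doubles (first-order convergence).
        print(n, e_euler, e_nag)
```

Setting `mu=0.0` recovers explicit Euler exactly. The `(1 - mu)` scaling is the key design choice. Without it, the momentum term inflates the effective step to roughly `h / (1 - mu)`, and the scheme would then integrate the wrong ODE. With it, the linearized recurrence has characteristic polynomial (z - 1)(z - mu). Its roots satisfy the root condition for 0 <= mu < 1, and the first-order consistency condition holds. This is the kind of consistency-plus-stability argument that the CCS conditions refer to.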