Unrolling training trajectories over time strongly influences the inference accuracy of neural network-augmented physics simulators. We analyze these effects by studying three variants of training neural networks on discrete ground-truth trajectories. In addition to the commonly used one-step setup and fully differentiable unrolling, we include a third, less widely used variant: unrolling without temporal gradients. Comparing networks trained with these three modalities makes it possible to disentangle the two dominant effects of unrolling: training distribution shift and long-term gradients. We present a detailed study across physical systems, network sizes, network architectures, training setups, and test scenarios. This study provides an empirical basis for our main findings: a non-differentiable but unrolled training setup supported by a numerical solver can yield 4.5-fold improvements over a fully differentiable prediction setup that does not use this solver. We also quantify the accuracy gap between models trained in a fully differentiable setup and their non-differentiable counterparts: while differentiable setups perform best, unrolling without temporal gradients comes comparatively close. Furthermore, we show empirically that these behaviors are invariant to changes in the underlying physical system, the network architecture and size, and the numerical scheme. These results motivate integrating non-differentiable numerical simulators into training setups even when full differentiability is unavailable. We also observe that the convergence rate of common neural architectures is low compared to that of numerical algorithms. This encourages hybrid approaches that combine neural and numerical algorithms to exploit the benefits of both.
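To make the three training modalities concrete, here is a minimal sketch in plain Python under deliberately simple assumptions that do not come from the paper: the "physical system" is linear decay dx/dt = -Kx, the numerical solver is one explicit Euler step, and the "network" is a single learned scalar `theta` that adds the correction `theta * x`. A tiny forward-mode dual number stands in for an autodiff framework, and `stop_gradient` (a hypothetical helper, analogous to detaching a tensor) cuts gradients between time steps. One-step training uses a horizon of 1; unrolling without temporal gradients rolls the model out on its own predictions (the distribution-shift effect) but detaches the state after each step; fully differentiable unrolling keeps the whole chain (the long-term-gradient effect). All names (`K`, `DT`, `hybrid_step`, `unrolled_loss`, `train`) are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Forward-mode dual number val + der*eps (eps**2 == 0); der tracks d/d(theta)."""
    val: float
    der: float = 0.0

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __mul__(self, other):
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__


def stop_gradient(x: Dual) -> Dual:
    # Hypothetical helper: keep the value, drop the derivative (state becomes a constant).
    return Dual(x.val, 0.0)


K, DT = 1.0, 0.1  # hypothetical toy decay rate and time step (not from the paper)


def solver_step(x):
    # Coarse explicit Euler step for dx/dt = -K x: the "numerical solver".
    return x - (DT * K) * x


def hybrid_step(x, theta):
    # Solver step plus a learned linear correction theta * x: the toy "network".
    return solver_step(x) + theta * x


def unrolled_loss(theta, traj, horizon, temporal_grads):
    """Mean squared error of `horizon`-step rollouts started from every ground-truth state."""
    total, count = Dual(0.0), 0
    for start in range(len(traj) - horizon):
        x = Dual(traj[start])  # ground-truth initial state, constant w.r.t. theta
        for m in range(1, horizon + 1):
            x = hybrid_step(x, theta)  # the model is fed its own prediction
            err = x - traj[start + m]
            total = total + err * err
            count += 1
            if not temporal_grads:
                x = stop_gradient(x)  # cut the gradient path between time steps
    return total * (1.0 / count)


def grad(theta, traj, horizon, temporal_grads):
    # Seed der=1 on theta so the loss's .der equals dLoss/dtheta.
    return unrolled_loss(Dual(theta, 1.0), traj, horizon, temporal_grads).der


def train(traj, horizon, temporal_grads, lr=0.3, steps=300):
    # Plain gradient descent on the single parameter theta, starting from zero.
    theta = 0.0
    for _ in range(steps):
        theta -= lr * grad(theta, traj, horizon, temporal_grads)
    return theta


# Ground-truth trajectory from the exact solution x_n = exp(-K * DT * n).
traj = [math.exp(-K * DT * n) for n in range(21)]
theta_exact = math.exp(-K * DT) - (1.0 - K * DT)  # correction that makes each step exact

setups = {
    "one-step": (1, True),
    "unrolled, no temporal gradients": (4, False),
    "unrolled, fully differentiable": (4, True),
}
for name, (horizon, tg) in setups.items():
    print(f"{name:32s} theta={train(traj, horizon, tg):.6f}  exact={theta_exact:.6f}")
```

In this linear toy the exact correction is representable and drives every loss to zero, so all three setups converge to the same `theta`; the sketch shows how the gradients differ (with a horizon of 1 the variants coincide, while for longer horizons full differentiability weights later steps more strongly), not the accuracy differences reported above. Note also that in the no-gradient variant the state is detached before it reaches `solver_step`, so the solver's derivative is always multiplied by zero; in this arrangement a non-differentiable solver would suffice.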