It has recently been found that, counter-intuitively, more data can hurt the performance of deep neural networks. Here, we show that a more extreme version of this phenomenon occurs in data-driven models of dynamical systems. To elucidate the underlying mechanism, we focus on next-generation reservoir computing (NGRC) -- a popular framework for learning dynamics from data. We find that, despite learning a better representation of the flow map with more training data, NGRC can adopt an ill-conditioned ``integrator'' and lose stability. We link this data-induced instability to the auxiliary dimensions created by the delayed states in NGRC. Based on these findings, we propose simple strategies to mitigate the instability, either by increasing the regularization strength in tandem with the data size, or by carefully introducing noise during training. Our results highlight the importance of proper regularization in data-driven modeling of dynamical systems.
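To make the setup concrete, below is a minimal sketch of NGRC training, assuming the standard formulation: feature vectors built from the current and one delayed state plus their quadratic monomials, ridge-regressed onto one-step state increments. All names (`lorenz_step`, `features`, the delay count, and the ridge strength `lam`) are our own illustrative choices, not the paper's exact configuration.

```python
import numpy as np

def lorenz_step(s, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One Euler step of the Lorenz system (illustrative data source)."""
    x, y, z = s
    return s + dt * np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

# Generate a training trajectory.
T = 2000
traj = np.empty((T, 3))
traj[0] = np.array([1.0, 1.0, 1.0])
for t in range(T - 1):
    traj[t + 1] = lorenz_step(traj[t])

def features(traj, t):
    """NGRC features: bias + current and delayed state + unique quadratic terms."""
    lin = np.concatenate([traj[t], traj[t - 1]])
    quad = np.outer(lin, lin)[np.triu_indices(len(lin))]
    return np.concatenate([[1.0], lin, quad])

Phi = np.array([features(traj, t) for t in range(1, T - 1)])  # (N, d) feature matrix
Y = traj[2:] - traj[1:-1]                                     # targets: state increments

# Ridge regression; the abstract's point is that `lam` should grow with data size.
lam = 1e-8
W = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ Y).T

rmse = np.sqrt(np.mean((Phi @ W.T - Y) ** 2))
print(f"training RMSE: {rmse:.2e}")
```

Because the Euler increment of the Lorenz system is exactly quadratic in the state, this feature basis can fit the training data almost perfectly; the paper's point is that low training error alone does not guarantee a stable learned model.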