It has recently been found that more data can, counter-intuitively, hurt the performance of deep neural networks. Here, we show that a more extreme version of this phenomenon occurs in data-driven models of dynamical systems. To elucidate the underlying mechanism, we focus on next-generation reservoir computing (NGRC) -- a popular framework for learning dynamics from data. We find that, despite learning a better representation of the flow map with more training data, NGRC can adopt an ill-conditioned ``integrator'' and lose stability. We link this data-induced instability to the auxiliary dimensions created by the delayed states in NGRC. Based on these findings, we propose simple strategies to mitigate the instability, either by increasing the regularization strength in tandem with the data size or by carefully introducing noise during training. Our results highlight the importance of proper regularization in data-driven modeling of dynamical systems.
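As a rough illustration of the setup described above, the following sketch (Python/NumPy) builds NGRC-style features from delayed states and fits a linear readout by ridge regression, with the penalty scaled in proportion to the number of training samples. The function names (`ngrc_features`, `fit_readout`), the delay depth `k`, and the exact scaling `lam = lam_per_sample * n` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ngrc_features(X, k=2):
    # X: array of shape (T, d) holding consecutive states of a trajectory.
    T, d = X.shape
    # Delayed linear block: concatenate k consecutive states per row -> shape (T-k+1, k*d).
    lin = np.hstack([X[i : T - k + 1 + i] for i in range(k)])
    # Quadratic monomials of the delayed block (includes symmetric duplicates; fine for a sketch).
    quad = np.einsum("ti,tj->tij", lin, lin).reshape(lin.shape[0], -1)
    # Constant + linear + quadratic features.
    return np.hstack([np.ones((lin.shape[0], 1)), lin, quad])

def fit_readout(X, k=2, lam_per_sample=1e-6):
    # Features built from states x_0 ... x_{T-2}; each feature row ends at x_t, t = k-1, ..., T-2.
    Phi = ngrc_features(X[:-1], k)
    # Targets: one-step increments x_{t+1} - x_t, aligned with the feature rows.
    Y = X[k:] - X[k - 1 : -1]
    n = Phi.shape[0]
    # Scale the ridge penalty with the number of training samples
    # ("regularization in tandem with data size"; the exact scaling is an assumption).
    lam = lam_per_sample * n
    G = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    W = np.linalg.solve(G, Phi.T @ Y)  # readout weights, shape (n_features, d)
    return W
```

Once trained, such a readout is typically applied autoregressively, feeding each predicted state back into the delayed feature window; it is in this closed-loop use that the ill-conditioned ``integrator'' behavior described above would manifest.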