We explore the training dynamics of neural networks in a structured non-IID setting where documents are presented cyclically in a fixed, repeated sequence. Typically, networks suffer from catastrophic interference when trained on a sequence of documents; however, we discover a curious and remarkable property of LLMs fine-tuned sequentially in this setting: they exhibit anticipatory behavior, recovering from forgetting on documents before encountering those documents again. This behavior emerges and becomes more robust as the number of model parameters scales up. Through comprehensive experiments and visualizations, we uncover new insights into training over-parameterized networks in structured environments.
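A minimal sketch (not the authors' code) of the cyclic fine-tuning protocol described above: documents are visited in a fixed, repeated order, and after each per-document update the loss on every document is recorded, which is how forgetting and anticipatory recovery would be observed. The tiny regression model, random "documents", and constants such as NUM_DOCS and NUM_CYCLES are illustrative stand-ins for an LLM and real text.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

NUM_DOCS, NUM_CYCLES, DIM = 8, 4, 64

# Stand-in "documents": each is an (input, target) pair.
docs = [(torch.randn(16, DIM), torch.randn(16, DIM)) for _ in range(NUM_DOCS)]

model = nn.Sequential(nn.Linear(DIM, 256), nn.ReLU(), nn.Linear(256, DIM))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

loss_history = []  # loss_history[t][d] = loss on document d after update block t

for cycle in range(NUM_CYCLES):
    for d, (x, y) in enumerate(docs):  # fixed, repeated document order
        # Fine-tune on the current document only (a few gradient steps).
        for _ in range(10):
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        # Evaluate on every document to track forgetting and recovery.
        with torch.no_grad():
            loss_history.append([loss_fn(model(xe), ye).item() for xe, ye in docs])

# Anticipatory recovery would appear as a document's loss decreasing
# shortly *before* its next visit in the cycle, rather than only after it.
```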