Learning dynamics, which describes how the learning of specific training examples influences the model's prediction of other examples, give us a powerful tool for understanding the behavior of deep learning systems. We study the learning dynamics of large language models during finetuning, by analyzing the step-wise decomposition and accumulated influence among different responses. Our framework allows a uniform interpretation of many interesting observations about the training of popular algorithms for both instruction tuning and preference tuning. The analysis not only explains where the benefits of these methods come from but also inspires a simple, effective method to further improve the alignment performance. Code for experiments is available at https://github.com/Joshua-Ren/Learning_dynamics_LLM.
翻译:学习动态描述了特定训练样本的学习如何影响模型对其他样本的预测,为我们理解深度学习系统行为提供了有力工具。本研究通过分析不同响应间的逐阶分解与累积影响,探究大语言模型在微调过程中的学习动态。我们的框架能够统一解释指令微调与偏好微调中多种流行算法训练过程中的有趣现象。该分析不仅揭示了这些方法优势的来源,还启发了一种简单有效的技术以进一步提升对齐性能。实验代码发布于 https://github.com/Joshua-Ren/Learning_dynamics_LLM。