Understanding the internal mechanisms of large language models (LLMs) remains a challenging and complex endeavor. Even fundamental questions, such as how fine-tuning affects model behavior, often require extensive empirical evaluation. In this paper, we introduce a novel perspective based on the geometric properties of contextual latent embeddings to study the effects of training and fine-tuning. To that end, we measure the local dimensions of a contextual language model's latent space and analyze their shifts during training and fine-tuning. We show that the local dimensions provide insights into the model's training dynamics and generalization ability. Specifically, the mean of the local dimensions predicts when the model's training capabilities are exhausted (exemplified in a dialogue state tracking task), the onset of overfitting (demonstrated in an emotion recognition task), and grokking (illustrated with an arithmetic task). Furthermore, our experiments suggest a practical heuristic: reductions in the mean local dimension tend to accompany and predict subsequent performance gains. Through this exploration, we aim to provide practitioners with a deeper understanding of how fine-tuning reshapes embedding spaces, facilitating informed decisions when configuring models for specific applications. The results of this work contribute to the ongoing discourse on the interpretability, adaptability, and generalizability of LLMs by bridging the gap between intrinsic model mechanisms and the geometric properties of the corresponding embeddings.
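The abstract does not specify which estimator is used for the local dimensions; as a rough illustration of the kind of measurement described, the sketch below estimates per-token local intrinsic dimension with the Levina-Bickel k-nearest-neighbor MLE and compares its mean across two fine-tuning checkpoints. The estimator choice, the hypothetical helper `collect_token_embeddings`, and the checkpoint names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch, assuming a Levina-Bickel k-NN MLE as the local dimension
# estimator; the paper's actual estimator may differ.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_dimension_mle(embeddings: np.ndarray, k: int = 20) -> np.ndarray:
    """Per-point local intrinsic dimension via the Levina-Bickel MLE.

    embeddings: (n_points, hidden_size) array of contextual token embeddings.
    Returns an array of n_points local dimension estimates.
    """
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
    dists, _ = nn.kneighbors(embeddings)   # column 0 is the point itself
    dists = dists[:, 1:]                    # drop the zero self-distance
    # MLE: d(x) = [ (1/(k-1)) * sum_{j<k} log(T_k(x) / T_j(x)) ]^{-1}
    log_ratios = np.log(dists[:, -1:] / dists[:, :-1])
    return (k - 1) / log_ratios.sum(axis=1)

# Usage sketch: track the mean local dimension across checkpoints.
# `collect_token_embeddings` is a hypothetical helper that runs a fixed corpus
# through a checkpoint and stacks its last-layer hidden states.
# emb_before = collect_token_embeddings("checkpoint-0")
# emb_after  = collect_token_embeddings("checkpoint-1000")
# print(local_dimension_mle(emb_before).mean(),
#       local_dimension_mle(emb_after).mean())
```

Under the heuristic stated in the abstract, a drop in the mean of these per-token estimates between checkpoints would be read as a signal of subsequent performance gains.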