As large language models (LLMs) become central to AI applications, gaining a deeper understanding of their inner workings is increasingly important. In this work, we analyze the weight matrices of pretrained transformer models -- specifically BERT and Llama -- using random matrix theory (RMT) as a zero-information hypothesis. While randomly initialized weights agree perfectly with RMT predictions, deviations emerge after training, allowing us to locate learned structures within the models. We identify layer-type specific behaviors that are consistent across all blocks and architectures considered. By pinpointing regions that deviate from RMT predictions, we highlight areas of feature learning and confirm this through comparisons with the activation covariance matrices of the corresponding layers. Our method provides a diagnostic tool for identifying relevant regions in transformer weights using only the trained matrices. Additionally, we address the ongoing debate regarding the significance of small singular values in the context of fine-tuning and alignment in LLMs. Our findings reveal that, after fine-tuning, small singular values play a crucial role in the models' capabilities, suggesting that removing them from an already aligned transformer can be detrimental, as it may compromise model alignment.
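To make the zero-information baseline concrete, the sketch below illustrates the kind of comparison the abstract describes: the singular-value spectrum of a randomly initialized weight matrix is checked against the Marchenko-Pastur (MP) law, the RMT prediction for i.i.d. Gaussian weights. This is a minimal illustration, not the authors' code; the matrix shape, the variance convention, and the truncation cutoff `k` are hypothetical stand-ins for what one would extract from an actual BERT or Llama checkpoint.

```python
# Minimal sketch of the RMT zero-information baseline (illustrative, not the
# paper's implementation). A random m x n matrix with i.i.d. N(0, sigma^2/n)
# entries has W W^T eigenvalues following the Marchenko-Pastur law; a trained
# weight matrix deviates from it wherever structure has been learned.

import numpy as np

def marchenko_pastur_pdf(x, q, sigma=1.0):
    """MP density for eigenvalues of W W^T, aspect ratio q = m/n <= 1."""
    lam_min = sigma**2 * (1.0 - np.sqrt(q))**2
    lam_max = sigma**2 * (1.0 + np.sqrt(q))**2
    pdf = np.zeros_like(x)
    inside = (x > lam_min) & (x < lam_max)
    pdf[inside] = np.sqrt((lam_max - x[inside]) * (x[inside] - lam_min)) / (
        2.0 * np.pi * sigma**2 * q * x[inside]
    )
    return pdf

# Hypothetical layer shape, e.g. an MLP projection; a real analysis would
# load `weight` from a pretrained checkpoint instead.
m, n, sigma = 768, 3072, 1.0
weight = np.random.randn(m, n) * sigma / np.sqrt(n)

# Eigenvalues of W W^T are the squared singular values of W.
sing_vals = np.linalg.svd(weight, compute_uv=False)
eigvals = sing_vals**2

# Empirical spectral density vs. the MP prediction: for a random matrix the
# gap is small; for a trained matrix, bulk deformations and outliers mark
# regions of feature learning.
hist, edges = np.histogram(eigvals, bins=50, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
mp = marchenko_pastur_pdf(centers, q=m / n, sigma=sigma)
print("mean |empirical - MP| density gap:", np.abs(hist - mp).mean())

# The abstract's caution about small singular values corresponds to low-rank
# truncation: rebuilding W from only its top-k singular directions discards
# exactly the components found to matter after fine-tuning.
U, S, Vt = np.linalg.svd(weight, full_matrices=False)
k = 512  # hypothetical cutoff
W_truncated = (U[:, :k] * S[:k]) @ Vt[:k, :]
```

Run on a randomly initialized matrix, the density gap is small by construction; run on a fine-tuned layer, the same comparison would expose the deviating regions, and the truncation step shows the operation whose alignment cost the abstract warns about.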