Understanding what defines a good representation in large language models (LLMs) is fundamental to both theory and practice. In this paper, we investigate the quality of intermediate representations across several LLM architectures, including Transformers and State Space Models (SSMs). We find that intermediate layers often yield more informative representations for downstream tasks than the final layers. To measure representation quality, we adapt and apply a suite of metrics originally proposed in other contexts, such as prompt entropy, curvature, and augmentation invariance. Our empirical study reveals significant differences across architectures, shows how representations evolve over the course of training, and characterizes how factors like input randomness and prompt length affect each layer. Notably, we observe a bimodal pattern in the entropy of some intermediate layers and consider potential explanations tied to the training data. Overall, our results shed light on the internal mechanics of LLMs and offer guidance for architecture design and training.
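For intuition, here is a minimal sketch of how a prompt-entropy-style metric can be computed from one layer's token activations. It assumes a matrix-based (von Neumann/Rényi) entropy formulation over the trace-normalized Gram matrix of the tokens, and PyTorch as the framework; the function name, the `alpha` parameter, and the exact normalization are illustrative assumptions, not necessarily the paper's procedure.

```python
import torch

def prompt_entropy(hidden_states: torch.Tensor, alpha: float = 1.0) -> float:
    """Matrix-based entropy of one prompt's token representations.

    hidden_states: (num_tokens, hidden_dim) activations of one prompt
    at one layer. Hypothetical sketch; details may differ from the paper.
    """
    # Gram matrix over tokens, normalized to unit trace so its
    # eigenvalues behave like a probability distribution.
    K = hidden_states @ hidden_states.T
    K = K / K.trace()
    eigvals = torch.linalg.eigvalsh(K)   # symmetric PSD -> real eigenvalues
    eigvals = eigvals[eigvals > 1e-12]   # drop numerical zeros
    if alpha == 1.0:
        # alpha -> 1 limit: von Neumann (Shannon-style) entropy.
        return float(-(eigvals * eigvals.log()).sum())
    # General Renyi-alpha matrix entropy.
    return float(torch.log((eigvals ** alpha).sum()) / (1.0 - alpha))

# Usage: feed in one layer's hidden states, e.g. from a HuggingFace
# model run with output_hidden_states=True.
h = torch.randn(128, 768)  # stand-in for (num_tokens, hidden_dim)
print(prompt_entropy(h))
```

Under this formulation, higher entropy indicates that the token representations spread across more directions of the embedding space, which is one way the abstract's notion of "informativeness" per layer could be quantified.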