In large language models (LLMs), each block operates on the residual stream to map input token sequences to output token distributions. However, most of the interpretability literature focuses on internal latent representations, leaving token-space dynamics underexplored. The high dimensionality and categorical nature of token distributions hinder their analysis, as standard statistical descriptors are not suitable. We show that the entropy of logit-lens predictions overcomes these issues, providing a per-layer, permutation-invariant scalar metric. We introduce Entropy-Lens, a method that distills the token-space dynamics of the residual stream into a low-dimensional signal, which we call the entropy profile. We apply our method to a variety of model sizes and families, showing that (i) entropy profiles uncover token prediction dynamics driven by expansion and pruning strategies; (ii) these dynamics are family-specific and invariant under depth rescaling; (iii) they are characteristic of task type and output format; (iv) these strategies have unequal impact on downstream performance, with the expansion strategy usually being more critical. Ultimately, our findings further enhance our understanding of the residual stream, enabling a granular assessment of how information is processed across model depth.
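The per-layer quantity the abstract describes can be sketched in a few lines: project each layer's residual-stream state through the unembedding matrix (the logit-lens readout), normalize to a token distribution, and take its Shannon entropy. This is a minimal illustration with random toy tensors, not the paper's implementation; all names and shapes here are assumptions.

```python
import numpy as np

def entropy_profile(hidden_states, W_U):
    """Shannon entropy (in nats) of logit-lens token distributions per layer.

    hidden_states: (n_layers, d_model) residual-stream states at one token position
    W_U: (d_model, vocab_size) unembedding matrix
    Returns: (n_layers,) array -- the entropy profile.
    """
    logits = hidden_states @ W_U                   # (n_layers, vocab_size)
    logits -= logits.max(axis=-1, keepdims=True)   # shift for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)     # softmax over the vocabulary
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)

# Toy example: 4 layers, model width 8, vocabulary of 50 tokens (all illustrative).
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))
W_U = rng.normal(size=(8, 50))
profile = entropy_profile(H, W_U)
```

Because the entropy depends on the distribution only through its probability values, the resulting profile is invariant to any permutation of the vocabulary, which is the property the abstract highlights.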