Despite the increasing prevalence of large language models (LLMs), we still have a limited understanding of how their representational spaces are structured. This limits our ability to interpret how and what they learn, or to relate their learning to learning in humans. We argue that LLMs are best seen as an instance of lossy compression: over training, they learn by retaining only the information in their training data that is relevant to their objective(s). We show that pre-training results in models that are optimally compressed for next-sequence prediction, approaching the Information Bottleneck bound on compression. Across an array of open-weights models, each compresses differently, likely due to differences in the data and training recipes used. However, even across different families of LLMs, the optimality of a model's compression, and the information present in it, can predict downstream performance across a wide array of benchmarks, letting us directly link representational structure to actionable insights about model performance. More generally, the work presented here offers a unified information-theoretic framing of how these models learn that is deployable at scale.
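For readers unfamiliar with the Information Bottleneck bound referenced above, the standard formulation (Tishby, Pereira & Bialek) is sketched below; the notation is the textbook one, with X the input, Y the prediction target, Z the learned representation, and β a trade-off parameter, and is not necessarily the exact formulation used in the body of this work.

% Information Bottleneck Lagrangian: seek a representation Z of X that is
% maximally compressed (small I(X;Z)) while preserving information about
% the target Y (large I(Z;Y)); beta sets the compression-prediction trade-off.
\[
  \min_{p(z \mid x)} \; I(X;Z) \;-\; \beta\, I(Z;Y)
\]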