We conceptualize understanding as information compression and propose a method for ranking large language models (LLMs) based on lossless data compression. We show that when an LLM serves as the prior, the compression length under arithmetic coding is equivalent to the cumulative negative log probability assigned by the model; in this sense, the pre-training phase is essentially the process of learning the optimal coding length. Consequently, the evaluation metric, compression ratio, can be obtained directly from the model's probabilities without performing any actual compression, which greatly reduces overhead. In this paper, we use five large language models as priors for compression and compare their performance on challenging natural language processing tasks, including sentence completion, question answering, and coreference resolution. Experimental results show that compression ratio is positively correlated with model performance, so it can serve as a general metric for evaluating large language models.
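As a sketch of the equivalence stated above (the notation $p_\theta$ for the LLM prior and $x_{1:N}$ for the token sequence is assumed here, not taken from the abstract), arithmetic coding with the model as the prior encodes a sequence in a number of bits that matches the model's cumulative negative log probability up to a small constant, so the compression ratio follows from a single forward pass with no encoder or decoder run:
\begin{align}
  \ell_{\mathrm{AC}}(x_{1:N})
    &\approx -\log_2 p_\theta(x_{1:N}) + O(1)
     = -\sum_{i=1}^{N} \log_2 p_\theta(x_i \mid x_{<i}), \\
  \text{compression ratio}
    &= \frac{\ell_{\mathrm{AC}}(x_{1:N})}{\text{raw size of } x_{1:N} \text{ in bits}}.
\end{align}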