There is a belief that learning to compress well will lead to intelligence. Recently, language modeling has been shown to be equivalent to compression, which offers a compelling rationale for the success of large language models (LLMs): developing more advanced language models essentially amounts to enhancing compression, which in turn facilitates intelligence. Despite such appealing discussions, little empirical evidence exists for the interplay between compression and intelligence. In this work, we examine their relationship in the context of LLMs, treating LLMs as data compressors. Given the abstract concept of "intelligence", we adopt the average downstream benchmark score as a surrogate, specifically targeting intelligence related to knowledge and commonsense, coding, and mathematical reasoning. Across 12 benchmarks, our study brings together 30 public LLMs originating from diverse organizations. Remarkably, we find that LLMs' intelligence -- reflected by average benchmark scores -- correlates almost linearly with their ability to compress external text corpora. These results provide concrete evidence supporting the belief that superior compression indicates greater intelligence. Furthermore, our findings suggest that compression efficiency, as an unsupervised metric derived from raw text corpora, serves as a reliable evaluation measure that is linearly associated with model capabilities. We open-source our compression datasets as well as our data collection pipelines to help future researchers assess compression properly.
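To make the compression metric concrete, the sketch below shows one common way such a measure can be computed: the bits-per-character rate implied by a language model's per-token log-probabilities on a corpus (lower means stronger compression). This is an illustrative assumption about the metric, not the paper's exact implementation; the function name and toy inputs are hypothetical.

```python
import math

def bits_per_character(token_log_probs, text):
    """Compression rate implied by a language model: the total negative
    log2-likelihood of the text (the code length under arithmetic coding),
    divided by the text's length in characters. Lower is better."""
    # Convert natural-log probabilities (nats) to bits, then sum code lengths.
    total_bits = -sum(lp / math.log(2) for lp in token_log_probs)
    return total_bits / len(text)

# Toy example: a model assigning probability 0.5 to each of 4 tokens
# covering an 8-character string costs 1 bit per token, i.e. 0.5 bits/char.
rate = bits_per_character([math.log(0.5)] * 4, "abcdefgh")
print(rate)  # → 0.5
```

In practice the log-probabilities would come from scoring held-out raw text with the LLM under evaluation; the metric needs no labels, which is what makes it an unsupervised proxy for capability.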