Evolution and compression in LLMs: On the emergence of human-aligned categorization

Source: arXiv. Published as a conference paper at ICLR 2026 (The Fourteenth International Conference on Learning Representations). OpenReview: https://openreview.net/forum?id=s7gSTR2AqA&noteId=s7gSTR2AqA

Converging evidence suggests that human systems of semantic categories achieve near-optimal compression via the Information Bottleneck (IB) complexity-accuracy tradeoff. Large language models (LLMs) are not trained for this objective, which raises the question: are LLMs capable of evolving efficient, human-aligned semantic systems? To address this question, we focus on color categorization, a key testbed of cognitive theories of categorization with uniquely rich human data, and replicate two influential human studies with LLMs. First, we conduct an English color-naming study, showing that LLMs vary widely in their complexity and in their alignment with English, with larger instruction-tuned models achieving better alignment and IB-efficiency. Second, to test whether these LLMs simply mimic patterns in their training data or actually exhibit a human-like inductive bias toward IB-efficiency, we simulate cultural evolution of pseudo color-naming systems in LLMs via a method we refer to as Iterated in-Context Language Learning (IICLL). We find that, like humans, LLMs iteratively restructure initially random systems toward greater IB-efficiency. However, only the model with the strongest in-context capabilities (Gemini 2.0) is able to recapitulate the wide range of near-optimal IB tradeoffs observed in humans, while other state-of-the-art models converge to low-complexity solutions. These findings demonstrate how human-aligned semantic categories can emerge in LLMs via the same fundamental principle that underlies semantic efficiency in humans.
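To make the complexity-accuracy tradeoff concrete, here is a minimal sketch of how the two IB terms could be computed for a toy naming system. It assumes the formulation common in the color-naming literature, where complexity is the mutual information I(M;W) between speaker meanings and words, and accuracy is I(W;U) between words and the underlying stimuli. The abstract does not spell out these definitions, so treat them as an assumption. The four-chip stimulus space, the uniform prior, the noisy meaning distributions, and the names `mutual_information`, `ib_terms`, `prior`, `meanings`, and `two_word` are all hypothetical and chosen for illustration only.

```python
import math


def mutual_information(p_x, p_y_given_x):
    """I(X;Y) in bits, given a prior p_x[i] and a conditional p_y_given_x[i][j]."""
    n_y = len(p_y_given_x[0])
    # Marginal over Y: p(y) = sum_x p(x) p(y|x)
    p_y = [sum(p_x[i] * p_y_given_x[i][j] for i in range(len(p_x)))
           for j in range(n_y)]
    mi = 0.0
    for i, px in enumerate(p_x):
        for j, pyx in enumerate(p_y_given_x[i]):
            if px > 0 and pyx > 0:
                mi += px * pyx * math.log2(pyx / p_y[j])
    return mi


def ib_terms(p_m, q_w_given_m, m_u):
    """Return (complexity, accuracy) = (I(M;W), I(W;U)) for a naming system.

    p_m[i]            prior over meanings (assumed, not from the paper)
    q_w_given_m[i][w] naming policy: probability of word w for meaning i
    m_u[i][u]         meaning i as a distribution over stimuli u
    """
    n_m, n_w, n_u = len(p_m), len(q_w_given_m[0]), len(m_u[0])
    complexity = mutual_information(p_m, q_w_given_m)

    # Word marginal and the listener-side distribution p(u|w).
    p_w = [sum(p_m[i] * q_w_given_m[i][w] for i in range(n_m))
           for w in range(n_w)]
    p_u_given_w = []
    for w in range(n_w):
        if p_w[w] == 0:
            # Unused word: any row works, it carries zero weight.
            p_u_given_w.append([1.0 / n_u] * n_u)
        else:
            p_u_given_w.append([
                sum(p_m[i] * q_w_given_m[i][w] * m_u[i][u] for i in range(n_m)) / p_w[w]
                for u in range(n_u)
            ])
    accuracy = mutual_information(p_w, p_u_given_w)
    return complexity, accuracy


# Toy setup: 4 color chips, uniform prior, each meaning peaked on its own chip.
prior = [0.25, 0.25, 0.25, 0.25]
meanings = [[0.7 if u == i else 0.1 for u in range(4)] for i in range(4)]

# A deterministic two-word system: chips 0-1 share one word, chips 2-3 the other.
two_word = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

complexity, accuracy = ib_terms(prior, two_word, meanings)
print(f"complexity I(M;W) = {complexity:.3f} bits")
print(f"accuracy   I(W;U) = {accuracy:.3f} bits")
```

In this toy case the two-word system costs exactly 1 bit of complexity, and its accuracy is strictly lower because the meanings are noisy. Under the same assumed formulation, an IB-efficient system is one that attains the highest achievable accuracy for its level of complexity.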
