Task Ecologies and the Evolution of World-Tracking Representations in Large Language Models

We study language models as evolving model organisms and ask when autoregressive next-token learning selects for world-tracking representations. For any encoding of latent world states, the Bayes-optimal next-token cross-entropy decomposes into the irreducible conditional entropy plus a Jensen-Shannon excess term. That excess vanishes if and only if the encoding preserves the training ecology's equivalence classes. This yields a precise notion of ecological veridicality for language models and identifies the minimum-complexity zero-excess solution as the quotient partition by training equivalence. We then determine when this fixed-encoding analysis applies to transformer families: frozen dense and frozen Mixture-of-Experts transformers satisfy it, in-context learning does not enlarge the model's separation set, and per-task adaptation breaks the premise. The framework predicts two characteristic failure modes: simplicity pressure preferentially removes low-gain distinctions, and training-optimal models can still incur positive excess on deployment ecologies that refine the training ecology. A conditional dynamic extension shows how inter-model selection and post-training can recover such gap distinctions under explicit heredity, variation, and selection assumptions. Exact finite-ecology checks and controlled microgpt experiments validate the static decomposition, split-merge threshold, off-ecology failure pattern, and two-ecology rescue mechanism in a regime where the relevant quantities are directly observable. The goal is not to model frontier systems at scale, but to use small language models as laboratory organisms for theory about representational selection.
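The static decomposition can be checked by hand on a toy finite ecology. The following is a minimal sketch, not the paper's code: it assumes a finite set of world states with a positive prior, takes two worlds to be training-equivalent when their next-token distributions coincide, and reads the excess as the prior-weighted generalized Jensen-Shannon divergence within each cell of the encoding. The names `entropy`, `decompose`, `prior`, `cond`, and the example encodings are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def decompose(prior, cond, encoding):
    """Return (conditional_entropy, js_excess, cross_entropy) in bits.

    prior[w]    : probability of world state w (assumed positive)
    cond[w][y]  : next-token probability of token y in world w
    encoding[w] : label of the cell that world w is mapped to
    """
    cells = defaultdict(list)
    for w, z in enumerate(encoding):
        cells[z].append(w)

    n_tokens = len(cond[0])
    # Irreducible term: H(Y | W).
    h_cond = sum(prior[w] * entropy(cond[w]) for w in range(len(prior)))

    excess = 0.0
    cross_entropy = 0.0
    for members in cells.values():
        mass = sum(prior[w] for w in members)
        # Bayes-optimal prediction for a cell: posterior-weighted mixture.
        mix = [sum(prior[w] * cond[w][y] for w in members) / mass
               for y in range(n_tokens)]
        # Generalized Jensen-Shannon divergence of the cell's members.
        js = entropy(mix) - sum(prior[w] / mass * entropy(cond[w])
                                for w in members)
        excess += mass * js
        # Cross-entropy computed directly, as an independent check.
        for w in members:
            cross_entropy += prior[w] * -sum(
                cond[w][y] * math.log2(mix[y])
                for y in range(n_tokens) if cond[w][y] > 0)
    return h_cond, excess, cross_entropy


# Toy ecology: worlds 0 and 1 have identical next-token distributions.
prior = [0.25, 0.25, 0.25, 0.25]
cond = [[0.9, 0.1], [0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]

quotient = decompose(prior, cond, [0, 0, 1, 2])  # merges only equivalent worlds
lossy = decompose(prior, cond, [0, 0, 1, 1])     # also merges worlds 2 and 3

for name, (h, ex, ce) in [("quotient", quotient), ("lossy", lossy)]:
    print(f"{name}: H(Y|W)={h:.4f} excess={ex:.4f} cross-entropy={ce:.4f}")
```

In this sketch the quotient encoding uses three cells and has zero excess, while the lossy encoding merges two inequivalent worlds and pays a positive excess. In both cases the directly computed cross-entropy equals the conditional entropy plus the excess up to floating-point error.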
