The Information-Theoretic Imperative: Compression and the Epistemic Foundations of Intelligence

Why do brains and deep networks converge on similar representations? Task-optimized artificial neural networks quantitatively predict primate ventral stream responses despite radically different substrates and optimization dynamics. This convergence demands explanation beyond shared natural image statistics or task structure alone. The Compression Efficiency Principle (CEP) specifies the selection mechanism: representations exploiting unstable correlations pay a growing "exception tax" (approximately linear excess codelength under shortcut-flipping shifts), while representations encoding shift-stable invariants amortize this cost. When environments provide intervention-rich shifts and exhibit approximately modular causal structure, these invariants align with causal mechanisms. The framework offers a unified lens on three biological signatures -- steep metabolic constraints on neural signaling, high coding efficiency in early sensory pathways, and hierarchical tolerance in the ventral stream -- and connects them to parallel phenomena in deep learning: scaling frontiers, shortcut failures under distribution shift, and the role of augmentation in enforcing invariances. Distinctive predictions follow: a crossover threshold beyond which invariant representations dominate, and systematic coupling between compression efficiency and out-of-distribution robustness -- testable across substrates. Predicted divergences (sparse biological signaling versus dense overparameterization) arise from different resource constraints on a shared trade-off topology. The convergence is not a coincidence. It is evidence for a substrate-independent basin shaped by predictive compression under shift.
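
As a rough illustration of why the exception tax grows approximately linearly, consider a toy model that is an assumption of this sketch, not the paper's formal definition. A binary shortcut feature agrees with the label with probability p > 1/2 during training, a shortcut-flipping shift changes that agreement rate to 1 - p, and the invariant code costs a fixed extra description length. The symbols p, n, \tau, \Delta L_{\mathrm{inv}}, and n^{*} are introduced here for illustration only.

```latex
% Toy model (illustrative assumption, not the paper's formal definition).
\begin{align}
  % Excess codelength per shifted sample, in bits, when a code tuned to
  % Bernoulli(p) is applied to data that now follow Bernoulli(1-p):
  \tau(p) &= D_{\mathrm{KL}}\bigl(\mathrm{Bern}(1-p)\,\|\,\mathrm{Bern}(p)\bigr)
           = (2p-1)\,\log_2\frac{p}{1-p}, \\
  % Total exception tax after n shifted samples: linear in n.
  \Delta L_{\mathrm{shortcut}}(n) &= n\,\tau(p), \\
  % An invariant code with fixed extra description cost \Delta L_{\mathrm{inv}}
  % becomes the shorter code once n exceeds the crossover threshold n^{*}:
  n^{*} &= \frac{\Delta L_{\mathrm{inv}}}{\tau(p)}.
\end{align}
```

Under these assumptions, p = 0.9 gives \tau \approx 2.54 bits per shifted sample, so a hypothetical invariant code costing 1000 extra bits would pay for itself after roughly 394 shifted samples. The numbers are purely illustrative; they show only how a per-sample tax and a fixed upfront cost produce a crossover.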
