Can Complexity and Uncomputability Explain Intelligence? SuperARC: A Test for Artificial Super Intelligence Based on Recursive Compression

We introduce an increasing-complexity, open-ended, and human-agnostic metric to evaluate foundational and frontier AI models in the context of Artificial General Intelligence (AGI) and Artificial Super Intelligence (ASI) claims. Unlike other tests that rely on human-centric questions and expected answers, or on pattern-matching methods, the test introduced here is grounded in the mathematical theories of randomness and optimal inference. We argue that human-agnostic metrics based on the universal principles established by Algorithmic Information Theory (AIT), which formally frame the concepts of model abstraction and prediction, offer a powerful metrological framework. When the test is applied to frontier models, the leading LLMs outperform most others in multiple tasks, but their latest versions do not always do so: they often regress, and they remain far from any global maximum or target estimated using the principles of AIT, which define a Universal Intelligence (UAI) point and trend in the benchmarking. Conversely, a hybrid neuro-symbolic approach to UAI based on the same principles is shown to outperform frontier specialised prediction models in a simplified but relevant example related to compression-based model abstraction and sequence prediction. Finally, we prove and conclude that predictive power through arbitrary formal theories is directly proportional to compression over the algorithmic space, not the statistical space, and so further progress in AI models can only be achieved in combination with symbolic approaches, which LLM developers are adopting, often without acknowledgement or realisation.
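The distinction between compression over the algorithmic space and compression over the statistical space can be illustrated with a toy sketch. It is not the test or the method described in the abstract. It assumes a linear congruential generator (with illustrative constants) as the hidden generating rule, uses `zlib` as a stand-in for a statistical compressor, and uses the byte length of the rule's parameters as a crude proxy for the length of a generating program. All names (`lcg_bytes`, `rule_size`, and so on) are hypothetical.

```python
import zlib

# Illustrative LCG constants; an assumption made for this sketch only.
A, C, M = 1103515245, 12345, 2**31
SEED = 1


def lcg_bytes(n, seed=SEED):
    """Return n bytes produced by a short deterministic rule (an LCG)."""
    x = seed
    out = bytearray()
    for _ in range(n):
        x = (A * x + C) % M
        out.append(x >> 23)  # top 8 bits of the 31-bit state
    return bytes(out)


observed = lcg_bytes(4096)

# Statistical view: a general-purpose compressor looks for repeated
# substrings and skewed symbol frequencies in the data.
zlib_size = len(zlib.compress(observed, 9))

# Algorithmic view: the data is fully determined by the rule's parameters.
# The parameter string is only a rough proxy for program length.
rule_size = len(f"{A},{C},{M},{SEED}".encode())

# Holding the rule allows exact continuation of the sequence.
next_bytes = lcg_bytes(4096 + 8)[4096:]

print("raw size      :", len(observed))
print("zlib size     :", zlib_size)
print("rule size     :", rule_size)
print("next 8 values :", list(next_bytes))
```

In this sketch the statistical compressor finds little to exploit, while the short rule both reproduces the data and extends it. A model that has recovered the generating rule can predict, whereas one that only captures symbol statistics cannot.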
