We formalise recursive self-training in Large Language Models (LLMs) and Generative AI as a discrete-time dynamical system. We prove that if the proportion of exogenous, externally grounded signal $\alpha_t$ vanishes asymptotically ($\alpha_t \to 0$), the system undergoes degenerative dynamics. We derive two fundamental failure modes: (1) \textit{Entropy Decay}, where finite sampling effects induce monotonic loss of distributional diversity, and (2) \textit{Variance Amplification}, where the absence of persistent grounding causes distributional drift via a random-walk mechanism. These behaviours are architectural invariants of distributional learning on finite samples. We show that the collapse results apply specifically to closed-loop density matching without persistent external signal. Systems with non-vanishing exogenous grounding fall outside this regime. However, mainstream Singularity, AGI, and ASI narratives typically posit systems that become increasingly autonomous and require little to no human or external intervention for self-improvement. In that autonomy regime, the vanishing-signal condition is satisfied, and collapse follows under KL-based objectives. To overcome these limits, we propose neurosymbolic integration based on algorithmic probability and program synthesis. The Coding Theorem Method (CTM) enables identification of generative mechanisms rather than mere correlations, escaping the distribution-only constraints that bind standard statistical learning. We conclude that fully autonomous recursive density matching leads to degenerative fixed points, whereas externally anchored or mechanism-based approaches operate under fundamentally different asymptotic dynamics.
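As an informal illustration of the closed-loop dynamics (not part of the formal development), consider one plausible instantiation in which a categorical distribution is repeatedly re-estimated from $n$ of its own samples and mixed with an exogenous anchor $g$ at a vanishing weight $\alpha_t$:
\[
p_{t+1} = \alpha_t\, g + (1 - \alpha_t)\, \hat{p}_t^{(n)}, \qquad \hat{p}_t^{(n)} = \tfrac{1}{n}\,\mathrm{Multinomial}(n, p_t).
\]
The minimal Python sketch below simulates this map; the update rule, the schedule $\alpha_t = \alpha_0/(t+1)$, and all parameter values are illustrative assumptions rather than constructions from the paper. With $\alpha_0 = 0$ the loop is fully closed and both failure modes appear: Shannon entropy decays as finite sampling drives the distribution toward a degenerate fixed point, and the $L_1$ distance from $g$ drifts upward in random-walk fashion.
\begin{verbatim}
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy in nats, with a guard against log(0)."""
    q = np.clip(p, eps, 1.0)
    return float(-(p * np.log(q)).sum())

def simulate(k=20, n=200, steps=2000, alpha0=0.0, seed=0):
    """Iterate p_{t+1} = a_t*g + (1-a_t)*p_hat_t with a_t = alpha0/(t+1)."""
    rng = np.random.default_rng(seed)
    g = np.full(k, 1.0 / k)             # exogenous, externally grounded anchor
    p = g.copy()                        # start fully grounded
    ents, drifts = [], []
    for t in range(steps):
        alpha_t = alpha0 / (t + 1)      # vanishing exogenous signal
        p_hat = rng.multinomial(n, p) / n   # finite-sample density estimate
        p = alpha_t * g + (1.0 - alpha_t) * p_hat
        ents.append(entropy(p))
        drifts.append(float(np.abs(p - g).sum()))  # L1 drift from the anchor
    return ents, drifts

ents, drifts = simulate(alpha0=0.0)     # fully closed loop: alpha_t = 0
print(f"entropy:  {ents[0]:.3f} -> {ents[-1]:.3f}")    # decays (monotone in expectation)
print(f"L1 drift: {drifts[0]:.3f} -> {drifts[-1]:.3f}")  # grows toward fixation
\end{verbatim}
Replacing the vanishing schedule with a constant $\alpha_t \equiv \alpha > 0$ keeps the iterates anchored near $g$, consistent with the claim above that systems with non-vanishing exogenous grounding fall outside the collapse regime.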