Large language models achieve strong performance on many complex reasoning tasks, yet their accuracy degrades sharply on benchmarks that require compositional reasoning, including ARC-AGI-2, GPQA, MATH, BBH, and HLE. Existing methods improve reasoning by expanding token-level search through chain-of-thought prompting, self-consistency, or reinforcement learning, but they leave the model's latent representation space fixed. When the required abstraction is not already encoded in this space, performance collapses. We propose Recursive Concept Evolution (RCE), a framework that enables pretrained language models to modify their internal representation geometry during inference. RCE introduces low-rank concept subspaces that are spawned when representational inadequacy is detected, selected through a minimum description length (MDL) criterion, merged when synergistic, and consolidated via constrained optimization to preserve stability. This process allows the model to construct new abstractions rather than merely recombine existing ones. We integrate RCE with Mistral-7B and evaluate it on compositional reasoning benchmarks. RCE yields gains of 12-18 points on ARC-AGI-2 and 8-14 points on GPQA and BBH, and consistently reduces depth-induced error on MATH and HLE.
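The abstract does not specify how representational inadequacy is detected or how the minimum description length (MDL) criterion is computed. Purely as an illustration of the spawn-and-select idea, the sketch below assumes that (i) "inadequacy" means the fraction of hidden-state energy lying outside the current concept subspaces exceeds a threshold, (ii) the MDL criterion is a two-part, BIC-style code length over the singular values of that residual, and (iii) consolidation can be approximated by re-orthonormalizing the enlarged basis. The helper names (`residual`, `mdl_costs`, `maybe_spawn_concept`) and the parameters `inadequacy_threshold` and `max_rank` are hypothetical; merging of synergistic subspaces and RCE's constrained optimization are not modeled, and this is not the authors' implementation.

```python
import numpy as np


def residual(H, basis):
    """Component of hidden states H (n x d) outside span(basis); basis is d x k with orthonormal columns."""
    return H - (H @ basis) @ basis.T


def mdl_costs(singular_values, n, d, max_rank):
    """Assumed two-part (BIC-style) description length, in bits, for candidate ranks 0..max_rank."""
    energy = singular_values ** 2
    costs = []
    for r in range(max_rank + 1):
        rss = max(float(energy[r:].sum()), 1e-12)  # energy left after keeping the top-r directions
        data_bits = 0.5 * n * d * np.log2(rss / (n * d))  # cost of coding the residual as Gaussian noise
        model_bits = 0.5 * r * (n + d) * np.log2(n * d)  # cost of coding a rank-r factorization
        costs.append(data_bits + model_bits)
    return costs


def maybe_spawn_concept(H, basis, inadequacy_threshold=0.2, max_rank=8):
    """Hypothetical helper: return (updated_basis, spawned_rank)."""
    n, d = H.shape
    R = residual(H, basis)
    unexplained = float((R ** 2).sum() / (H ** 2).sum())
    if unexplained <= inadequacy_threshold:
        return basis, 0  # current geometry is adequate, so nothing is spawned
    # Stay below the residual's full rank, where the fit would be exact and the MDL cost degenerate.
    max_rank = min(max_rank, min(n, d - basis.shape[1]) - 1)
    _, s, Vt = np.linalg.svd(R, full_matrices=False)
    best = int(np.argmin(mdl_costs(s, n, d, max_rank)))
    if best == 0:
        return basis, 0
    # Stand-in for consolidation: append the new directions and re-orthonormalize,
    # so the existing columns of `basis` still span the same subspace.
    Q, _ = np.linalg.qr(np.hstack([basis, Vt[:best].T]))
    return Q, best


# Toy demo: hidden states carrying a planted rank-3 "concept" plus small noise.
rng = np.random.default_rng(0)
n, d = 200, 32
A, _ = np.linalg.qr(rng.normal(size=(d, 3)))  # d x 3 orthonormal planted directions
H = 5.0 * rng.normal(size=(n, 3)) @ A.T + 0.1 * rng.normal(size=(n, d))
basis = np.zeros((d, 0))  # start with no concept subspaces
basis, r1 = maybe_spawn_concept(H, basis)  # large unexplained energy triggers a spawn
basis, r2 = maybe_spawn_concept(H, basis)  # only noise remains, so no further spawn
print(r1, r2, basis.shape)
```

The MDL term trades residual energy against the number of parameters a new subspace costs, so in this toy setting it should settle on the planted rank rather than on the largest allowed one. A real system would operate on transformer hidden states and would need a principled inadequacy signal, which the abstract leaves unspecified.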