Symbolic logical reasoning is a critical yet underexplored capability of large language models (LLMs), enabling reliable and verifiable decision-making in high-stakes domains such as mathematical reasoning and legal judgment. In this study, we present a systematic analysis of logical reasoning under controlled increases in logical complexity and reveal a previously unrecognized phenomenon, which we term Logical Phase Transitions: rather than degrading smoothly, logical reasoning performance remains stable within a regime but collapses abruptly beyond a critical logical depth, mirroring physical phase transitions such as water freezing once the temperature crosses a critical threshold. Building on this insight, we propose Neuro-Symbolic Curriculum Tuning, a principled framework that adaptively aligns natural language with logical symbols to establish a shared representation, and reshapes training dynamics around phase-transition boundaries to progressively strengthen reasoning at increasing logical depths. Experiments on five benchmarks show that our approach effectively mitigates logical reasoning collapse at high complexity, yielding average accuracy gains of +1.26 under naive prompting and +3.95 under chain-of-thought (CoT) prompting, while improving generalization to unseen logical compositions. Code and data are available at https://github.com/AI4SS/Logical-Phase-Transitions.
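The collapse pattern described above can be made concrete with a small sketch. The snippet below is not the authors' code: the accuracy values and the `critical_depth` helper are illustrative assumptions. It scans accuracy measured at increasing logical depths and reports the first depth where performance drops abruptly rather than smoothly, a crude proxy for locating the phase-transition boundary.

```python
# Minimal sketch (illustrative, not the paper's implementation): locate a
# phase-transition-like collapse in accuracy as logical depth grows.
# The accuracy numbers below are placeholder values, not reported results.

# accuracy[d] = model accuracy on problems requiring logical depth d
accuracy = {1: 0.95, 2: 0.94, 3: 0.93, 4: 0.91, 5: 0.58, 6: 0.31, 7: 0.22}

def critical_depth(acc, threshold=0.15):
    """Return the first depth where accuracy falls by more than `threshold`
    relative to the previous depth -- a crude proxy for the phase boundary."""
    depths = sorted(acc)
    for prev, curr in zip(depths, depths[1:]):
        if acc[prev] - acc[curr] > threshold:
            return curr
    return None  # no abrupt collapse detected within the measured range

print(critical_depth(accuracy))  # -> 5: accuracy drops from 0.91 to 0.58
```

Under this toy criterion, depths 1 to 4 form the stable regime and depth 5 marks the collapse, which is the boundary around which the proposed curriculum would concentrate training.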