Although most of the automated theorem-proving approaches depend on formal proof systems, informal theorem proving can align better with large language models' (LLMs) strength in natural language processing. In this work, we identify a primary bottleneck in informal theorem proving as a lack of insight, namely the difficulty of recognizing the core techniques required to solve complex problems. To address this, we propose $\texttt{DeepInsight}$, a unified training framework designed to cultivate this essential reasoning skill and enable LLMs to perform insightful reasoning. Our framework consists of three components: (1) $\texttt{DeepInsightTheorem}$, a hierarchical dataset that structures informal proofs by explicitly extracting core techniques and proof sketches alongside the final proof; (2) a Progressive Multi-Stage SFT strategy that mimics the human learning process, teaching the model proof writing, planning, and insight identification; and (3) $\texttt{InsightPO}$, a policy optimization method that assigns structured rewards over this insight hierarchy. Our experiments on challenging mathematical benchmarks demonstrate that this insight-aware generation strategy significantly outperforms baselines. These results demonstrate that teaching models to identify and apply core techniques can substantially improve their mathematical reasoning.
翻译:尽管大多数自动定理证明方法依赖于形式化证明系统,但非形式定理证明能更好地契合大语言模型在自然语言处理中的优势。本工作发现非形式定理证明的主要瓶颈在于缺乏洞察力,即难以识别解决复杂问题所需的核心技术。为此,我们提出$\texttt{DeepInsight}$,一个统一的训练框架,旨在培养这种关键的推理能力,使大语言模型能够进行基于洞察的推理。该框架包含三个组件:(1)$\texttt{DeepInsightTheorem}$,一个层次化数据集,通过显式提取核心技术、证明草图及最终证明来结构化非形式证明;(2)渐进式多阶段监督微调策略,模拟人类学习过程,教模型掌握证明撰写、规划与洞察识别;(3)$\texttt{InsightPO}$,一种策略优化方法,在该洞察层次结构上分配结构化奖励。我们在具有挑战性的数学基准上的实验表明,这种感知洞察的生成策略显著优于基线方法。结果表明,教模型识别并应用核心技术能大幅提升其数学推理能力。