PhysProver: Advancing Automatic Theorem Proving for Physics

The combination of verifiable languages and LLMs has significantly influenced both the mathematical and computer science communities because it provides a rigorous foundation for theorem proving. Recent advancements in the field provide foundation models and sophisticated agentic systems pushing the boundaries of formal mathematical reasoning to approach the natural language capability of LLMs. However, little attention has been given to the formal physics reasoning, which also heavily relies on similar problem-solving and theorem-proving frameworks. To solve this problem, this paper presents, to the best of our knowledge, the first approach to enhance formal theorem proving in the physics domain. We compose a dedicated dataset PhysLeanData for the task. It is composed of theorems sampled from PhysLean and data generated by a conjecture-based formal data generation pipeline. In the training pipeline, we leverage DeepSeek-Prover-V2-7B, a strong open-source mathematical theorem prover, and apply Reinforcement Learning with Verifiable Rewards (RLVR) to train our model PhysProver. Comprehensive experiments demonstrate that, using only $\sim$5K training samples, PhysProver achieves an overall 2.4\% improvement in multiple sub-domains. Furthermore, after formal physics training, we observe 1.3\% gains on the MiniF2F-Test benchmark, which indicates non-trivial generalization beyond physics domains and enhancement for formal math capability as well. The results highlight the effectiveness and efficiency of our approach, which provides a paradigm for extending formal provers outside mathematical domains. To foster further research, we will release both our dataset and model to the community.

翻译：可验证语言与大型语言模型（LLM）的结合显著影响了数学和计算机科学领域，因为它为定理证明提供了严谨的基础。该领域的最新进展提供了基础模型和复杂的智能体系统，将形式化数学推理的边界推向接近LLM的自然语言处理能力。然而，形式化物理推理尚未得到充分关注，而该领域同样严重依赖于类似的问题解决和定理证明框架。为解决这一问题，本文提出了据我们所知首个增强物理学领域形式化定理证明的方法。我们为此任务构建了专用数据集PhysLeanData，该数据集由从PhysLean采样的定理以及基于猜想的形式化数据生成流程生成的数据组成。在训练流程中，我们利用强大的开源数学定理证明器DeepSeek-Prover-V2-7B，并应用带可验证奖励的强化学习（RLVR）训练我们的模型PhysProver。综合实验表明，仅使用约5K训练样本，PhysProver在多个子领域实现了整体2.4%的性能提升。此外，经过形式化物理训练后，我们在MiniF2F-Test基准测试中观察到1.3%的性能增益，这表明模型不仅实现了跨物理学领域的非平凡泛化，同时增强了形式化数学能力。这些结果凸显了我们方法的有效性和高效性，为将形式化证明器扩展至数学领域之外提供了范式。为促进进一步研究，我们将向社区公开数据集和模型。