Towards Real-World Industrial-Scale Verification: LLM-Driven Theorem Proving on seL4

Formal methods (FM) are reliable but costly to apply, often requiring years of expert effort in industrial-scale projects such as seL4, especially for theorem proving. Recent advances in large language models (LLMs) have made automated theorem proving increasingly feasible. However, most prior work focuses on mathematics-oriented benchmarks such as miniF2F, with limited evaluation on real-world verification projects. The few studies that consider industrial-scale verification mostly rely on closed-source models with hundreds of billions of parameters, which cannot be locally deployed and incur substantial usage costs. In this paper, we propose AutoReal, an LLM-driven theorem proving method for real-world industrial-scale systems with support for lightweight local deployment. We evaluate AutoReal on the seL4-Isabelle verification project as a representative and challenging case study. AutoReal incorporates two key improvements: (1) chain-of-thought (CoT)-based proof training, which teaches the LLM the reasoning behind proof steps and enables step-wise explanations alongside proofs, and (2) context augmentation, which leverages proof context from the project to enhance LLM-driven proving. Based on the AutoReal methodology, we fine-tune a base model to obtain AutoReal-Prover, a compact 7B-scale prover for industrial-scale theorem proving. AutoReal-Prover achieves a 51.67% proof success rate on 660 theorems from seL4-designated Important Theories across all 10 seL4 proof categories, substantially outperforming prior attempts on seL4 (27.06%). To evaluate generalization, we further apply AutoReal-Prover to three security-related projects from the Archive of Formal Proofs (AFP), covering all 451 theorems and achieving a proof success rate of 53.88%. Overall, this work advances the application of LLM-driven theorem proving in real-world industrial-scale verification.

翻译：形式化方法虽然可靠，但在工业级项目（如seL4）中的应用成本高昂，尤其在定理证明环节常需耗费专家数年精力。近年来，大语言模型的发展使得自动化定理证明日趋可行。然而，现有研究多集中于miniF2F等数学导向的基准测试，对现实世界验证项目的评估有限。少数涉及工业级验证的研究大多依赖参数量达数千亿的闭源模型，这些模型无法本地部署且使用成本高昂。本文提出AutoReal——一种支持轻量级本地部署、面向现实世界工业级系统的LLM驱动定理证明方法。我们以seL4-Isabelle验证项目作为代表性挑战案例对AutoReal进行评估。AutoReal包含两项关键改进：（1）基于思维链的证明训练，通过向LLM传授证明步骤背后的推理逻辑，使其能够同步生成步骤解释；（2）上下文增强机制，利用项目中的证明上下文提升LLM驱动证明的效果。基于AutoReal方法论，我们通过对基础模型进行微调，得到了专为工业级定理证明设计的紧凑型7B规模证明器AutoReal-Prover。在涵盖seL4全部10个证明类别的660条seL4指定重要理论定理中，AutoReal-Prover实现了51.67%的证明成功率，显著超越此前在seL4上的最佳尝试（27.06%）。为评估泛化能力，我们进一步将AutoReal-Prover应用于形式化证明档案库中的三个安全相关项目，覆盖全部451条定理并取得53.88%的证明成功率。总体而言，本研究推动了LLM驱动定理证明在现实世界工业级验证中的应用进程。