TLA-Prover: Verifiable TLA+ Specification Synthesis via Preference-Optimized Low-Rank Adaptation

TLA+ is a formal specification language for verifying distributed systems and safety-critical protocols. Large language models (LLMs) frequently produce TLA+ specifications that fail the TLC model checker for semantic reasons. Across 25 LLMs, the best public baseline is 26.6% syntactic parse and 8.6% semantic model-check. We present TLA-Prover, a 20-billion-parameter model for TLA+ specification synthesis. Training combines supervised fine-tuning (SFT) on verified examples with repair-based group-relative policy optimization (GRPO). In the GRPO stage, the model learns to fix its own rejected specifications. We also train a direct preference optimization (DPO) variant from the same SFT checkpoint as an ablation. TLC provides the reward signal directly, with no learned reward model. Four tiers grade each output: Bronze (parses), Silver (no warnings), Gold (passes TLC), and Diamond. To reach Diamond, the model's correctness property is automatically altered in a small way; TLC must then detect a violation. If TLC still passes, the property was always-true and contributes nothing; the output fails Diamond. TLA-Prover reaches 9/30 (i.e. pass@1 = 30%) at both Gold and Diamond on a held-out 30-problem benchmark. This is roughly 3.5x the 8.6% untuned baseline. The DPO variant reaches 20% at Diamond. Gold and Diamond coincide at every checkpoint; this prevents the trivial-property failure mode.

翻译：TLA+是一种用于验证分布式系统与安全关键协议的正式规约语言。大型语言模型生成的TLA+规约常因语义原因无法通过TLC模型检查器。在25个LLM中，最佳公开基线的语法解析成功率为26.6%，语义模型检查成功率为8.6%。我们提出TLA-Prover——一个200亿参数的TLA+规约综合模型。训练过程结合了基于已验证示例的监督微调与基于修复的组相对策略优化。在GRPO阶段，模型学习修正自身被拒绝的规约。我们还从同一SFT检查点训练了直接偏好优化变体作为消融实验。TLC直接提供奖励信号，无需学习奖励模型。每个输出按四个等级分级：青铜（通过解析）、白银（无警告）、黄金（通过TLC）和钻石。要达到钻石等级，模型的正确性属性需被自动小幅度修改；TLC必须检测到违规行为。若TLC仍通过，该属性为恒真且无贡献，输出不能达到钻石等级。在保留的30问题基准测试中，TLA-Prover在黄金和钻石等级均达到9/30（即pass@1=30%），约为未调优基线8.6%的3.5倍。DPO变体在钻石等级达到20%。所有检查点的黄金与钻石等级一致，这防止了平凡属性失败模式。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

LaCache：用于高效长上下文建模的大语言模型梯状KV缓存机制

专知会员服务

11+阅读 · 2025年7月23日

【WWW2025】G-Refer：基于图检索增强的大型语言模型用于可解释推荐

专知会员服务

13+阅读 · 2025年4月8日

《直接偏好优化研究综述》

专知会员服务

31+阅读 · 2025年3月18日

【ICLR2025】大型语言模型的动态低秩稀疏适应

专知会员服务

14+阅读 · 2025年2月21日