Large language models (LLMs) have been used to generate formal proofs of mathematical theorems in proofs assistants such as Lean. However, we often want to optimize a formal proof with respect to various criteria, depending on its downstream use. For example, we may want a proof to adhere to a certain style, or to be readable, concise, or modularly structured. Having suitably optimized proofs is also important for learning tasks, especially since human-written proofs may not optimal for that purpose. To this end, we study a new problem of automated proof optimization: rewriting a proof so that it is correct and optimizes for an arbitrary criterion, such as length or readability. As a first method for automated proof optimization, we present ImProver, a large-language-model agent that rewrites proofs to optimize arbitrary user-defined metrics in Lean. We find that naively applying LLMs to proof optimization falls short, and we incorporate various improvements into ImProver, such as the use of symbolic Lean context in a novel Chain-of-States technique, as well as error-correction and retrieval. We test ImProver on rewriting real-world undergraduate, competition, and research-level mathematics theorems, finding that ImProver is capable of rewriting proofs so that they are substantially shorter, more modular, and more readable.
翻译:大型语言模型已被用于在Lean等证明助手中生成数学定理的形式化证明。然而,我们往往需要根据下游应用场景,依据不同标准对形式化证明进行优化。例如,我们可能希望证明符合特定风格、易于阅读、简洁或具有模块化结构。对学习任务而言,拥有经过适当优化的证明同样至关重要,尤其因为人工撰写的证明可能并非针对该目的的最优方案。为此,我们研究了一个新的自动化证明优化问题:重写证明,使其在满足正确性的同时,针对任意标准(如长度或可读性)进行优化。作为自动化证明优化的首个方法,我们提出了ImProver——一种基于大型语言模型的智能体,可在Lean中重写证明以优化用户定义的任意指标。我们发现,直接应用大型语言模型进行证明优化效果欠佳,因此我们为ImProver引入了多项改进,包括在新型链式状态技术中利用符号化Lean上下文,以及错误修正与检索机制。我们通过重写涉及本科、竞赛及研究级别的真实数学定理对ImProver进行测试,结果表明该模型能够显著缩短证明长度、提升模块化程度与可读性。