The large language models (LLMs) might produce a persuasive argument within mathematical and logical fields, although such argument often includes some minor missteps, including the entire omission of side conditions, invalid inference patterns, or appeals to a lemma that cannot be derived logically out of the context being discussed. These omissions are infamously hard to notice solely out of the text, as even the misconstrued construction still may seem mostly accurate. Conversely, interactive theorem provers like Lean and Coq have rigorous reliability by ensuring that syntactic and semantic statements only accept statements that can pass all the syntactic and semantic steps in the program which is a small trusted kernel of the language type-checks with. Despite the fact that this technique provides strong guarantees, it comes at quite a heavy price: the evidence must be completely formalized, and the evidence user or a auxiliary search program must provide an avalanche of low-level information. This paper presents a hybrid pipeline where an LLM generates a typed proof sketch in a compact DSL and a lightweight trusted kernel expands the sketch into explicit proof obligations.
翻译:大语言模型(LLMs)在数学和逻辑领域中可能生成颇具说服力的论证,尽管这类论证常包含一些微小失误,例如完全遗漏辅助条件、使用无效的推理模式,或引用一个在讨论语境中无法逻辑推导出的引理。这些疏漏仅凭文本极难察觉,因为即使是有缺陷的构造,整体上仍可能看似准确。相反,像Lean和Coq这样的交互式定理证明器通过确保语法和语义语句仅接受能通过程序中所有语法和语义步骤的声明(即语言类型检查的小型可信内核)来提供严格的可靠性。尽管这项技术提供了强有力的保证,但代价相当高昂:证据必须完全形式化,且证据使用者或辅助搜索程序必须提供海量的底层信息。本文提出了一种混合流水线,其中大语言模型以紧凑的领域特定语言(DSL)生成带类型的证明草图,而轻量级可信内核则将该草图展开为明确的证明义务。