Recent advances in Automated Theorem Proving have shown the effectiveness of leveraging a (large) language model that generates tactics (i.e., proof steps) to search through proof states. The current model, trained solely on successful proof paths, faces a training-inference discrepancy: at inference it must sample and try various tactics at each proof state until one succeeds, yet its training never incorporates learning from failed attempts. Intuitively, a tactic that leads to a failed search path signals that similar tactics should receive less attention in subsequent trials. In this paper, we demonstrate the benefit of training models that additionally learn from failed search paths. Because existing open-source theorem-proving datasets lack such trial-and-error data, we curate a dataset of intuitionistic propositional logic theorems and formalize it in Lean, so that the correctness of proofs can be checked reliably. We compare our model trained on relatively short trial-and-error information (TrialMaster) with models trained only on correct paths, and find that the former solves more unseen theorems with fewer search trials.
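To make the setting concrete, the following is a hedged illustration (not drawn from the paper's actual dataset) of the kind of object involved: an intuitionistic propositional logic theorem stated in Lean 4 and proved by a sequence of tactics, where each tactic application transforms the current proof state. A prover model generates such tactics step by step, and the Lean kernel reliably accepts or rejects each attempt.

```lean
-- Illustrative example only; the theorem and tactic choices are assumptions,
-- not taken from the paper's curated dataset.
theorem and_comm_demo (p q : Prop) (h : p ∧ q) : q ∧ p := by
  constructor        -- split the goal q ∧ p into two subgoals: q and p
  · exact h.right    -- close the first subgoal with the right component of h
  · exact h.left     -- close the second subgoal with the left component of h
```

Under the paper's setup, a failed tactic at some proof state (e.g., one Lean rejects or that leads to a dead end) would also be recorded, giving the model trial-and-error traces to learn from rather than only the final successful path.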