We present PutnamBench, a new multi-language benchmark for evaluating the ability of neural theorem-provers to solve competition mathematics problems. PutnamBench consists of 1692 hand-constructed formalizations of 640 theorems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All the problems have formalizations in Lean 4 and Isabelle; a substantial subset also has Coq formalizations. PutnamBench requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We use PutnamBench to evaluate several established neural and symbolic theorem-provers. These approaches can only solve a handful of the PutnamBench problems, establishing the benchmark as a difficult open challenge for research on neural theorem-proving. PutnamBench is available at https://github.com/trishullab/PutnamBench.
翻译:本文提出PutnamBench,一个用于评估神经定理证明器解决竞赛数学问题能力的多语言新基准。该基准包含1692个手工构建的形式化问题,源自北美顶级本科数学竞赛——威廉·洛厄尔·普特南数学竞赛的640个定理。所有问题均提供Lean 4与Isabelle的形式化版本,其中大部分还包含Coq形式化实现。PutnamBench要求具备显著的解题能力及对本科数学课程广泛知识领域的熟练掌握。我们使用该基准评估了多个成熟的神经与符号定理证明器。现有方法仅能解决少量PutnamBench问题,表明该基准构成了神经定理证明研究领域一个具有挑战性的开放难题。PutnamBench可通过https://github.com/trishullab/PutnamBench获取。