We present Lean-GAP (Lean-Graduate Agebra Problems), 430 formalized graduate-level algebra problems from the textbook Abstract Algebra by Dummit and Foote. We develop a scalable pipeline consisting of PDF-to-LaTeX preprocessing, autoformalization into Lean 4, and verification of informal-formal correspondence. While the preprocessing and autoformalization stages can be largely automated, we find that verification remains the most subtle and labor-intensive component, requiring careful human oversight. Our contributions include (i) the construction of a structured dataset of formalized exercises, (ii) a systematic methodology for formalizing textbook mathematics, and (iii) an analysis of recurring challenges in the formalization process. We also compare the performance of different autoformalization models and highlight key bottlenecks in translating informal statements into formal language.
翻译:我们提出Lean-GAP(Lean研究生代数问题),包含来自Dummit与Foote教材《抽象代数》中的430道形式化研究生水平代数问题。我们开发了一个可扩展的流程,包括PDF到LaTeX的预处理、自动形式化为Lean 4语言,以及非形式化-形式化对应关系的验证。尽管预处理和自动形式化阶段可以大幅自动化,但我们发现验证仍然是最微妙且劳动密集的环节,需要仔细的人工监督。我们的贡献包括:(i) 构建了一个结构化的形式化习题数据集,(ii) 为教材数学形式化提供了一种系统方法论,(iii) 分析了形式化过程中反复出现的挑战。我们还比较了不同自动形式化模型的性能,并突出了将非形式化陈述转化为形式语言时的关键瓶颈。