Proving geometric theorems constitutes a hallmark of visual reasoning combining both intuitive and logical skills. Therefore, automated theorem proving of Olympiad-level geometry problems is considered a notable milestone in human-level automated reasoning. The introduction of AlphaGeometry, a neuro-symbolic model trained with 100 million synthetic samples, marked a major breakthrough. It solved 25 of 30 International Mathematical Olympiad (IMO) problems whereas the reported baseline based on Wu's method solved only ten. In this note, we revisit the IMO-AG-30 Challenge introduced with AlphaGeometry, and find that Wu's method is surprisingly strong. Wu's method alone can solve 15 problems, and some of them are not solved by any of the other methods. This leads to two key findings: (i) Combining Wu's method with the classic synthetic methods of deductive databases and angle, ratio, and distance chasing solves 21 out of 30 methods by just using a CPU-only laptop with a time limit of 5 minutes per problem. Essentially, this classic method solves just 4 problems less than AlphaGeometry and establishes the first fully symbolic baseline strong enough to rival the performance of an IMO silver medalist. (ii) Wu's method even solves 2 of the 5 problems that AlphaGeometry failed to solve. Thus, by combining AlphaGeometry with Wu's method we set a new state-of-the-art for automated theorem proving on IMO-AG-30, solving 27 out of 30 problems, the first AI method which outperforms an IMO gold medalist.
翻译:几何定理证明是结合直觉与逻辑技能的视觉推理典范,因此奥林匹克级别几何问题的自动定理证明被视为人类级自动推理的重要里程碑。基于1亿个合成样本训练的神经符号模型AlphaGeometry的引入标志着重大突破——它解决了30道国际数学奥林匹克(IMO)几何问题中的25道,而基于吴方法的基线仅解出10道。本研究重新审视AlphaGeometry提出的IMO-AG-30挑战,发现吴方法具有惊人实力:单独使用可解15题,且其中部分题目为其他方法所未解。由此产生两个关键发现:(i)将吴方法与经典综合方法(演绎数据库及角、比、距离追踪)结合,仅使用CPU笔记本电脑、每道题5分钟时限即可解出21/30题——这比AlphaGeometry仅少解4题,建立了首个足以匹敌IMO银牌得主性能的全符号基线;(ii)吴方法甚至解出AlphaGeometry未解5题中的2题。因此,将AlphaGeometry与吴方法结合后,我们在IMO-AG-30基准上确立了自动定理证明的新最优水平(解出27/30题),这是首个超越IMO金牌得主的AI方法。