We report on an experiment in which Claude Opus~4.6, equipped with a suite of Model Context Protocol (MCP) tools for the Rocq proof assistant, autonomously proved 10 of 12 problems from the 2025 Putnam Mathematical Competition. The MCP tools, designed with Claude by analyzing logs from a prior experiment on miniF2F-Rocq, encode a "compile-first, interactive-fallback" strategy. Running on an isolated VM with no internet access, the agent deployed 141 subagents over 17.7 hours of active compute (51.6h wall-clock), consuming approximately 1.9 billion tokens. All proofs are publicly available.
翻译:本文报道了一项实验:配备一套面向Rocq证明助手的模型上下文协议(MCP)工具的Claude Opus 4.6,自主证明了2025年普特南数学竞赛(Putnam Mathematical Competition)12道问题中的10道。这些MCP工具由Claude通过分析此前miniF2F-Rocq实验日志设计而成,编码了一种“先编译、后交互式回退”(compile-first, interactive-fallback)策略。该智能体在无互联网接入的隔离虚拟机上运行,于17.7小时有效计算时间(51.6小时挂钟时间)内部署了141个子智能体,消耗约19亿个令牌。所有证明均已公开。