Autoformalization, which translates natural language mathematics into formal statements to enable machine reasoning, faces fundamental challenges in the wild due to the multimodal nature of the physical world, where physics requires inferring hidden constraints (e.g., mass or energy) from visual elements. To address this, we propose MMFormalizer, which extends autoformalization beyond text by integrating adaptive grounding with entities from real-world mathematical and physical domains. MMFormalizer recursively constructs formal propositions from perceptually grounded primitives through recursive grounding and axiom composition, with adaptive recursive termination ensuring that every abstraction is supported by visual evidence and anchored in dimensional or axiomatic grounding. We evaluate MMFormalizer on a new benchmark, PhyX-AF, comprising 115 curated samples from MathVerse, PhyX, Synthetic Geometry, and Analytic Geometry, covering diverse multimodal autoformalization tasks. Results show that frontier models such as GPT-5 and Gemini-3-Pro achieve the highest compile and semantic accuracy, with GPT-5 excelling in physical reasoning, while geometry remains the most challenging domain. Overall, MMFormalizer provides a scalable framework for unified multimodal autoformalization, bridging perception and formal reasoning. To the best of our knowledge, this is the first multimodal autoformalization method capable of handling classical mechanics (derived from the Hamiltonian), as well as relativity, quantum mechanics, and thermodynamics. More details are available on our project page: MMFormalizer.github.io
翻译:自动形式化旨在将自然语言数学描述转化为形式化陈述以支持机器推理,但在开放物理世界中面临根本性挑战,因为物理问题常需从视觉元素推断隐含约束(如质量或能量)。为此,我们提出MMFormalizer,通过整合现实世界数学与物理领域的自适应实体关联,将自动形式化从纯文本扩展到多模态场景。该系统通过递归关联与公理组合,从感知关联的基元递归构建形式化命题,其自适应递归终止机制确保每个抽象概念均获得视觉证据支持,并锚定于维度或公理基础。我们在新基准测试集PhyX-AF上评估MMFormalizer,该数据集包含从MathVerse、PhyX、合成几何与解析几何中精选的115个样本,涵盖多样化的多模态自动形式化任务。实验结果表明,GPT-5和Gemini-3-Pro等前沿模型在编译与语义准确性上表现最佳,其中GPT-5在物理推理方面尤为突出,而几何领域仍是当前最具挑战性的方向。总体而言,MMFormalizer为统一的多模态自动形式化提供了可扩展框架,架起了感知与形式化推理之间的桥梁。据我们所知,这是首个能够处理经典力学(源自哈密顿量)以及相对论、量子力学和热力学的多模态自动形式化方法。更多细节请访问项目页面:MMFormalizer.github.io