Geometric problem solving (GPS) requires precise multimodal understanding and rigorous, step-by-step logical reasoning. However, developing capable Multimodal Large Language Models (MLLMs) for GPS is heavily bottlenecked by the scarcity of high-quality, verifiable data. Existing data acquisition paradigms either suffer from modality incompleteness and unverified logical gaps ("leaps-of-faith"), or rely on formal engines that generate rigid, structurally homogeneous data, failing to produce high-difficulty problems or foster genuine natural-language reasoning. To overcome these limitations, we introduce TrustGeoGen, an autonomous and formalized geometric data generation engine. TrustGeoGen strictly guarantees reasoning trustworthiness through formal verification while generating multimodally integrated data, including premises, visual diagrams, and solutions. To systematically scale problem difficulty, we incorporate difficulty-aware filtering and an iterative bootstrapping mechanism. Furthermore, we propose "connection thinking" to bridge the semantic gap between rigid formal logic and fluent human-like reasoning, ensuring coherent logical transitions. We also introduce the GeoExplore family of sampling algorithms to extract diverse problem-solving trajectories based on various thinking templates. Extensive experiments demonstrate that training models on our synthesized dataset, GeoTrust, substantially enhances deep geometric reasoning capabilities and yields significant performance gains across out-of-distribution (OOD) benchmarks, including GeoQA, Geometry3K, and OlympiadBench. Our code and data can be found at https://github.com/InternScience/TrustGeoGen