Geometry problem solving (GPS) is a high-level mathematical reasoning task that requires multi-modal fusion and the application of geometric knowledge. Recently, neural solvers have shown great potential in GPS but still fall short in diagram representation and modal fusion. In this work, we convert diagrams into basic textual clauses to describe diagram features effectively, and propose a new neural solver called PGPSNet to fuse multi-modal information efficiently. Combining structural and semantic pre-training, data augmentation, and self-limited decoding, PGPSNet is endowed with rich knowledge of geometry theorems and geometric representation, which promotes geometric understanding and reasoning. In addition, to facilitate research on GPS, we build a new large-scale, finely annotated GPS dataset named PGPS9K, labeled with both fine-grained diagram annotations and interpretable solution programs. Experiments on PGPS9K and the existing dataset Geometry3K validate the superiority of our method over state-of-the-art neural solvers. The code and dataset will be made publicly available soon.