Mathematical reasoning, a core aspect of human cognition, is vital across many domains, from educational problem-solving to scientific discovery. As research toward artificial general intelligence (AGI) progresses, applying large language models (LLMs) to mathematical reasoning tasks is becoming increasingly significant. This survey provides the first comprehensive analysis of mathematical reasoning in the era of multimodal large language models (MLLMs). We review over 200 studies published since 2021 and examine state-of-the-art developments in Math-LLMs, with a focus on multimodal settings. We organize the field along three dimensions: benchmarks, methodologies, and challenges. In particular, we explore the multimodal mathematical reasoning pipeline, as well as the role of (M)LLMs and the associated methodologies. Finally, we identify five major challenges hindering the realization of AGI in this domain and offer insights into future directions for enhancing multimodal reasoning capabilities. This survey is intended as a resource for the research community in advancing the ability of LLMs to tackle complex multimodal reasoning tasks.