A critical question about Large Language Models (LLMs) is whether their apparent deficiency in mathematical reasoning is inherent or merely the result of insufficient exposure to high-quality mathematical data. To explore this question, we developed an automated method for generating high-quality, supervised mathematical datasets. The method carefully mutates existing math problems, ensuring both the diversity and the validity of the newly generated problems. It does so through a neuro-symbolic data generation framework that combines the intuitive informalization strengths of LLMs, the precise symbolic reasoning of math solvers, and projected Markov chain Monte Carlo sampling in the highly irregular symbolic space. Empirical experiments demonstrate the high quality of the data generated by the proposed method, and show that LLMs, specifically LLaMA-2 and Mistral, surpass their state-of-the-art counterparts when realigned with the generated data.
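To illustrate the flavor of the sampling procedure, the following is a minimal, hypothetical sketch of projected MCMC over a discrete problem space. It is not the paper's implementation: the "problem" here is just a tuple of integer coefficients, the validity predicate and projection step stand in for the symbolic math solver described above, and a flat target density (always accept) replaces any learned proposal scoring.

```python
import random

def is_valid(problem):
    # Stand-in for a symbolic solver's validity check:
    # here, a problem is "valid" if every coefficient is positive.
    return all(c > 0 for c in problem)

def project(problem):
    # Project an invalid proposal back onto the valid region,
    # standing in for solver-guided repair of a mutated problem.
    return tuple(max(c, 1) for c in problem)

def mutate(problem, rng):
    # Propose a local mutation: perturb one coefficient slightly.
    i = rng.randrange(len(problem))
    delta = rng.choice([-2, -1, 1, 2])
    proposal = list(problem)
    proposal[i] += delta
    return tuple(proposal)

def projected_mcmc(seed_problem, steps, rng=None):
    # Projected Markov chain Monte Carlo over the discrete space:
    # propose a mutation, project it onto the set of valid problems,
    # then accept (flat target density, so every projected move is kept).
    rng = rng or random.Random(0)
    current = seed_problem
    samples = []
    for _ in range(steps):
        proposal = mutate(current, rng)
        if not is_valid(proposal):
            proposal = project(proposal)
        current = proposal
        samples.append(current)
    return samples

samples = projected_mcmc((3, 5, 7), steps=50)
assert all(is_valid(p) for p in samples)
```

The projection step is what keeps the chain inside the highly irregular valid region: rather than rejecting invalid proposals outright (which would waste most steps when valid problems are sparse), each proposal is repaired before acceptance.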