Math Word Problem (MWP) solving is a challenging task in Natural Language Processing (NLP). This study aims to give MWP solvers a more diverse training set, ultimately improving their ability to solve a wide range of math problems. We propose several data augmentation methods that modify the problem texts and equations: synonym replacement, rule-based question replacement, and rule-based question reversal. We apply them to two English MWP datasets. The study also introduces a new in-context learning augmentation method that uses the Llama-7b language model, in which instruction-based prompting rephrases the math problem texts. Evaluations on 9 baseline models show that the augmentation methods outperform the baselines. Moreover, concatenating the examples generated by the different augmentation methods improves performance further.
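To make the simplest of these methods concrete, here is a minimal sketch of synonym replacement for MWPs. It is an illustration under stated assumptions, not the authors' implementation. The synonym table `SYNONYMS` and the helpers `synonym_replace` and `augment_dataset` are hypothetical. The abstract does not say which lexical resource supplies the synonyms. The sketch swaps only alphabetic words, so the numbers in the problem text stay intact and the paired equation can be reused unchanged. The augmented copies are then concatenated with the original examples.

```python
import random
import re

# Matches alphabetic words only, so quantities such as "3" are never replaced.
WORD = re.compile(r"[A-Za-z]+")

# Hypothetical toy synonym table for illustration only; a real pipeline would
# draw on a lexical resource, and this does not reflect the paper's source.
SYNONYMS = {
    "bought": ["purchased", "acquired"],
    "gave": ["handed", "passed"],
}


def synonym_replace(text, synonyms, max_replacements, rng):
    """Swap up to max_replacements known words for a randomly chosen synonym."""
    matches = [m for m in WORD.finditer(text) if m.group().lower() in synonyms]
    chosen = rng.sample(matches, min(max_replacements, len(matches)))
    # Edit from right to left so the offsets of earlier matches stay valid.
    for m in sorted(chosen, key=lambda match: match.start(), reverse=True):
        new_word = rng.choice(synonyms[m.group().lower()])
        if m.group()[0].isupper():
            new_word = new_word.capitalize()  # keep sentence-initial casing
        text = text[:m.start()] + new_word + text[m.end():]
    return text


def augment_dataset(dataset, synonyms, rng):
    """Concatenate the original (problem, equation) pairs with rephrased copies.

    Each equation is reused unchanged because only the wording is altered.
    """
    copies = [(synonym_replace(text, synonyms, 2, rng), eq) for text, eq in dataset]
    return list(dataset) + copies


rng = random.Random(0)
data = [("Tom bought 3 apples and gave 1 to Ann. How many apples are left?", "x = 3 - 1")]
augmented = augment_dataset(data, SYNONYMS, rng)
for problem, equation in augmented:
    print(problem, "|", equation)
```

Keeping the equation fixed is what makes this kind of text-level augmentation cheap. Methods that change the question, such as question replacement or reversal, would also need to rewrite the equation. Outputs from several augmenters could be appended in the same way as `augment_dataset` appends its copies.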