This paper investigates what makes math word problems (MWPs) challenging for large language models (LLMs). We conduct an in-depth analysis of the key linguistic and mathematical characteristics of MWPs. In addition, we train feature-based classifiers to better understand the impact of each feature on the overall difficulty of MWPs for prominent LLMs, and we investigate whether this helps predict how well LLMs fare on specific categories of MWPs.