Quantities are distinct and critical components of text: they characterize the magnitude of entities' properties and give a precise basis for understanding natural language, especially in reasoning tasks. In recent years, there has been a flurry of research on reasoning with large language models (LLMs), but most of it focuses solely on numerical values and neglects the dimension of quantities with units, despite its importance. We argue that the concept of dimension is essential for understanding quantities precisely and is of great significance for LLMs performing quantitative reasoning. However, the lack of dimension knowledge and of quantity-related benchmarks has led to low LLM performance on such tasks. Hence, we present a framework that enhances the quantitative reasoning ability of language models through dimension perception. We first construct a dimensional unit knowledge base (DimUnitKB) to address the knowledge gap in this area. We then propose DimEval, a benchmark consisting of seven tasks in three categories, to probe and enhance the dimension perception skills of LLMs. To evaluate the effectiveness of our methods, we propose a quantitative reasoning task and conduct experiments. The experimental results show that our dimension perception method improves accuracy on quantitative reasoning tasks by 7.12 percentage points compared to GPT-4 (43.55% → 50.67%).
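To make the notion of "dimension" concrete, the following minimal sketch represents a unit as a vector of exponents over the seven SI base quantities plus a conversion factor. It is an illustration only, not the schema of DimUnitKB: the `Unit` class, the `UNITS` table, and the helpers `divide` and `convert` are all hypothetical names introduced here.

```python
from dataclasses import dataclass

# Assumed ordering of the SI base dimensions in each exponent tuple:
# length, mass, time, electric current, temperature, amount, luminous intensity.
BASE = ("L", "M", "T", "I", "Theta", "N", "J")


@dataclass(frozen=True)
class Unit:
    """Hypothetical unit record: a name, a dimension vector, and an SI factor."""
    name: str
    dim: tuple      # exponents over BASE
    factor: float   # multiply by this to reach the coherent SI unit


# A tiny illustrative table; a real knowledge base would be far larger.
UNITS = {
    "m":  Unit("metre",     (1, 0, 0, 0, 0, 0, 0), 1.0),
    "km": Unit("kilometre", (1, 0, 0, 0, 0, 0, 0), 1000.0),
    "s":  Unit("second",    (0, 0, 1, 0, 0, 0, 0), 1.0),
    "h":  Unit("hour",      (0, 0, 1, 0, 0, 0, 0), 3600.0),
    "kg": Unit("kilogram",  (0, 1, 0, 0, 0, 0, 0), 1.0),
}


def divide(a: Unit, b: Unit) -> Unit:
    """Derive a quotient unit: exponents subtract, factors divide."""
    dim = tuple(x - y for x, y in zip(a.dim, b.dim))
    return Unit(f"{a.name}/{b.name}", dim, a.factor / b.factor)


def convert(value: float, src: Unit, dst: Unit) -> float:
    """Convert a value between units, refusing dimensionally mismatched pairs."""
    if src.dim != dst.dim:
        raise ValueError(f"cannot convert {src.name} to {dst.name}: dimensions differ")
    return value * src.factor / dst.factor


km_per_h = divide(UNITS["km"], UNITS["h"])
m_per_s = divide(UNITS["m"], UNITS["s"])
speed = convert(36, km_per_h, m_per_s)  # kilometres per hour to metres per second
```

Under these assumptions, kilometres per hour and metres per second share the dimension vector for length over time and can be converted into one another, whereas a request such as kilograms to metres is rejected. A model that sees only the numerical values has no access to that check.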