Within numerical reasoning, understanding numbers themselves remains a challenge for existing language models. Simple generalisations, such as solving 100+200 instead of 1+2, can substantially affect model performance (Sivakumar and Moosavi, 2023). Among various techniques, character-level embeddings of numbers have emerged as a promising approach to improving number representation. However, this method has a limitation: it leaves the task of aggregating digit representations to the model, which lacks direct supervision for this process. In this paper, we explore the use of mathematical priors to compute aggregated digit embeddings and explicitly incorporate these aggregates into transformer models, either by adding a special token to the input embeddings or by introducing an additional loss function that encourages correct predictions. We evaluate the effectiveness of this explicit aggregation, analyse its strengths and shortcomings, and discuss future directions for benefiting further from this approach. Our methods, while simple, are compatible with any pretrained model, easy to implement, and publicly available.
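To illustrate what a mathematical prior for aggregating digit embeddings might look like, the sketch below weights each digit's embedding by its (normalised) power of ten and sums the result into a single vector. This is only a minimal illustration of the general idea, not the paper's actual method: the lookup table `digit_emb`, the function name, and the normalisation choice are all hypothetical.

```python
import numpy as np

def aggregate_digit_embeddings(number: str, digit_emb: np.ndarray) -> np.ndarray:
    """Combine per-digit embeddings into one vector via a place-value prior.

    number:    a string of decimal digits, e.g. "123" (hypothetical input format).
    digit_emb: a (10, d) lookup table, one row per digit 0-9 (hypothetical).

    Each digit's embedding is weighted by its power of ten; the weights are
    normalised so the aggregate's magnitude stays bounded regardless of length.
    The resulting vector could be injected as a special token embedding.
    """
    digits = [int(c) for c in number]
    n = len(digits)
    # Place-value weights: leftmost digit gets 10^(n-1), rightmost gets 10^0.
    weights = np.array([10.0 ** (n - 1 - i) for i in range(n)])
    weights /= weights.sum()  # normalise so the weights sum to 1
    # Weighted sum over the selected digit embeddings: (n, 1) * (n, d) -> (d,)
    return (weights[:, None] * digit_emb[digits]).sum(axis=0)
```

With one-hot digit embeddings (`np.eye(10)`), the aggregate of "12" places weight 10/11 on digit 1 and 1/11 on digit 2, so relative magnitude information is preserved in the single vector.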