In recent years, Large Language Models such as GPT-3 have shown remarkable capabilities on NLP tasks in zero-shot and few-shot settings. However, the same experiments highlighted GPT-3's difficulty with tasks that require a degree of reasoning, such as arithmetic operations. In this paper we evaluate the ability of Transformer Language Models to perform arithmetic operations using a pipeline that, before performing any computation, decomposes numbers into units, tens, and so on. We call the models fine-tuned with this pipeline Calculon, and we test them on addition, subtraction, and multiplication using the same test sets used for GPT-3. Results show an accuracy increase of 63% on the five-digit addition task. Moreover, we demonstrate the importance of the decomposition pipeline: fine-tuning the same Language Model without decomposing numbers yields 0% accuracy on the five-digit addition task.
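To make the decomposition idea concrete, the following is a minimal Python sketch of splitting a number into units, tens, hundreds, and so on, and then adding two numbers place by place. It illustrates only the representation. It is not the fine-tuning pipeline itself, and the function names (`decompose`, `format_decomposition`, `add_decomposed`, `recompose`) and the exact text format are assumptions made for illustration, not taken from the paper.

```python
from typing import List, Tuple

# Place names, least significant first. Six entries are enough to hold
# the sum of two five-digit numbers (at most 199998).
PLACE_NAMES = ["units", "tens", "hundreds", "thousands",
               "tens of thousands", "hundreds of thousands"]

Decomposition = List[Tuple[int, str]]


def decompose(n: int) -> Decomposition:
    """Split a non-negative integer into (digit, place name) pairs,
    least significant digit first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    digits = [int(ch) for ch in reversed(str(n))]
    if len(digits) > len(PLACE_NAMES):
        raise ValueError("number has more digits than PLACE_NAMES covers")
    return [(d, PLACE_NAMES[i]) for i, d in enumerate(digits)]


def format_decomposition(parts: Decomposition) -> str:
    """Render a decomposition as text (hypothetical format)."""
    return ", ".join(f"{d} {name}" for d, name in parts)


def recompose(parts: Decomposition) -> int:
    """Inverse of decompose: rebuild the integer from its digits."""
    return sum(d * 10 ** i for i, (d, _) in enumerate(parts))


def add_decomposed(a: int, b: int) -> Decomposition:
    """Add two numbers place by place, propagating the carry."""
    da = [d for d, _ in decompose(a)]
    db = [d for d, _ in decompose(b)]
    width = max(len(da), len(db))
    da += [0] * (width - len(da))  # pad the shorter number with zeros
    db += [0] * (width - len(db))
    out, carry = [], 0
    for x, y in zip(da, db):
        total = x + y + carry
        out.append(total % 10)
        carry = total // 10
    if carry:
        out.append(carry)
    return [(d, PLACE_NAMES[i]) for i, d in enumerate(out)]


if __name__ == "__main__":
    print(format_decomposition(decompose(1201)))
    print(recompose(add_decomposed(12345, 678)))
```

In this sketch every digit is paired with an explicit place name, so each place can be processed on its own. The language model would see such a decomposition as text rather than as Python objects.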