This paper presents SelfzCoT, a self-prompt zero-shot chain-of-thought (CoT) method for making better use of large language models (LLMs). On zero-shot arithmetic reasoning tasks, SelfzCoT improves accuracy on GSM8K from 40.50% to 82.34%, on MultiArith from 79.3% to 94.7%, on ADDSUB from 74.70% to 94.10%, on SingleEq from 78.70% to 91.30%, on AQUA from 31.90% to 82.33%, and on SVAMP from 63.70% to 79.70%. Overall, by using the first two lasting path activations to the LLM and, in particular, the code-level self-prompt, SelfzCoT achieves large improvements on all six zero-shot arithmetic reasoning tasks. In addition, our modified zero-shot CoT (MzCoT) also performs strongly on these tasks: it improves accuracy on GSM8K from 40.50% to 76.32%, on MultiArith from 79.3% to 96.97%, on ADDSUB from 74.70% to 92.39%, on SingleEq from 78.70% to 94.60%, on AQUA from 31.90% to 79.90%, and on SVAMP from 63.70% to 81.50%. Notably, SelfzCoT achieves the best performance on GSM8K among all the recent zero-shot methods.