This paper presents SelfzCoT, a self-prompt zero-shot chain-of-thought (CoT) method for making better use of LLMs. On zero-shot arithmetic reasoning tasks, SelfzCoT improves accuracy on GSM8K from 40.50% to 82.34%, on MultiArith from 79.3% to 94.7%, on ADDSUB from 74.70% to 94.10%, on SingleEq from 78.70% to 91.30%, on AQUA from 31.90% to 82.33%, and on SVAMP from 63.70% to 79.70%. Overall, by using the first two lasting path activations to the LLM and, in particular, the code-level self-prompt, SelfzCoT improves substantially on all six zero-shot arithmetic reasoning tasks. In addition, our modified zero-shot CoT (MzCoT) also performs strongly on these reasoning tasks: it improves accuracy on GSM8K from 40.50% to 76.32%, on MultiArith from 79.3% to 96.97%, on ADDSUB from 74.70% to 92.39%, on SingleEq from 78.70% to 94.60%, on AQUA from 31.90% to 79.90%, and on SVAMP from 63.70% to 81.50%. Notably, SelfzCoT achieves the best GSM8K performance among all recent zero-shot methods.
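To make the size of these improvements easier to compare across datasets, the following minimal Python sketch tabulates the absolute gains in percentage points. The accuracy values are copied from the abstract; the names `ACCURACY`, `gains`, and `better_method` are illustrative and not part of the paper, and the first value of each tuple is simply the starting accuracy quoted above.

```python
# Accuracies (%) quoted in the abstract: (starting accuracy, SelfzCoT, MzCoT).
# The dictionary and function names are illustrative, not from the paper.
ACCURACY = {
    "GSM8K": (40.50, 82.34, 76.32),
    "MultiArith": (79.30, 94.70, 96.97),
    "ADDSUB": (74.70, 94.10, 92.39),
    "SingleEq": (78.70, 91.30, 94.60),
    "AQUA": (31.90, 82.33, 79.90),
    "SVAMP": (63.70, 79.70, 81.50),
}


def gains(table):
    """Return absolute gains (percentage points) over the starting accuracy."""
    return {
        name: (round(selfz - base, 2), round(mz - base, 2))
        for name, (base, selfz, mz) in table.items()
    }


def better_method(table):
    """Return, per dataset, which of the two methods reports higher accuracy."""
    return {
        name: "SelfzCoT" if selfz > mz else "MzCoT"
        for name, (_, selfz, mz) in table.items()
    }


if __name__ == "__main__":
    winners = better_method(ACCURACY)
    for name, (d_selfz, d_mz) in gains(ACCURACY).items():
        print(
            f"{name:<10} SelfzCoT +{d_selfz:5.2f}  MzCoT +{d_mz:5.2f}  "
            f"higher: {winners[name]}"
        )
```

On the reported numbers, SelfzCoT is the stronger of the two on GSM8K, ADDSUB, and AQUA, while MzCoT is stronger on MultiArith, SingleEq, and SVAMP.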