This paper presents SelfzCoT, a self-prompt zero-shot chain-of-thought (CoT) method for making better use of large language models (LLMs). On zero-shot arithmetic reasoning tasks, SelfzCoT improves accuracy on GSM8K from 40.50% to 82.34%, on MultiArith from 79.3% to 94.7%, on ADDSUB from 74.70% to 94.10%, on SingleEq from 78.70% to 91.30%, on AQUA from 31.90% to 82.33%, and on SVAMP from 63.70% to 79.70%. Overall, by applying the first two lasting path activations to the LLM and, in particular, the code-level self-prompt, SelfzCoT achieves large improvements on all six zero-shot arithmetic reasoning tasks. Our modified zero-shot CoT (MzCoT) also performs strongly on these reasoning tasks, improving accuracy on GSM8K from 40.50% to 76.32%, on MultiArith from 79.3% to 96.97%, on ADDSUB from 74.70% to 92.39%, on SingleEq from 78.70% to 94.60%, on AQUA from 31.90% to 79.90%, and on SVAMP from 63.70% to 81.50%. Notably, SelfzCoT achieves the best GSM8K accuracy among all recent zero-shot methods.
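The reported accuracies are easier to compare as absolute gains in percentage points. The short sketch below tabulates only the figures stated in this abstract and computes each method's gain over the shared starting accuracy. The baseline method is not named here, so the code just calls it "base". The names `RESULTS`, `gains`, and `best_method` are hypothetical helpers for illustration and are not from the paper.

```python
# Accuracies (%) exactly as reported in the abstract:
# (starting accuracy, SelfzCoT, MzCoT). The baseline method is not named
# in the abstract, so it is simply called "base" here (an assumption).
RESULTS = {
    "GSM8K":      (40.50, 82.34, 76.32),
    "MultiArith": (79.3, 94.7, 96.97),
    "ADDSUB":     (74.70, 94.10, 92.39),
    "SingleEq":   (78.70, 91.30, 94.60),
    "AQUA":       (31.90, 82.33, 79.90),
    "SVAMP":      (63.70, 79.70, 81.50),
}


def gains(results):
    """Return {dataset: (SelfzCoT gain, MzCoT gain)} in percentage points."""
    return {
        name: (round(selfz - base, 2), round(mz - base, 2))
        for name, (base, selfz, mz) in results.items()
    }


def best_method(results):
    """Return which of the two methods reports the higher accuracy per dataset."""
    return {
        name: ("SelfzCoT" if selfz >= mz else "MzCoT")
        for name, (_, selfz, mz) in results.items()
    }


if __name__ == "__main__":
    best = best_method(RESULTS)
    for name, (g_self, g_mz) in gains(RESULTS).items():
        print(f"{name:<11} SelfzCoT {g_self:+6.2f} pp  MzCoT {g_mz:+6.2f} pp  best: {best[name]}")
```

Run on the reported numbers, the largest gain is on AQUA (+50.43 points for SelfzCoT). SelfzCoT leads on GSM8K, ADDSUB, and AQUA, while MzCoT leads on MultiArith, SingleEq, and SVAMP.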