As the means of communication between users and LLMs such as GPT or PaLM2, prompting has become an increasingly important research topic for making better use of LLMs. Although simple prompting performs well on single-step questions, it cannot consistently activate the correct knowledge path for multi-step reasoning tasks. Chain of thought (CoT), which commonly comes in zero-shot and few-shot variants, is a recently developed prompting method that explains the reasoning process to the LLM and outperforms simple prompting on three challenging categories of reasoning tasks: arithmetic, symbolic, and commonsense reasoning. In this paper, we propose a novel hint of thought (HoT) prompting method that offers explainability and zero-shot generalization. First, the prompt is decomposed into three steps: explainable sub-questions, logical reasoning, and answer extraction. Second, these three steps are ordered sequentially as step-by-step hints, which can be easily adjusted to, and explained for, different tasks. Finally, experimental results demonstrate that HoT prompting has a significant advantage over existing zero-shot CoT on zero-shot reasoning tasks. We ran zero-shot experiments on math tasks such as GSM8K, ADDSUB, AQUA, and SVAMP, and on commonsense tasks such as StrategyQA. In particular, HoT prompting improves accuracy on GSM8K from 40.50% to 67.80%, on AQUA from 31.9% to 46.4%, on SVAMP from 63.7% to 76.9%, and on ADDSUB from 74.7% to 87.34%, and it even outperforms the competitive PoT approach on GSM8K, AQUA, and SVAMP.
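The three-step structure described in the abstract (explainable sub-questions, logical reasoning, answer extraction) might be sketched as a zero-shot prompt template along the following lines. This is a minimal illustration only: the abstract does not give the actual prompt wording, so the hint strings, the `Answer:` marker, and the function names `build_hint_prompt` and `extract_answer` are all assumptions made for this sketch, and no model call is included.

```python
import re
from typing import Optional

# Hypothetical hint wording: the abstract names the three steps but does not
# give the exact prompt text, so these strings are illustrative assumptions.
HINT_STEPS = (
    "Step 1: Break the question into explainable sub-questions.",
    "Step 2: Answer each sub-question with explicit logical reasoning.",
    "Step 3: State the final answer on its own line as 'Answer: <value>'.",
)


def build_hint_prompt(question: str) -> str:
    """Assemble a zero-shot prompt: the question followed by ordered hints."""
    lines = [f"Question: {question.strip()}", "Hints:"]
    lines.extend(HINT_STEPS)
    return "\n".join(lines)


def extract_answer(response: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent."""
    # '.' does not match a newline, so each capture stops at the end of its line.
    matches = re.findall(r"Answer:\s*(.+)", response)
    return matches[-1].strip() if matches else None
```

In use, the string returned by `build_hint_prompt` would be sent to whichever LLM is under evaluation, and `extract_answer` would be applied to the model's reply. Because the hints are plain strings in a fixed order, adapting the template to a different task would amount to editing the entries in `HINT_STEPS`.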