The performance of Large Language Models (LLMs) on reasoning tasks depends heavily on prompt design, and Chain-of-Thought (CoT) and self-consistency are critical methods for enhancing this ability. However, these methods do not fully exploit the answers the LLM has already generated to guide subsequent responses. This paper proposes a new prompting method, Progressive-Hint Prompting (PHP), that enables automatic multiple interactions between users and LLMs by using previously generated answers as hints that progressively guide the model toward the correct answer. PHP is orthogonal to CoT and self-consistency, so it is easy to combine with state-of-the-art techniques to further improve performance. We conducted extensive experiments on seven benchmarks. The results show that PHP significantly improves accuracy while remaining highly efficient. For instance, with text-davinci-003, we observed a 4.2% improvement on GSM8K with greedy decoding compared to Complex CoT, and a 46.17% reduction in sample paths with self-consistency. With GPT-4 and PHP, we achieve state-of-the-art performance on SVAMP (89.1% -> 91.9%), GSM8K (92% -> 95.5%), AQuA (76.4% -> 79.9%) and MATH (50.3% -> 53.9%).
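The interaction loop described in the abstract, in which earlier answers are fed back as hints, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `progressive_hint_prompting`, the `ask_model` callable, the `max_rounds` cap, the exact hint wording, and the stopping rule (stop once two consecutive answers agree) are all assumptions made for illustration, since the abstract specifies none of them.

```python
from typing import Callable, List, Optional


def progressive_hint_prompting(
    question: str,
    ask_model: Callable[[str], str],  # hypothetical: maps a prompt to a final answer string
    max_rounds: int = 5,              # assumed safety cap, not taken from the abstract
) -> str:
    """Re-ask `question`, appending all earlier answers as hints."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")

    hints: List[str] = []
    previous: Optional[str] = None

    for _ in range(max_rounds):
        prompt = question
        if hints:
            # Assumed hint template; the abstract only says that previous
            # answers are used as hints.
            prompt = f"{question} (Hint: The answer is near to {', '.join(hints)}.)"

        answer = ask_model(prompt).strip()

        # Assumed stopping rule: the answer is stable across two rounds.
        if answer == previous:
            return answer

        hints.append(answer)
        previous = answer

    # No agreement within the cap: fall back to the most recent answer.
    return previous
```

Because the loop only wraps the prompt, `ask_model` could itself use CoT or self-consistency internally, which is one way to read the claim that PHP is orthogonal to those methods.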