The performance of Large Language Models (LLMs) on reasoning tasks depends heavily on prompt design, and Chain-of-Thought (CoT) and self-consistency are critical methods for enhancing this ability. However, these methods do not fully exploit the answers generated by the LLM to guide subsequent responses. This paper proposes a new prompting method, Progressive-Hint Prompting (PHP), that enables multiple automatic interactions between users and LLMs by using previously generated answers as hints that progressively guide the model toward the correct answer. PHP is orthogonal to CoT and self-consistency, so it can easily be combined with state-of-the-art techniques to further improve performance. We conducted extensive experiments on seven benchmarks. The results show that PHP significantly improves accuracy while remaining highly efficient. For instance, with text-davinci-003, we observed a 4.2% improvement on GSM8K with greedy decoding compared to Complex CoT, and a 46.17% reduction in sample paths with self-consistency. With GPT-4 and PHP, we achieve state-of-the-art performance on SVAMP (89.1% -> 91.9%), GSM8K (92% -> 95.5%), AQuA (76.4% -> 79.9%), and MATH (50.3% -> 53.9%).
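The interaction loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `ask_model` is a hypothetical callable standing in for any LLM call, and the hint wording, the stopping rule (two consecutive identical answers), and the `max_rounds` cap are assumptions chosen for the example, since the abstract does not specify them.

```python
from typing import Callable, List, Tuple


def progressive_hint_prompting(
    question: str,
    ask_model: Callable[[str], str],  # hypothetical: maps a prompt to an answer string
    max_rounds: int = 5,              # assumed safety cap on model calls
) -> Tuple[str, List[str]]:
    """Re-ask `question`, feeding earlier answers back as hints.

    Stops when two consecutive answers agree (an assumed stopping rule)
    or after `max_rounds` model calls. Returns the final answer and the
    full answer history.
    """
    answers: List[str] = []
    prompt = question  # first round: the base prompt, with no hint
    for _ in range(max_rounds):
        answer = ask_model(prompt).strip()
        if answers and answer == answers[-1]:
            # The model repeated its previous answer: treat it as settled.
            answers.append(answer)
            break
        answers.append(answer)
        # Assumed hint wording; lists every distinct answer seen so far.
        hints = ", ".join(dict.fromkeys(answers))
        prompt = f"{question} (Hint: The answer is near to {hints})."
    return answers[-1], answers


def stub_model(prompt: str) -> str:
    """Toy stand-in for an LLM: changes its answer once it sees a hint."""
    return "10" if "Hint" not in prompt else "12"


final, history = progressive_hint_prompting("What is 7 + 5?", stub_model)
print(final, history)  # prints: 12 ['10', '12', '12']
```

Because the loop only wraps the prompt, `ask_model` can itself use CoT or self-consistency internally, which is one way to read the claim that PHP is orthogonal to those methods.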