Existing work has found that prompt engineering heavily influences the performance of large language models (LLMs). Chain-of-thought (CoT), a popular prompt engineering technique, prompts LLMs with in-context examples that include reasoning steps. In current studies, the few-shot examples for CoT are generally handcrafted by humans. However, how the text style of in-context examples influences the outputs of LLMs remains under-explored. This paper presents a novel and effective approach, named \textbf{AlignCoT}, that improves the reasoning capability of LLMs by aligning the in-context examples with the native style of LLMs. ``Native'' refers to the inherent characteristic style of an LLM, which can be probed in the original zero-shot setting. AlignCoT is orthogonal to other prompt engineering methods, making it easy to combine with state-of-the-art techniques to further improve LLM performance. We conduct extensive experiments on several benchmarks. The empirical results demonstrate that AlignCoT significantly improves performance over carefully handcrafted in-context examples. For instance, with GPT-3.5-turbo, we observe a +2.5\% improvement on GSM8K. Furthermore, AlignCoT consistently improves performance when combined with other state-of-the-art prompt engineering methods. The source code and dataset will be available at \href{https://github.com/yangzhch6/AlignCoT}{https://github.com/yangzhch6/AlignCoT}.
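
As a rough illustration of the idea of probing a model's native style in a zero-shot setting and reusing that output as in-context examples, the following minimal sketch may help. It is not the released AlignCoT implementation: the function names, the `Question:`/`Answer:` prompt template, and the `generate` callable (a stand-in for any LLM call) are all assumptions made for this example.

```python
from typing import Callable, List


def probe_native_style(questions: List[str],
                       generate: Callable[[str], str]) -> List[str]:
    """Collect the model's own zero-shot answers to the example questions.

    `generate` is a hypothetical callable mapping a prompt to a completion;
    it stands in for whatever LLM client is actually used.
    """
    return [generate(f"Question: {q}\nAnswer:") for q in questions]


def build_aligned_prompt(questions: List[str],
                         native_answers: List[str],
                         test_question: str) -> str:
    """Assemble a few-shot prompt whose demonstrations use the model's own wording."""
    demos = [
        f"Question: {q}\nAnswer: {a.strip()}"
        for q, a in zip(questions, native_answers)
    ]
    # The test question goes last, with the answer left open for the model.
    demos.append(f"Question: {test_question}\nAnswer:")
    return "\n\n".join(demos)


def fake_generate(prompt: str) -> str:
    """Stub model used only to make the sketch runnable without an API."""
    return " Step 1: work it out. The answer is 4."


questions = ["What is 2 + 2?"]
answers = probe_native_style(questions, fake_generate)
prompt = build_aligned_prompt(questions, answers, "What is 3 + 5?")
```

In practice `fake_generate` would be replaced by a real model call. The sketch does not check whether the zero-shot answers are correct, so any real use would need a separate step to verify or fix them before they serve as demonstrations.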