Misunderstandings arise not only in interpersonal communication but also between humans and Large Language Models (LLMs). Such discrepancies can lead LLMs to interpret seemingly unambiguous questions in unexpected ways, yielding incorrect responses. While it is widely acknowledged that the quality of a prompt, such as a question, significantly affects the quality of the response an LLM provides, systematic methods for crafting questions that LLMs can better comprehend remain underdeveloped. In this paper, we present a method named "Rephrase and Respond" (RaR), which lets an LLM rephrase and expand a question posed by a human and then respond to it, all in a single prompt. This approach serves as a simple yet effective prompting method for improving performance. We also introduce a two-step variant of RaR, in which a rephrasing LLM first rephrases the question and then passes the original and rephrased questions together to a different responding LLM. This allows rephrased questions generated by one LLM to be used effectively by another. Our experiments demonstrate that our methods significantly improve the performance of different models across a wide range of tasks. We further provide a comprehensive comparison between RaR and the popular Chain-of-Thought (CoT) methods, both theoretically and empirically. We show that RaR is complementary to CoT and can be combined with CoT to achieve even better performance. Our work not only contributes to enhancing LLM performance efficiently and effectively but also sheds light on the fair evaluation of LLM capabilities. Data and code are available at https://github.com/uclaml/Rephrase-and-Respond.
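The one-step and two-step variants of RaR described in the abstract can be sketched as plain prompt construction. This is a minimal illustration, not the authors' implementation: the prompt wording, the `LLM` callable type, and all function names below are assumptions introduced for this example, and the linked repository should be consulted for the exact prompts used in the experiments.

```python
from typing import Callable

# Hypothetical interface: any function that maps a prompt string to a
# completion string. Swap in a real model client as needed.
LLM = Callable[[str], str]


def one_step_prompt(question: str) -> str:
    """Build a single prompt asking the model to rephrase, expand, and answer.

    The instruction wording is an assumption made for illustration.
    """
    return f"{question}\nRephrase and expand the question, and respond."


def rephrase_prompt(question: str) -> str:
    """Build the first-stage prompt for the rephrasing model (assumed wording)."""
    return (
        f"{question}\n"
        "Given the above question, rephrase and expand it to help you do "
        "better answering. Maintain all information in the original question."
    )


def respond_prompt(question: str, rephrased: str) -> str:
    """Build the second-stage prompt carrying both versions of the question."""
    return (
        f"(original) {question}\n"
        f"(rephrased) {rephrased}\n"
        "Use your answer for the rephrased question to answer the original question."
    )


def rar_one_step(question: str, llm: LLM) -> str:
    """One-step RaR: a single call rephrases and responds."""
    return llm(one_step_prompt(question))


def rar_two_step(question: str, rephraser: LLM, responder: LLM) -> str:
    """Two-step RaR: one model rephrases, a possibly different model answers."""
    rephrased = rephraser(rephrase_prompt(question)).strip()
    return responder(respond_prompt(question, rephrased))
```

Passing the model as a plain callable keeps the sketch independent of any particular client library; `rephraser` and `responder` may be the same function or two different models, matching the two-step setup in which one LLM's rephrasing is consumed by another.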