Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves

Misunderstandings arise not only in interpersonal communication but also between humans and Large Language Models (LLMs). Such discrepancies can lead LLMs to interpret seemingly unambiguous questions in unexpected ways and produce incorrect responses. While it is widely acknowledged that the quality of a prompt, such as a question, significantly affects the quality of an LLM's response, a systematic method for crafting questions that LLMs can better comprehend is still underdeveloped. In this paper, we present a method named "Rephrase and Respond" (RaR), which allows LLMs to rephrase and expand questions posed by humans and provide responses in a single prompt. This approach serves as a simple yet effective prompting method for improving performance. We also introduce a two-step variant of RaR, in which a rephrasing LLM first rephrases the question and then passes the original and rephrased questions together to a different responding LLM. This allows rephrased questions generated by one LLM to be used effectively by another. Our experiments demonstrate that our methods significantly improve the performance of different models across a wide range of tasks. We further provide a comprehensive comparison, both theoretical and empirical, between RaR and the popular Chain-of-Thought (CoT) methods. We show that RaR is complementary to CoT and can be combined with CoT to achieve even better performance. Our work not only contributes to enhancing LLM performance efficiently and effectively but also sheds light on the fair evaluation of LLM capabilities. Data and code are available at https://github.com/uclaml/Rephrase-and-Respond.
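
The abstract describes two ways of applying RaR: a one-step prompt, and a two-step variant in which one LLM rephrases and another responds. The Python sketch below shows how such prompts might be assembled. It assumes a hypothetical `LLM` type, meaning any callable that maps a prompt string to a reply string. The prompt wording, the function names, and the `with_cot` flag are illustrative assumptions, not the authors' released templates, which are in the linked repository. The optional `with_cot` flag appends the zero-shot CoT trigger "Let's think step by step." to show one way RaR could be combined with CoT.

```python
from typing import Callable

# Hypothetical model interface: any function mapping a prompt string to a text reply.
LLM = Callable[[str], str]

# Assumed prompt wording, paraphrasing the method description (not verbatim author templates).
ONE_STEP_SUFFIX = "Rephrase and expand the question, and respond."
REPHRASE_SUFFIX = (
    "Given the above question, rephrase and expand it to help you do better answering. "
    "Maintain all information in the original question."
)
RESPOND_SUFFIX = "Use your answer for the rephrased question to answer the original question."
COT_TRIGGER = "Let's think step by step."  # zero-shot CoT trigger, optionally appended


def one_step_rar_prompt(question: str, with_cot: bool = False) -> str:
    """One-step RaR: a single prompt asking the model to rephrase/expand the question and answer it."""
    parts = [question, ONE_STEP_SUFFIX]
    if with_cot:
        parts.append(COT_TRIGGER)
    return "\n".join(parts)


def two_step_rar(question: str, rephraser: LLM, responder: LLM, with_cot: bool = False) -> str:
    """Two-step RaR: `rephraser` rewrites the question, then `responder` sees both versions."""
    # Step 1: ask the rephrasing model for an expanded version of the question.
    rephrased = rephraser(f"{question}\n{REPHRASE_SUFFIX}")
    # Step 2: pass the original and rephrased questions together to the responding model.
    respond_prompt = (
        f"(original) {question}\n"
        f"(rephrased) {rephrased}\n"
        f"{RESPOND_SUFFIX}"
    )
    if with_cot:
        respond_prompt += f"\n{COT_TRIGGER}"
    return responder(respond_prompt)


if __name__ == "__main__":
    # Stub models standing in for real LLM calls (hypothetical, for illustration only).
    def stub_rephraser(prompt: str) -> str:
        return "Under the Gregorian calendar rules, is the year 1900 a leap year?"

    def echo_responder(prompt: str) -> str:
        return prompt  # echoes its input so the assembled two-step prompt can be inspected

    print(one_step_rar_prompt("Is 1900 a leap year?"))
    print(two_step_rar("Is 1900 a leap year?", stub_rephraser, echo_responder, with_cot=True))
```

Passing the same model as both `rephraser` and `responder` gives a self-contained two-step run; passing two different models mirrors the variant in which rephrasings produced by one LLM are reused by another.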