Modern large language models (LLMs), such as ChatGPT, exhibit a remarkable capacity for role-playing, enabling them to embody not only human characters but also non-human entities like a Linux terminal. This versatility allows them to simulate complex human-like interactions and behaviors within various contexts, as well as to emulate specific objects or systems. While these capabilities have enhanced user engagement and introduced novel modes of interaction, the influence of role-playing on LLMs' reasoning abilities remains underexplored. In this study, we introduce a strategically designed role-play prompting methodology and assess its performance under the zero-shot setting across twelve diverse reasoning benchmarks, encompassing arithmetic, commonsense reasoning, symbolic reasoning, and more. Leveraging models such as ChatGPT and Llama 2, our empirical results illustrate that role-play prompting consistently surpasses the standard zero-shot approach across most datasets. Notably, accuracy on AQuA rises from 53.5% to 63.8%, and on Last Letter from 23.8% to 84.2%. Beyond enhancing contextual understanding, we posit that role-play prompting serves as an implicit Chain-of-Thought (CoT) trigger, thereby improving the quality of reasoning. By comparing our approach with the Zero-Shot-CoT technique, which prompts the model to "think step by step", we further demonstrate that role-play prompting can generate a more effective CoT. This highlights its potential to augment the reasoning capabilities of LLMs.
翻译:现代大型语言模型(LLMs),如ChatGPT,展现出卓越的角色扮演能力,不仅能模拟人类角色,还能化身非人类实体(如Linux终端)。这种通用性使它们能够在各种情境中模拟复杂的人类互动与行为,并模仿特定物体或系统。尽管这些能力提升了用户参与度并开创了新型交互模式,但角色扮演对LLMs推理能力的影响仍鲜有研究。本研究提出一种策略性设计的角色扮演提示方法,并在涵盖算术、常识推理、符号推理等12个多样化推理基准的零样本设定下评估其性能。基于ChatGPT和Llama 2等模型的实证结果表明,角色扮演提示在大多数数据集上持续优于标准零样本方法。值得注意的是,AQuA数据集准确率从53.5%提升至63.8%,Last Letter数据集准确率从23.8%提升至84.2%。我们认为,角色扮演提示不仅增强了上下文理解,更作为隐式思维链(CoT)触发器提升了推理质量。通过将我们的方法与引导模型"逐步思考"的零样本思维链(Zero-Shot-CoT)技术对比,进一步证明角色扮演提示能生成更有效的CoT,凸显其在增强LLMs推理能力方面的潜力。