We introduce "Method Actors" as a mental model for guiding LLM prompt engineering and prompt architecture. Under this mental model, LLMs should be thought of as actors, prompts as scripts and cues, and LLM responses as performances. We apply this mental model to the task of improving LLM performance at playing Connections, a New York Times word puzzle game that prior research identified as a challenging benchmark for evaluating LLM reasoning. Our experiments with GPT-4o show that a "Method Actors" approach can significantly improve LLM performance over both a vanilla and a "Chain of Thoughts" approach. A vanilla approach solves 27% of the Connections puzzles in our dataset and a "Chain of Thoughts" approach solves 41%, whereas our strongest "Method Actor" approach solves 86%. We also test o1-preview, OpenAI's newest model designed specifically for complex reasoning tasks. When asked to solve a puzzle all at once, o1-preview solves 79% of the Connections puzzles in our dataset, and when allowed to build puzzle solutions one guess at a time over multiple API calls, o1-preview solves 100% of the puzzles. Incorporating a "Method Actor" prompt architecture increases the percentage of puzzles that o1-preview solves perfectly from 76% to 87%.