We introduce "Method Actors" as a mental model for guiding LLM prompt engineering and prompt architecture. Under this mental model, LLMs should be thought of as actors, prompts as scripts and cues, and LLM responses as performances. We apply this mental model to the task of improving LLM performance at playing Connections, a New York Times word puzzle game that prior research identified as a challenging benchmark for evaluating LLM reasoning. Our experiments with GPT-4o show that a "Method Actors" approach can significantly improve LLM performance over both a vanilla and a "Chain of Thoughts" approach. A vanilla approach solves 27% of the Connections puzzles in our dataset and a "Chain of Thoughts" approach solves 41%, whereas our strongest "Method Actor" approach solves 86%. We also test o1-preview, OpenAI's newest model designed specifically for complex reasoning tasks. When asked to solve a puzzle all at once, o1-preview solves 79% of the Connections puzzles in our dataset, and when allowed to build puzzle solutions one guess at a time over multiple API calls, o1-preview solves 100% of the puzzles. Incorporating a "Method Actor" prompt architecture increases the percentage of puzzles that o1-preview solves perfectly from 76% to 87%.