Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models' problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%. Code repo with all prompts: https://github.com/princeton-nlp/tree-of-thought-llm.
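To make the search loop described above concrete, here is a minimal sketch of a breadth-first search over thoughts, applied to Game of 24. It is an illustration under stated assumptions, not the implementation from the linked repository. The names `propose`, `evaluate`, `reachable`, `tot_bfs`, and `solve_24` are hypothetical, as are the `depth` and `breadth` settings. Where the framework would query a language model to generate and self-evaluate thoughts, this sketch substitutes plain Python: `propose` enumerates every single arithmetic step, and `evaluate` uses an exhaustive lookahead as a stand-in for the model's judgment.

```python
from fractions import Fraction
from itertools import combinations

# A state is (remaining numbers, steps taken so far). Each step is one "thought".

def propose(state):
    """Stand-in for the LM's thought generator: combine two numbers with one operation."""
    nums, steps = state
    children = []
    for i, j in combinations(range(len(nums)), 2):
        a, b = nums[i], nums[j]
        rest = tuple(n for k, n in enumerate(nums) if k not in (i, j))
        candidates = [(a + b, f"{a} + {b}"), (a * b, f"{a} * {b}"),
                      (a - b, f"{a} - {b}"), (b - a, f"{b} - {a}")]
        if b != 0:
            candidates.append((a / b, f"{a} / {b}"))
        if a != 0:
            candidates.append((b / a, f"{b} / {a}"))
        for value, expr in candidates:
            children.append((rest + (value,), steps + (f"{expr} = {value}",)))
    return children

def reachable(nums):
    """Exhaustive lookahead: can the remaining numbers still be combined into 24?"""
    if len(nums) == 1:
        return nums[0] == 24
    return any(reachable(child[0]) for child in propose((nums, ())))

def evaluate(state):
    """Stand-in for LM self-evaluation (assumption: an LM would score the state instead)."""
    return 1.0 if reachable(state[0]) else 0.0

def tot_bfs(start, propose, evaluate, depth, breadth):
    """Breadth-first search over thoughts, keeping the `breadth` best states per level."""
    frontier = [start]
    for _ in range(depth):
        candidates = [child for state in frontier for child in propose(state)]
        candidates.sort(key=evaluate, reverse=True)  # best-scored states first
        frontier = candidates[:breadth]              # prune unpromising branches
        if not frontier:
            break
    return frontier

def solve_24(numbers, breadth=5):
    """Return the list of steps that reaches 24, or None if no kept path does."""
    start = (tuple(Fraction(n) for n in numbers), ())
    # Each thought merges two numbers into one, so the tree has len(numbers) - 1 levels.
    for nums, steps in tot_bfs(start, propose, evaluate, len(numbers) - 1, breadth):
        if nums == (Fraction(24),):
            return list(steps)
    return None
```

Calling `solve_24([4, 9, 10, 13])` returns a list of three intermediate steps ending in 24, while `solve_24([1, 1, 1, 1])` returns `None`. Because the evaluator here is exact, pruning never discards a solvable branch. With a language model as evaluator, scores would be noisy, so the `breadth` setting would trade cost against the risk of dropping the right path.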