Solving Zebra Puzzles Using Constraint-Guided Multi-Agent Systems

Prior research has enhanced the ability of Large Language Models (LLMs) to solve logic puzzles using techniques such as chain-of-thought prompting or introducing a symbolic representation. These frameworks are still usually insufficient to solve complicated logical problems, such as Zebra puzzles, due to the inherent complexity of translating natural language clues into logical statements. We introduce a multi-agent system, ZPS, that integrates LLMs with an off the shelf theorem prover. This system tackles the complex puzzle-solving task by breaking down the problem into smaller, manageable parts, generating SMT (Satisfiability Modulo Theories) code to solve them with a theorem prover, and using feedback between the agents to repeatedly improve their answers. We also introduce an automated grid puzzle grader to assess the correctness of our puzzle solutions and show that the automated grader is reliable by evaluating it in a user-study. Our approach shows improvement in all three LLMs we tested, with GPT-4 showing 166% improvement in the number of fully correct solutions.

翻译：先前的研究通过思维链提示或引入符号表示等技术，提升了大型语言模型（LLM）求解逻辑谜题的能力。然而，由于将自然语言线索转化为逻辑语句的内在复杂性，这些框架通常仍不足以解决复杂的逻辑问题，例如斑马谜题。我们提出了一种多智能体系统 ZPS，该系统将 LLM 与现成的定理证明器相结合。该系统通过将问题分解为更小、更易处理的部分，生成 SMT（可满足性模理论）代码以利用定理证明器求解，并利用智能体之间的反馈反复改进答案，从而应对复杂的谜题求解任务。我们还引入了一个自动化的网格谜题评分器来评估我们谜题解答的正确性，并通过用户研究评估证明了该自动评分器的可靠性。我们的方法在我们测试的所有三种 LLM 上都显示出改进，其中 GPT-4 在完全正确解的数量上提升了 166%。

相关内容

Automator

关注 5

Automator是苹果公司为他们的Mac OS X系统开发的一款软件。 只要通过点击拖拽鼠标等操作就可以将一系列动作组合成一个工作流，从而帮助你自动的（可重复的）完成一些复杂的工作。Automator还能横跨很多不同种类的程序，包括：查找器、Safari网络浏览器、iCal、地址簿或者其他的一些程序。它还能和一些第三方的程序一起工作，如微软的Office、Adobe公司的Photoshop或者Pixelmator等。

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日