Prior research has enhanced the ability of Large Language Models (LLMs) to solve logic puzzles using techniques such as chain-of-thought prompting or introducing a symbolic representation. These frameworks are still usually insufficient to solve complicated logical problems, such as Zebra puzzles, due to the inherent complexity of translating natural language clues into logical statements. We introduce a multi-agent system, ZPS, that integrates LLMs with an off the shelf theorem prover. This system tackles the complex puzzle-solving task by breaking down the problem into smaller, manageable parts, generating SMT (Satisfiability Modulo Theories) code to solve them with a theorem prover, and using feedback between the agents to repeatedly improve their answers. We also introduce an automated grid puzzle grader to assess the correctness of our puzzle solutions and show that the automated grader is reliable by evaluating it in a user-study. Our approach shows improvement in all three LLMs we tested, with GPT-4 showing 166% improvement in the number of fully correct solutions.
翻译:先前的研究通过链式思维提示或引入符号表示等技术,增强了大型语言模型(LLMs)解决逻辑谜题的能力。然而,由于将自然语言线索转化为逻辑陈述的固有复杂性,这些框架在解决诸如斑马难题等复杂逻辑问题时仍显不足。本文提出了一种多智能体系统ZPS,将LLMs与现成的定理证明器相结合。该系统通过将问题分解为更小、更易管理的部分,生成SMT(可满足性模理论)代码以使用定理证明器求解,并利用智能体间的反馈反复优化答案,从而应对复杂的谜题求解任务。我们还引入了一种自动化网格谜题评分器,以评估谜题解决方案的正确性,并通过用户研究验证了该自动化评分器的可靠性。我们的方法在我们测试的所有三种LLMs上均显示出性能提升,其中GPT-4在完全正确解决方案的数量上提升了166%。