The recent advancements of Large Language Models (LLMs), with their abundant world knowledge and capabilities of tool-using and reasoning, fostered many LLM planning algorithms. However, LLMs have not shown to be able to accurately solve complex combinatorial optimization problems. In Xie et al. (2024), the authors proposed TravelPlanner, a U.S. domestic travel planning benchmark, and showed that LLMs themselves cannot make travel plans that satisfy user requirements with a best success rate of 0.6%. In this work, we propose a framework that enables LLMs to formally formulate and solve the travel planning problem as a satisfiability modulo theory (SMT) problem and use SMT solvers interactively and automatically solve the combinatorial search problem. The SMT solvers guarantee the satisfiable of input constraints and the LLMs can enable a language-based interaction with our framework. When the input constraints cannot be satisfiable, our LLM-based framework will interactively offer suggestions to users to modify their travel requirements via automatic reasoning using the SMT solvers. We evaluate our framework with TravelPlanner and achieve a success rate of 97%. We also create a separate dataset that contain international travel benchmarks and use both dataset to evaluate the effectiveness of our interactive planning framework when the initial user queries cannot be satisfied. Our framework could generate valid plans with an average success rate of 78.6% for our dataset and 85.0% for TravelPlanner according to diverse humans preferences.
翻译:大型语言模型(LLMs)的最新进展及其丰富的世界知识、工具使用与推理能力,催生了许多基于LLM的规划算法。然而,LLMs尚未被证明能够准确解决复杂的组合优化问题。在Xie等人(2024)的研究中,他们提出了美国国内旅行规划基准TravelPlanner,并表明LLMs自身无法制定满足用户需求的旅行计划,最佳成功率仅为0.6%。本研究提出一种框架,使LLMs能够将旅行规划问题形式化地表述为可满足性模理论(SMT)问题,并利用SMT求解器以交互式自动方式解决组合搜索问题。SMT求解器确保输入约束的可满足性,而LLMs则支持基于语言的交互。当输入约束无法满足时,我们的LLM框架将通过SMT求解器的自动推理,向用户提供交互式建议以修改旅行需求。我们使用TravelPlanner评估该框架,成功率达到97%。此外,我们创建了包含国际旅行基准的独立数据集,并利用这两个数据集评估初始用户查询无法满足时,交互式规划框架的有效性。根据多样化的用户偏好,该框架可生成有效规划:在我们的数据集中平均成功率为78.6%,在TravelPlanner中为85.0%。