Multi-constraint planning involves identifying, evaluating, and refining candidate plans while satisfying multiple, potentially conflicting constraints. Existing large language model (LLM) approaches face fundamental limitations in this domain. Pure reasoning paradigms, which rely on long natural-language chains, are prone to inconsistency, error accumulation, and prohibitive cost as constraints compound. Conversely, approaches that pair LLMs with coding- or solver-based strategies lack flexibility: they often generate problem-specific code from scratch or depend on fixed solvers, and so fail to capture logic that generalizes across diverse problems. To address these challenges, we introduce the Scalable COde Planning Engine (SCOPE), a framework that disentangles query-specific reasoning from generic code execution. This separation lets SCOPE produce solver functions that are consistent, deterministic, and reusable across queries, requiring only minimal changes to their input parameters. SCOPE achieves state-of-the-art performance while lowering cost and latency. For example, with GPT-4o, it reaches 93.1% success on TravelPlanner, a 61.6% gain over the best baseline (CoT), while cutting inference cost by 1.4x and time by ~4.67x. Code is available at https://github.com/DerrickGXD/SCOPE.
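To make the separation of query-specific reasoning from generic code execution concrete, the following is a minimal illustrative sketch and not SCOPE's actual implementation. All names here (`Option`, `QueryParams`, `solve`, and the toy flight and hotel data) are hypothetical. The query parameters are hand-written and stand in for values that an LLM would extract from a natural-language request, while the solver function stays unchanged across queries.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class Option:
    """A candidate item (hypothetical toy data, not from TravelPlanner)."""
    name: str
    cost: int


@dataclass(frozen=True)
class QueryParams:
    """Query-specific inputs. In this sketch they are written by hand;
    the assumption is that an LLM would fill them in from the user's request."""
    budget: int
    banned: frozenset = frozenset()


def solve(hotels, flights, params: QueryParams) -> Optional[tuple]:
    """Generic solver: enumerate (flight, hotel) pairs in a fixed order and
    return the cheapest pair that satisfies every constraint, or None."""
    best = None
    for flight, hotel in product(flights, hotels):
        # Constraint 1: neither item may be excluded by the query.
        if flight.name in params.banned or hotel.name in params.banned:
            continue
        # Constraint 2: the combined cost must fit within the budget.
        total = flight.cost + hotel.cost
        if total > params.budget:
            continue
        # Strict comparison keeps the first-found plan on ties (deterministic).
        if best is None or total < best[2]:
            best = (flight.name, hotel.name, total)
    return best


FLIGHTS = [Option("F100", 300), Option("F200", 180)]
HOTELS = [Option("Grand", 400), Option("Budget Inn", 150)]

# The same solver is reused across three queries; only the parameters change.
plan_a = solve(HOTELS, FLIGHTS, QueryParams(budget=500))
plan_b = solve(HOTELS, FLIGHTS,
               QueryParams(budget=800, banned=frozenset({"Budget Inn", "F200"})))
plan_c = solve(HOTELS, FLIGHTS, QueryParams(budget=100))

print(plan_a)  # ('F200', 'Budget Inn', 330)
print(plan_b)  # ('F100', 'Grand', 700)
print(plan_c)  # None
```

Because the search is ordinary code rather than a natural-language chain, repeated calls with the same parameters return the same plan, and a new query needs only a new `QueryParams` value rather than newly generated solver code.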