Large Language Models (LLMs) have enabled automated heuristic design (AHD) for combinatorial optimization problems (COPs), but existing frameworks' reliance on fixed evolutionary rules and static prompt templates often leads to myopic heuristic generation, redundant evaluations, and limited reasoning about how new heuristics should be derived. We propose a novel multi-agent reasoning framework, referred to as Planning through World Model for Automated Heuristic Design via Self-Evolving LLMs (PathWise), which formulates heuristic generation as a sequential decision process over an entailment graph serving as a compact, stateful memory of the search trajectory. This approach allows the system to carry forward past decisions and reuse or avoid derivation information across generations. A policy agent plans evolutionary actions, a world model agent generates heuristic rollouts conditioned on those actions, and critic agents provide routed reflections summarizing lessons from prior steps, shifting LLM-based AHD from trial-and-error evolution toward state-aware planning through reasoning. Experiments across diverse COPs show that PathWise converges faster to better heuristics, generalizes across different LLM backbones, and scales to larger problem sizes.
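To make the division of roles concrete, the following is a minimal toy sketch of a plan, rollout, and reflect loop over a derivation graph. It is not the PathWise implementation: the abstract gives no code, so every name here (`Node`, `EntailmentGraph`, `policy`, `world_model`, `critic`, `search`) is hypothetical, the three agents are deterministic stubs rather than LLM calls, and a "heuristic" is reduced to a single numeric weight scored by a toy objective. The sketch illustrates only one idea from the abstract, namely that a stateful graph lets the planner avoid repeating a derivation it has already tried.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

ACTIONS = ("increase", "decrease")  # hypothetical evolutionary actions


@dataclass
class Node:
    """One heuristic plus the derivation that produced it (assumed layout)."""
    node_id: int
    code: str                      # stand-in for heuristic source text
    score: float                   # lower is better in this toy setup
    parents: Tuple[int, ...] = ()
    action: str = "init"


@dataclass
class EntailmentGraph:
    """Stateful memory: nodes are heuristics, parent links record derivations."""
    nodes: Dict[int, Node] = field(default_factory=dict)
    reflections: List[str] = field(default_factory=list)

    def add(self, code: str, score: float,
            parents: Tuple[int, ...] = (), action: str = "init") -> Node:
        node = Node(len(self.nodes), code, score, tuple(parents), action)
        self.nodes[node.node_id] = node
        return node

    def tried(self, parents: Tuple[int, ...], action: str) -> bool:
        """True if this (parents, action) derivation is already in the graph."""
        return any(n.parents == tuple(parents) and n.action == action
                   for n in self.nodes.values())

    def best(self) -> Node:
        return min(self.nodes.values(), key=lambda n: n.score)


def evaluate(code: str) -> float:
    """Toy objective standing in for a COP benchmark; optimum at weight 3."""
    return (float(code) - 3.0) ** 2


def policy(graph: EntailmentGraph) -> Optional[Tuple[Tuple[int, ...], str]]:
    """Stub policy agent: best-scoring node with an untried action."""
    for node in sorted(graph.nodes.values(), key=lambda n: n.score):
        for action in ACTIONS:
            if not graph.tried((node.node_id,), action):
                return (node.node_id,), action
    return None


def world_model(graph: EntailmentGraph,
                parents: Tuple[int, ...], action: str) -> str:
    """Stub world model agent: derive a child heuristic from the planned action."""
    weight = float(graph.nodes[parents[0]].code)
    return str(weight + 1.0 if action == "increase" else weight - 1.0)


def critic(parent: Node, child: Node) -> str:
    """Stub critic agent: summarize whether the step improved on its parent."""
    verdict = "helped" if child.score < parent.score else "did not help"
    return f"{child.action} on node {parent.node_id} {verdict}"


def search(steps: int) -> EntailmentGraph:
    graph = EntailmentGraph()
    graph.add("0.0", evaluate("0.0"))  # seed heuristic
    for _ in range(steps):
        plan = policy(graph)
        if plan is None:
            break
        parents, action = plan
        code = world_model(graph, parents, action)
        child = graph.add(code, evaluate(code), parents, action)
        graph.reflections.append(critic(graph.nodes[parents[0]], child))
    return graph


if __name__ == "__main__":
    g = search(6)
    print(g.best())
    print(g.reflections)
```

In this simplification the reflections are only stored on the graph and the stub policy does not read them; a fuller version would presumably feed them back into the planning prompt, which is where the routing described in the abstract would matter.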