Automatic heuristic design (AHD) has emerged as a promising paradigm for solving NP-hard combinatorial optimization problems (COPs). Recent work shows that large language models (LLMs), when integrated into well-designed frameworks (LLM-AHD), can autonomously discover high-performing heuristics. However, existing LLM-AHD frameworks typically treat the LLM as a passive generator within a fixed workflow, producing heuristics from limited, manually designed context. Such context may fail to capture state-dependent information (e.g., specific failure modes), leading to inefficient trial-and-error exploration. To overcome these limitations, we propose AHD Agent, a novel tool-integrated, multi-turn framework that empowers the LLM to proactively decide whether to generate a heuristic or to invoke tools that retrieve targeted evidence from the solving environment. To effectively train such a dynamic decision-making agent, we introduce an agentic reinforcement learning (RL) system that leverages a novel environment synthesis pipeline to optimize the generalizable AHD capabilities of a compact model. Experiments across eight diverse domains, including four held-out tasks, demonstrate that our 4B-parameter agent matches or surpasses state-of-the-art baselines that use much larger models, while requiring significantly fewer evaluations. Model- and inference-scaling analyses further reveal that AHD Agent offers an effective path toward truly autonomous heuristic design.