Heuristic algorithms play a vital role in solving combinatorial optimization (CO) problems, yet traditional designs depend heavily on manual expertise and struggle to generalize across diverse instances. We introduce \textbf{HeurAgenix}, a two-stage hyper-heuristic framework powered by large language models (LLMs) that first evolves heuristics and then selects among them automatically. In the heuristic evolution phase, HeurAgenix leverages an LLM to compare seed heuristic solutions with higher-quality solutions and extract reusable evolution strategies. During problem solving, it dynamically picks the most promising heuristic for each problem state, guided by the LLM's perception ability. For flexibility, this selector can be either a state-of-the-art LLM or a fine-tuned lightweight model with lower inference cost. To mitigate the scarcity of reliable supervision caused by CO complexity, we fine-tune the lightweight heuristic selector with a dual-reward mechanism that jointly exploits signals from selection preferences and state perception, enabling robust selection under noisy annotations. Extensive experiments on canonical benchmarks show that HeurAgenix not only outperforms existing LLM-based hyper-heuristics but also matches or exceeds specialized solvers. Code is available at https://github.com/microsoft/HeurAgenix.