Automated heuristic design (AHD) has gained considerable attention for its potential to automate the development of effective heuristics. The recent advent of large language models (LLMs) has opened a new avenue for AHD, with initial efforts framing AHD as an evolutionary program search (EPS) problem. However, inconsistent benchmark settings, inadequate baselines, and a lack of detailed component analysis have left two questions inadequately justified: whether integrating LLMs with search strategies is necessary, and how much progress existing LLM-based EPS methods have actually achieved. This work addresses these questions through a large-scale benchmark comprising four LLM-based EPS methods and four AHD problems, evaluated across nine LLMs with five independent runs each. Our extensive experiments yield meaningful insights, providing empirical grounding for the importance of evolutionary search in LLM-based AHD approaches and informing the design of future EPS algorithms. To foster accessibility and reproducibility, we have fully open-sourced our benchmark and the corresponding results.