Large language models (LLMs) have enabled rapid progress in automatic heuristic discovery (AHD), yet most existing methods rely on static evaluation against a fixed instance distribution, which risks overfitting and poor generalization under distribution shift. We propose Algorithm Space Response Oracles (ASRO), a game-theoretic framework that reframes heuristic discovery as program-level co-evolution between a solver and an instance generator. ASRO models their interaction as a two-player zero-sum game, maintains growing strategy pools on both sides, and iteratively expands them via LLM-based best-response oracles against mixed opponent meta-strategies, thereby replacing static evaluation with an adaptive, self-generated curriculum. Across multiple combinatorial optimization domains, ASRO consistently outperforms static-training AHD baselines built on the same program search mechanisms, achieving substantially improved generalization and robustness on diverse and out-of-distribution instances.
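The co-evolution loop described above follows the general shape of double-oracle / PSRO-style methods. The following is a minimal toy sketch of that loop, not the paper's implementation: the LLM best-response oracles are replaced by exhaustive search over a small candidate set, the payoff function is a synthetic stand-in for running a solver program on generated instances, and the meta-strategy is a simple uniform mixture rather than a game-theoretic solution; all names (`payoff`, `best_response`, `asro_loop`) are illustrative assumptions.

```python
def payoff(solver, generator):
    # Toy stand-in for evaluating a solver program on instances drawn
    # from a generator program: here a solver "covers" a generator iff
    # their integer parameters are within 1. ASRO would execute code.
    return 1.0 if abs(solver - generator) <= 1 else -1.0

def uniform_meta(pool):
    # Placeholder meta-strategy: uniform mixture over the pool.
    # (A Nash or replicator-dynamics solver over the empirical payoff
    # matrix is the usual choice in PSRO-style frameworks.)
    return [1.0 / len(pool)] * len(pool)

def best_response(candidates, opp_pool, opp_mix, maximize):
    # Stand-in for an LLM best-response oracle: pick the candidate with
    # the best expected payoff against the opponent's mixed strategy.
    # The solver maximizes payoff; the generator minimizes it.
    def value(c):
        v = sum(p * (payoff(c, o) if maximize else payoff(o, c))
                for p, o in zip(opp_mix, opp_pool))
        return v if maximize else -v
    return max(candidates, key=value)

def asro_loop(rounds=5, candidates=range(10)):
    solvers, gens = [0], [5]  # seed pools on both sides
    for _ in range(rounds):
        s_mix, g_mix = uniform_meta(solvers), uniform_meta(gens)
        # Expand each pool with a best response to the opponent mixture.
        new_s = best_response(candidates, gens, g_mix, maximize=True)
        new_g = best_response(candidates, solvers, s_mix, maximize=False)
        if new_s not in solvers:
            solvers.append(new_s)
        if new_g not in gens:
            gens.append(new_g)
    return solvers, gens
```

Running `asro_loop()` grows both pools from their seeds: each round, the generator pool shifts toward instances the current solver mixture handles poorly, and the solver pool adds a response covering them, which is the adaptive-curriculum effect the abstract refers to.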