Constraint programming is known to be an efficient approach for solving combinatorial problems. Important design choices in a solver are the branching heuristics, which are designed to lead the search to the best solutions in a minimum amount of time. However, developing these heuristics is a time-consuming process that requires problem-specific expertise. This observation has motivated many efforts to use machine learning to automatically learn efficient heuristics without expert intervention. To the best of our knowledge, this remains an open research question. Although several generic variable-selection heuristics are available in the literature, the options for a generic value-selection heuristic are scarcer. In this paper, we propose to tackle this issue by introducing a generic learning procedure that can be used to obtain a value-selection heuristic inside a constraint programming solver. This is achieved by combining a deep Q-learning algorithm, a tailored reward signal, and a heterogeneous graph neural network architecture. Experiments on graph coloring, maximum independent set, and maximum cut problems show that our framework finds better solutions, close to optimality, without requiring a large number of backtracks, while remaining generic.
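The abstract does not describe the solver interface, so the following is only a minimal sketch, under our own assumptions, of where a value-selection heuristic plugs into a depth-first constraint programming search, using graph coloring as the example. The names `solve_coloring`, `lexicographic`, `q_value_order`, and `toy_q` are hypothetical. In particular, `toy_q` is a hand-written stand-in for a learned scoring function. It is not the deep Q-learning / heterogeneous GNN model described above. The only idea it illustrates is that the heuristic tries values in decreasing order of their estimated Q-value.

```python
from typing import Callable, Dict, List, Optional, Tuple

Graph = Dict[int, List[int]]
Assignment = Dict[int, int]
# A value-selection heuristic returns the order in which to try the domain values.
ValueOrder = Callable[[int, List[int], Assignment, Graph], List[int]]
QFunction = Callable[[int, int, Assignment, Graph], float]


def lexicographic(var: int, domain: List[int], assignment: Assignment, graph: Graph) -> List[int]:
    """Baseline heuristic: try the smallest value first."""
    return sorted(domain)


def q_value_order(q: QFunction) -> ValueOrder:
    """Wrap a scoring function q(var, value, state) into a value ordering (highest score first)."""
    def order(var: int, domain: List[int], assignment: Assignment, graph: Graph) -> List[int]:
        return sorted(domain, key=lambda v: q(var, v, assignment, graph), reverse=True)
    return order


def toy_q(var: int, value: int, assignment: Assignment, graph: Graph) -> float:
    """Hypothetical stand-in for a trained Q-network: prefer colours already in use."""
    used = set(assignment.values())
    return (1.0 if value in used else 0.0) - 0.01 * value


def solve_coloring(graph: Graph, k: int, value_order: ValueOrder) -> Tuple[Optional[Assignment], int]:
    """Depth-first search with a fixed variable order and a pluggable value order.

    Returns (a valid k-colouring or None, number of failed search nodes).
    """
    variables = sorted(graph, key=lambda v: -len(graph[v]))  # max-degree first
    assignment: Assignment = {}
    failures = 0

    def consistent(var: int, value: int) -> bool:
        # A colour is allowed if no already-coloured neighbour uses it.
        return all(assignment.get(n) != value for n in graph[var])

    def dfs(i: int) -> bool:
        nonlocal failures
        if i == len(variables):
            return True
        var = variables[i]
        domain = [c for c in range(k) if consistent(var, c)]
        for value in value_order(var, domain, assignment, graph):
            assignment[var] = value
            if dfs(i + 1):
                return True
            del assignment[var]  # undo and try the next value
        failures += 1
        return False

    found = dfs(0)
    return (dict(assignment) if found else None), failures
```

For example, a 5-cycle `{0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}` is 3-colourable but not 2-colourable. Therefore `solve_coloring(g, 3, q_value_order(toy_q))` returns a valid colouring and `solve_coloring(g, 2, lexicographic)` returns `None`. Swapping only the `value_order` argument leaves the rest of the search untouched. This separation is what allows a learned heuristic to be dropped into an existing solver.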