RL-MILP Solver: A Reinforcement Learning Approach for Solving Mixed-Integer Linear Programs with Graph Neural Networks

Mixed-integer linear programming (MILP) is an optimization technique used widely across many fields. Primal heuristics, which reduce the search space of an MILP, have enabled traditional solvers (e.g., Gurobi) to find high-quality solutions efficiently. However, traditional primal heuristics rely on expert knowledge, which has motivated machine learning (ML)-based primal heuristics that learn recurring patterns in MILP instances. Existing ML-based primal heuristics nonetheless do not guarantee solution feasibility (i.e., that all constraints are satisfied), and they focus primarily on predicting binary decision variables. When ML-based approaches are applied to MILPs with non-binary integer variables, feasibility issues can become even more pronounced. Because an optimal solution must first satisfy every constraint, addressing feasibility is critical. To overcome these limitations, we propose a novel reinforcement learning (RL)-based solver that interacts with the MILP to find feasible solutions, rather than delegating sub-problems to traditional solvers. We design reward functions tailored to MILP, which enable the RL agent to learn relationships between decision variables and constraints. Additionally, to model complex relationships among decision variables effectively, we use a Transformer encoder-based graph neural network (GNN). Our experimental results demonstrate that the proposed method can solve MILP problems and find near-optimal solutions without delegating the remainder to traditional solvers. The proposed method is a meaningful step forward as an initial study of solving MILP problems end-to-end based solely on ML.
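To make the feasibility notion in the abstract concrete, the following is a minimal Python sketch of checking whether a candidate assignment satisfies all constraints of an MILP. It assumes the problem is written as minimizing c·x subject to A x <= b with a given set of integer-constrained variables. The function names (`constraint_violations`, `is_feasible`, `objective`), the tolerance, and the toy instance are all hypothetical illustrations and are not taken from the proposed solver.

```python
from typing import List, Sequence


def constraint_violations(A: Sequence[Sequence[float]],
                          b: Sequence[float],
                          x: Sequence[float]) -> List[float]:
    """Per-constraint violation max(0, A_i . x - b_i) for constraints A x <= b."""
    return [
        max(0.0, sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i)
        for row, b_i in zip(A, b)
    ]


def is_feasible(A: Sequence[Sequence[float]],
                b: Sequence[float],
                x: Sequence[float],
                integer_idx: Sequence[int],
                tol: float = 1e-9) -> bool:
    """True if x satisfies every constraint and every integrality requirement."""
    integral = all(abs(x[j] - round(x[j])) <= tol for j in integer_idx)
    satisfied = all(v <= tol for v in constraint_violations(A, b, x))
    return integral and satisfied


def objective(c: Sequence[float], x: Sequence[float]) -> float:
    """Objective value c . x (to be minimized in this sketch)."""
    return sum(c_j * x_j for c_j, x_j in zip(c, x))


# Toy instance (assumed for illustration):
#   minimize -x0 - 2*x1  subject to  x0 + x1 <= 4,  x0 >= 0,  x1 >= 0,
#   with x0 and x1 integer. Non-negativity is encoded as -x_j <= 0.
A = [[1, 1], [-1, 0], [0, -1]]
b = [4, 0, 0]
c = [-1, -2]
integer_idx = [0, 1]

for candidate in ([1, 3], [3, 2], [1.5, 2]):
    print(candidate,
          is_feasible(A, b, candidate, integer_idx),
          constraint_violations(A, b, candidate),
          objective(c, candidate))
```

A check of this kind only reports whether a candidate is feasible and by how much each constraint is violated. How such signals would enter a reward function is a design decision of the method and is not specified in the abstract.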
