Accelerate Presolve in Large-Scale Linear Programming via Reinforcement Learning

Large-scale linear programming (LP) problems from industry usually contain substantial redundancy that severely hurts the efficiency and reliability of solving them, making presolve (i.e., the problem simplification module) one of the most critical components in modern LP solvers. However, designing high-quality presolve routines -- that is, the programs that determine (P1) which presolvers to select, (P2) in what order to execute them, and (P3) when to stop -- remains highly challenging because it demands extensive expert knowledge and involves a large search space. Given the sequential decision nature of the task and the lack of expert demonstrations, we propose a simple and efficient reinforcement learning (RL) framework -- namely, reinforcement learning for presolve (RL4Presolve) -- to tackle (P1)-(P3) simultaneously. Specifically, we formulate the routine design task as a Markov decision process and propose an RL framework with adaptive action sequences to generate high-quality presolve routines efficiently. The adaptive action sequences help the agent learn complex behaviors efficiently and adapt to various benchmarks. Experiments on two solvers (one open-source and one commercial) and eight benchmarks (real-world and synthetic) demonstrate that RL4Presolve significantly and consistently improves the efficiency of solving large-scale LPs, especially on benchmarks from industry. Furthermore, we optimize the hard-coded presolve routines in LP solvers by extracting rules from the learned policies, which allows simple and efficient deployment to Huawei's supply chain. The results show encouraging economic and academic potential for incorporating machine learning into modern solvers.
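To make the decision structure behind (P1)-(P3) concrete, the following is a minimal Python sketch of a presolve routine run as a sequential decision loop. It is not the RL4Presolve implementation: the row representation, the two toy presolvers, the reward (rows removed minus a per-presolver cost `STEP_COST`), and the hand-written `rule_policy` standing in for a learned policy are all assumptions made for illustration. Each action is an ordered tuple of presolver names of variable length, loosely mirroring the idea of an action sequence, and the empty tuple means stop.

```python
from typing import Callable, Dict, List, Tuple

# Toy representation (an assumption): a row is (coefficients, rhs) for a . x <= rhs.
Row = Tuple[Tuple[float, ...], float]
# An action is an ordered tuple of presolver names; the empty tuple means "stop".
Action = Tuple[str, ...]


def is_redundant_empty(row: Row) -> bool:
    """A row 0 . x <= rhs with rhs >= 0 always holds, so it is redundant."""
    coeffs, rhs = row
    return all(c == 0 for c in coeffs) and rhs >= 0


def remove_empty_rows(rows: List[Row]) -> List[Row]:
    """Toy presolver: drop redundant all-zero rows."""
    return [r for r in rows if not is_redundant_empty(r)]


def remove_duplicate_rows(rows: List[Row]) -> List[Row]:
    """Toy presolver: drop exact duplicate rows, keeping first occurrences."""
    return list(dict.fromkeys(rows))


# Hypothetical presolver registry; a real solver exposes its own set.
PRESOLVERS: Dict[str, Callable[[List[Row]], List[Row]]] = {
    "remove_empty_rows": remove_empty_rows,
    "remove_duplicate_rows": remove_duplicate_rows,
}

STEP_COST = 0.1  # assumed cost charged for each presolver call


def run_routine(
    rows: List[Row],
    policy: Callable[[List[Row]], Action],
    max_steps: int = 10,
) -> Tuple[List[Row], List[Action], float]:
    """Roll out one episode: the policy picks presolver sequences until it stops."""
    trace: List[Action] = []
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(rows)
        if not action:              # (P3) when to stop
            break
        before = len(rows)
        for name in action:         # (P1) which presolvers, (P2) in what order
            rows = PRESOLVERS[name](rows)
        total_reward += (before - len(rows)) - STEP_COST * len(action)
        trace.append(action)
    return rows, trace, total_reward


def rule_policy(rows: List[Row]) -> Action:
    """Hand-written stand-in for a learned policy: act only where it helps."""
    action: List[str] = []
    if any(is_redundant_empty(r) for r in rows):
        action.append("remove_empty_rows")
    if len(set(rows)) < len(rows):
        action.append("remove_duplicate_rows")
    return tuple(action)


lp_rows: List[Row] = [((1, 2), 4), ((0, 0), 0), ((1, 2), 4), ((3, 1), 5)]
reduced, trace, reward = run_routine(lp_rows, rule_policy)
print(reduced, trace, reward)
```

In an RL setting, `rule_policy` would be replaced by a trained policy that maps features of the current LP to an action, and the reward would reflect actual solving time rather than the row count used in this toy.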
