In the past few years, there has been a surge in the use of machine learning (ML) techniques to address combinatorial optimization (CO) problems, especially mixed-integer linear programs (MILPs). Despite these achievements, the limited availability of real-world instances often leads to sub-optimal decisions and biased solver assessments, which has motivated a suite of synthetic MILP instance generation techniques. However, existing methods either rely heavily on expert-designed formulations or struggle to capture the rich features of real-world instances. To tackle this problem, we propose G2MILP, which to the best of our knowledge is the first deep generative framework for MILP instances. Specifically, G2MILP represents MILP instances as bipartite graphs and applies a masked variational autoencoder to iteratively corrupt and replace parts of the original graphs to generate new ones. An appealing feature of G2MILP is that it can learn to generate novel and realistic MILP instances without prior expert-designed formulations while preserving the structures and computational hardness of real-world datasets. Thus, the generated instances can facilitate downstream tasks for enhancing MILP solvers under limited data availability. We design a suite of benchmarks to evaluate the quality of the generated MILP instances. Experiments demonstrate that our method can produce instances that closely resemble real-world datasets in terms of both structures and computational hardness.
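To make the bipartite-graph representation mentioned in the abstract concrete, the following is a minimal sketch of one common way to encode a MILP of the form min c^T x subject to A x <= b as a graph with constraint nodes, variable nodes, and one edge per nonzero coefficient. The class name `BipartiteMILP`, the function `milp_to_bipartite`, and the choice of node features are illustrative assumptions, not the encoding used by G2MILP itself, and the masked variational autoencoder is not shown.

```python
from dataclasses import dataclass


@dataclass
class BipartiteMILP:
    # Hypothetical container; the feature choices below are assumptions.
    constraint_feats: list  # one entry per constraint: right-hand side b_i
    variable_feats: list    # one entry per variable: (objective coeff c_j, is_integer)
    edges: list             # (constraint index, variable index, coefficient a_ij)


def milp_to_bipartite(c, A, b, is_integer):
    """Encode min c^T x s.t. A x <= b as a bipartite graph."""
    n_cons, n_vars = len(A), len(c)
    assert len(b) == n_cons and len(is_integer) == n_vars
    # A constraint node i is linked to a variable node j only when
    # variable j actually appears in constraint i (a_ij != 0).
    edges = [
        (i, j, A[i][j])
        for i in range(n_cons)
        for j in range(n_vars)
        if A[i][j] != 0
    ]
    return BipartiteMILP(list(b), list(zip(c, is_integer)), edges)


# Toy instance with two constraints and three variables.
c = [1.0, 2.0, 0.0]
A = [[1.0, 0.0, 3.0],
     [0.0, 2.0, 1.0]]
b = [4.0, 5.0]
graph = milp_to_bipartite(c, A, b, [True, True, False])
```

Under this encoding, corrupting and replacing part of an instance amounts to masking a node such as one constraint together with its incident edges and regenerating them, which is one plausible reading of the iterative procedure the abstract describes.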