ReLU neural networks trained as surrogate models can be embedded exactly in mixed-integer linear programs (MILPs), enabling global optimization over the learned function. The tractability of the resulting MILP depends on structural properties of the network, namely the number of binary variables in the associated formulations and the tightness of their continuous LP relaxation. These properties are determined during training, yet standard training objectives (prediction loss with classical weight regularization) offer no mechanism to control them directly. This work studies training regularizers that directly target downstream MILP tractability. Specifically, we propose simple bound-based regularizers that penalize the big-M constants of MILP formulations and/or the number of unstable neurons. Moreover, we introduce an LP relaxation gap regularizer that explicitly penalizes the per-sample gap of the continuous relaxation at training points. We derive its gradient and show how to compute it from LP dual variables without custom automatic differentiation tools. We show that combining the above regularizers can approximate the total derivative of the LP gap with respect to the network parameters, capturing both direct and indirect sensitivities. Experiments on non-convex benchmark functions and a two-stage stochastic programming problem with quantile neural network surrogates demonstrate that the proposed regularizers can reduce MILP solve times by up to four orders of magnitude relative to an unregularized baseline, while maintaining competitive surrogate model accuracy.
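To make the quantities named above concrete, here is a minimal sketch, assuming pre-activation bounds are obtained by plain interval arithmetic over an input box (the abstract does not say how bounds are computed, and tighter methods exist). The helper names `interval_bounds`, `bound_penalties`, and `triangle_gap` are hypothetical and are not the authors' implementation. The bounds `l`, `u` play the role of big-M constants; a neuron is unstable (and needs a binary variable) when `l < 0 < u`. For a single unstable neuron, relaxing the binary in the big-M encoding gives the bound `y <= u*(z - l)/(u - l)`, so smaller bounds directly shrink the LP gap. This single-neuron gap is only an illustration, not the network-level LP gap regularizer described above.

```python
import numpy as np


def interval_bounds(weights, biases, x_lo, x_hi):
    """Propagate an input box [x_lo, x_hi] through a ReLU MLP via interval arithmetic.

    Layer k computes z = W @ x + b with W of shape (out, in); ReLU is applied
    to hidden layers only. Returns one (l, u) pair of pre-activation bounds per
    hidden layer. Hypothetical helper: the bounding method is an assumption.
    """
    lo, hi = np.asarray(x_lo, dtype=float), np.asarray(x_hi, dtype=float)
    bounds = []
    for W, b in zip(weights[:-1], biases[:-1]):  # hidden layers only
        W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
        l = W_pos @ lo + W_neg @ hi + b
        u = W_pos @ hi + W_neg @ lo + b
        bounds.append((l, u))
        lo, hi = np.maximum(l, 0.0), np.maximum(u, 0.0)  # ReLU is monotone
    return bounds


def bound_penalties(bounds):
    """Hypothetical bound-based penalties from pre-activation bounds.

    Returns (number of unstable neurons, summed big-M width u - l over unstable
    neurons, soft instability proxy sum of min(relu(u), relu(-l))). The proxy
    is zero exactly when every neuron is stable (u <= 0 or l >= 0).
    """
    n_unstable, bigm, soft = 0, 0.0, 0.0
    for l, u in bounds:
        unstable = (l < 0) & (u > 0)
        n_unstable += int(unstable.sum())
        bigm += float(np.sum((u - l)[unstable]))
        soft += float(np.sum(np.minimum(np.maximum(u, 0.0), np.maximum(-l, 0.0))))
    return n_unstable, bigm, soft


def triangle_gap(z, l, u):
    """LP relaxation gap of the big-M encoding of y = max(0, z) for l < 0 < u.

    Eliminating the relaxed binary delta in [0, 1] from y <= z - l*(1 - delta)
    and y <= u*delta gives y <= u*(z - l)/(u - l). The gap to max(0, z) peaks
    at z = 0 with value -u*l/(u - l), so tighter bounds shrink it.
    """
    return u * (z - l) / (u - l) - max(z, 0.0)


if __name__ == "__main__":
    # Toy 2-2-1 network on the box [0, 1]^2 (illustrative values only).
    weights = [np.array([[1.0, -1.0], [1.0, 1.0]]), np.array([[1.0, 1.0]])]
    biases = [np.array([0.0, -3.0]), np.array([0.0])]
    bounds = interval_bounds(weights, biases, [0.0, 0.0], [1.0, 1.0])
    print(bound_penalties(bounds))  # (1, 2.0, 1.0)
    print(triangle_gap(0.0, -1.0, 1.0))  # 0.5
```

In the toy network the first hidden neuron has bounds [-1, 1] and is unstable, while the second has upper bound -1 and is stably inactive, so only one binary variable would be needed. In a training loop, penalties like these would be written in an autodiff framework and added to the prediction loss; NumPy is used here only to keep the sketch self-contained.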