We establish a broad methodological foundation for mixed-integer optimization with learned constraints. We propose an end-to-end pipeline for data-driven decision making in which constraints and objectives are directly learned from data using machine learning, and the trained models are embedded in an optimization formulation. We exploit the mixed-integer optimization-representability of many machine learning methods, including linear models, decision trees, ensembles, and multi-layer perceptrons, which allows us to capture various underlying relationships between decisions, contextual variables, and outcomes. We also introduce two approaches for handling the inherent uncertainty of learning from data. First, we characterize a decision trust region using the convex hull of the observations, to ensure credible recommendations and avoid extrapolation. We efficiently incorporate this representation using column generation and propose a more flexible formulation to deal with low-density regions and high-dimensional datasets. Then, we propose an ensemble learning approach that enforces constraint satisfaction over multiple bootstrapped estimators or multiple algorithms. In combination with domain-driven components, the embedded models and trust region define a mixed-integer optimization problem for prescription generation. We implement this framework as a Python package (OptiCL) for practitioners. We demonstrate the method in both World Food Programme planning and chemotherapy optimization. The case studies illustrate the framework's ability to generate high-quality prescriptions as well as the value added by the trust region, the use of ensembles to control model robustness, the consideration of multiple machine learning methods, and the inclusion of multiple learned constraints.
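The convex-hull trust region described above can be illustrated with a small feasibility check: a candidate decision x is in the convex hull of the observations if there exist weights λ ≥ 0 that sum to 1 and satisfy Σ λ_i x_i = x. The sketch below is a minimal illustration, not the OptiCL API. The helper name `in_convex_hull` and the toy data are assumptions made for this example, and it tests a fixed candidate with SciPy's `linprog`. In the framework itself, the same linear constraints would be embedded in the optimization model with x as a decision variable. The column generation scheme and the more flexible formulation for low-density regions are not shown.

```python
import numpy as np
from scipy.optimize import linprog


def in_convex_hull(X, x):
    """Return True if x is a convex combination of the rows of X.

    Hypothetical helper for illustration only; not part of OptiCL.
    """
    X = np.asarray(X, dtype=float)
    x = np.asarray(x, dtype=float)
    n = X.shape[0]
    # Equality constraints: sum_i lam_i * X[i] = x and sum_i lam_i = 1.
    A_eq = np.vstack([X.T, np.ones((1, n))])
    b_eq = np.append(x, 1.0)
    # Pure feasibility problem: zero objective, lam >= 0.
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    # Status 0 means an optimal (here: feasible) solution was found.
    return res.status == 0


# Toy observations: the four corners of the unit square.
observations = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
inside = in_convex_hull(observations, [0.5, 0.5])   # candidate within the data
outside = in_convex_hull(observations, [1.5, 0.5])  # candidate that extrapolates
```

A candidate that fails this check lies outside the region supported by the data, which is the kind of extrapolation the trust region is meant to exclude.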