A scoring system is a linear classifier composed of a small number of explanatory variables, each assigned a small integer coefficient. Such a system is highly interpretable and allows predictions to be made by simple manual arithmetic, without a calculator. Several previous studies have used mixed-integer optimization (MIO) techniques to develop scoring systems for binary classification; however, they have not focused on directly maximizing AUC (i.e., the area under the receiver operating characteristic curve), even though AUC is recognized as an essential evaluation metric for scoring systems. Our goal herein is to establish an effective MIO framework for constructing scoring systems that directly maximize the buffered AUC (bAUC), which is the tightest concave lower bound on AUC. Our optimization model is formulated as a mixed-integer linear optimization (MILO) problem that maximizes bAUC subject to a group sparsity constraint that limits the number of questions in the scoring system. Computational experiments using publicly available real-world datasets demonstrate that our MILO method can build scoring systems with higher AUC values than baseline methods based on regularization and stepwise regression. This research contributes to the advancement of MIO techniques for developing highly interpretable classification models.
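To make the two central notions concrete, the following is a minimal sketch of a scoring system and of the empirical AUC it attains on a toy sample. The question names, point values, and data are hypothetical and chosen only for illustration; they are not taken from the experiments, and the sketch evaluates plain AUC rather than bAUC or the MILO formulation itself.

```python
from itertools import product

# Hypothetical point values (small integers), one per yes/no question.
POINTS = {"age_over_60": 2, "smoker": 3, "regular_exercise": -1}


def score(answers):
    """Add up the points of every question answered 'yes' (encoded as 1)."""
    return sum(POINTS[q] * answers[q] for q in POINTS)


def empirical_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; a tie counts 1/2."""
    total = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            total += 1.0
        elif p == n:
            total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


# Toy labelled sample (hypothetical data).
positives = [
    {"age_over_60": 1, "smoker": 1, "regular_exercise": 0},
    {"age_over_60": 0, "smoker": 1, "regular_exercise": 1},
]
negatives = [
    {"age_over_60": 0, "smoker": 0, "regular_exercise": 1},
    {"age_over_60": 1, "smoker": 0, "regular_exercise": 0},
]

pos_scores = [score(a) for a in positives]
neg_scores = [score(a) for a in negatives]
auc = empirical_auc(pos_scores, neg_scores)
print(pos_scores, neg_scores, auc)
```

Because every coefficient is a small integer, each score can be tallied by hand, and AUC depends only on how the scores order positive examples relative to negative ones. That pairwise ordering is the quantity for which bAUC serves as a concave lower bound.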