A scoring system is a linear classifier composed of a small number of explanatory variables, each assigned a small integer coefficient. Such systems are highly interpretable: predictions can be made by simple manual calculation, without a calculator. Several previous studies have used mixed-integer optimization (MIO) techniques to develop scoring systems for binary classification. However, these studies have not focused on directly maximizing AUC (i.e., the area under the receiver operating characteristic curve), even though AUC is recognized as an essential evaluation metric for scoring systems. Our goal is to establish an effective MIO framework for constructing scoring systems that directly maximize the buffered AUC (bAUC), the tightest concave lower bound on AUC. We formulate the optimization model as a mixed-integer linear optimization (MILO) problem that maximizes bAUC subject to a group sparsity constraint that limits the number of questions in the scoring system. Computational experiments on publicly available real-world datasets demonstrate that our MILO method builds scoring systems with higher AUC values than baseline methods based on regularization and stepwise regression. This research contributes to the advancement of MIO techniques for developing highly interpretable classification models.
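To make the two metrics concrete, here is a minimal Python sketch that evaluates a fixed scoring system. It assumes yes/no answers encoded as 0/1. It also assumes that bAUC is defined through the buffered probability of exceedance as bAUC = 1 − min_{a≥0} E[a(s⁻ − s⁺) + 1]⁺, where s⁺ and s⁻ are the scores of a positive and a negative case. The question names, point values, and toy data are hypothetical and are not taken from the paper. The sketch does not implement the paper's MILO model. It only scores cases and computes AUC and bAUC for a given set of integer points.

```python
from itertools import product

# Hypothetical scoring system: question name -> integer points (illustrative only).
POINTS = {"age_over_60": 2, "smoker": 1, "high_bp": 3}


def score(answers, points=POINTS):
    """Sum the points of every question answered 'yes' (1)."""
    return sum(p for q, p in points.items() if answers.get(q, 0))


def empirical_auc(pos_scores, neg_scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as 1/2."""
    total = 0.0
    for sp, sn in product(pos_scores, neg_scores):
        if sp > sn:
            total += 1.0
        elif sp == sn:
            total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


def buffered_auc(pos_scores, neg_scores):
    """bAUC = 1 - min_{a >= 0} mean(max(a * xi + 1, 0)) with xi = s_neg - s_pos."""
    xis = [sn - sp for sp, sn in product(pos_scores, neg_scores)]

    def bpoe_objective(a):
        return sum(max(a * xi + 1.0, 0.0) for xi in xis) / len(xis)

    # The objective is convex and piecewise linear in a, so its minimum over
    # a >= 0 lies at a = 0 or at a kink a = -1/xi of some pair with xi < 0.
    candidates = [0.0] + [-1.0 / xi for xi in xis if xi < 0]
    return 1.0 - min(bpoe_objective(a) for a in candidates)


# Toy data: answers of positive (event) and negative (no-event) cases.
positives = [{"age_over_60": 1, "high_bp": 1}, {"smoker": 1, "high_bp": 1}, {"age_over_60": 1}]
negatives = [{"smoker": 1}, {}, {"age_over_60": 1}]

pos_scores = [score(x) for x in positives]  # [5, 4, 2]
neg_scores = [score(x) for x in negatives]  # [1, 0, 2]
auc = empirical_auc(pos_scores, neg_scores)
bauc = buffered_auc(pos_scores, neg_scores)
print(f"AUC = {auc:.3f}, bAUC = {bauc:.3f}")  # bAUC never exceeds AUC
```

On this toy data, AUC = 8.5/9 and bAUC = 8/9. The single tied pair (score 2 vs. 2) receives half credit in AUC and none in bAUC. For any a ≥ 0, every pair with s⁻ ≥ s⁺ contributes at least 1 to the bPOE objective, so bAUC is bounded above by AUC. Because bAUC is concave in the coefficients and expressible with linear constraints, it is a natural objective for a MILO model. Searching over the integer points under a limit on the number of questions is the part this sketch leaves out.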