Model explainability is crucial for human users to interpret how a classifier assigns labels to data based on its feature values. We study generalized linear models constructed from sets of feature-value rules, which can capture nonlinear dependencies and interactions. There is an inherent trade-off between the sparsity of a rule set and its prediction accuracy, and with existing methods it is computationally expensive to find the right level of sparsity (e.g., via cross-validation). We propose a new formulation that learns an ensemble of rule sets and addresses these competing factors simultaneously. By utilizing distributionally robust optimization, the formulation ensures good generalization while keeping computational costs low. It uses column generation to search the space of rule sets efficiently and, in contrast with techniques such as random forests, boosting, and their variants, constructs a sparse ensemble of rule sets. We present theoretical results that motivate and justify the use of our distributionally robust formulation. Extensive numerical experiments on a large set of publicly available binary classification problem instances establish that our method improves over competing methods with respect to one or more of the following metrics: generalization quality, computational cost, and explainability.
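To make "generalized linear models constructed from sets of feature-value rules" concrete, here is a minimal sketch of how such a model could score a single example once it has been fitted. It assumes each rule is a conjunction of threshold conditions on named features, that a rule contributes its weight only when all of its conditions hold, and that a logistic link maps the score to a probability for labels in {-1, +1}. The names `Rule`, `RuleGLM`, and `n_active_rules`, and all weights and thresholds, are hypothetical illustrations. The sketch does not reproduce the distributionally robust training problem or the column generation procedure described above.

```python
from dataclasses import dataclass
from math import exp
from typing import Dict, List, Tuple

# Hypothetical condition format: (feature name, operator, threshold).
Condition = Tuple[str, str, float]


def holds(value: float, op: str, threshold: float) -> bool:
    """Evaluate a single feature-value condition."""
    if op == "<=":
        return value <= threshold
    if op == ">":
        return value > threshold
    raise ValueError(f"unsupported operator: {op}")


@dataclass
class Rule:
    """A conjunction (logical AND) of feature-value conditions."""
    conditions: List[Condition]

    def fires(self, x: Dict[str, float]) -> bool:
        # The rule is active only if every condition is satisfied, which is
        # how a linear model over rules can express feature interactions.
        return all(holds(x[f], op, t) for f, op, t in self.conditions)


@dataclass
class RuleGLM:
    """Logistic-link linear model whose inputs are binary rule indicators."""
    rules: List[Rule]
    weights: List[float]
    bias: float = 0.0

    def score(self, x: Dict[str, float]) -> float:
        # Linear combination of the 0/1 indicators of the rules that fire.
        return self.bias + sum(w for r, w in zip(self.rules, self.weights) if r.fires(x))

    def predict_proba(self, x: Dict[str, float]) -> float:
        # Probability of the positive class under the logistic link.
        return 1.0 / (1.0 + exp(-self.score(x)))

    def predict(self, x: Dict[str, float]) -> int:
        return 1 if self.score(x) >= 0 else -1

    def n_active_rules(self, tol: float = 1e-12) -> int:
        # Sparsity measure: number of rules with a nonzero weight.
        return sum(1 for w in self.weights if abs(w) > tol)


# Illustrative model with made-up rules and weights.
model = RuleGLM(
    rules=[
        Rule([("age", "<=", 40.0), ("income", ">", 50.0)]),
        Rule([("debt", ">", 10.0)]),
        Rule([("age", ">", 60.0)]),
    ],
    weights=[2.0, -1.5, 0.0],
    bias=-0.5,
)
x1 = {"age": 35.0, "income": 60.0, "debt": 5.0}
x2 = {"age": 50.0, "income": 60.0, "debt": 20.0}

print(model.score(x1), model.predict(x1))   # 1.5 1
print(round(model.predict_proba(x1), 3))    # 0.818
print(model.score(x2), model.predict(x2))   # -2.0 -1
print(model.n_active_rules())               # 2
```

In this sketch the first rule fires only when both its age and income conditions hold, a nonlinear interaction that a plain linear model on the raw features cannot represent. The third rule has weight zero, so it does not affect predictions and is not counted by `n_active_rules`. This illustrates the sparsity side of the trade-off, not how the paper selects rules.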
翻译:模型可解释性对于人类用户理解分类器如何根据特征值分配数据标签至关重要。本文研究利用特征值规则集构建的广义线性模型,该类模型能够捕捉非线性依赖与交互作用。规则集稀疏性与预测精度之间存在固有权衡。现有方法通过交叉验证等方式寻找合适的稀疏性选择时计算成本高昂。我们提出一种新框架,通过学习规则集成合来同时应对这些竞争性因素。通过采用分布鲁棒优化,在保持低计算成本的同时确保良好泛化性能。该框架利用列生成技术高效搜索规则集空间,与随机森林或提升方法及其变体不同,构建稀疏的规则集成合。我们给出理论结果来论证分布鲁棒优化框架的合理性。大量数值实验表明,在公开的二分类问题实例集上,本方法在泛化质量、计算成本和可解释性等指标上优于现有方法。