Interpretable Machine Learning faces the recurring challenge of explaining the predictions made by opaque classifiers, such as ensemble models, kernel methods, or neural networks, in terms that are understandable to humans. When the model is viewed as a black box, the objective is to identify a small set of features that jointly determine the black-box response with minimal error. Finding such model-agnostic explanations is computationally demanding, however, as the problem is intractable even for binary classifiers. In this paper, the task is framed as a Constraint Optimization Problem: given an input data instance and a set of samples labeled by the black box, the constraint solver seeks an explanation of minimum error and bounded size. From a theoretical standpoint, this constraint programming approach offers PAC-style guarantees on the output explanation. We evaluate the approach empirically on various datasets and show that it statistically outperforms the state-of-the-art heuristic Anchors method.
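To make the optimization concrete, the following is a minimal sketch, not the authors' implementation: it encodes the bounded-size, minimum-error feature selection with the OR-Tools CP-SAT solver, under the assumption that an explanation is read as a rule (samples agreeing with the instance on the selected features should receive the same black-box label). The names `x`, `y_x`, `samples`, `labels`, and `k` are illustrative.

```python
# Sketch of the abstract's framing: choose at most k features so that as few
# differently-labeled samples as possible agree with x on the chosen features.
import numpy as np
from ortools.sat.python import cp_model

def explain(x, y_x, samples, labels, k):
    """Return a feature subset S, |S| <= k, minimizing the number of samples
    that match x on S but whose black-box label differs from y_x."""
    n, d = samples.shape
    model = cp_model.CpModel()
    s = [model.NewBoolVar(f"s_{j}") for j in range(d)]  # feature j selected
    model.Add(sum(s) <= k)                              # bounded size

    errors = []
    for i in range(n):
        if labels[i] == y_x:
            continue  # same-label samples never contribute to the error
        # sample i matches the rule iff no selected feature disagrees with x
        disagree = [s[j] for j in range(d) if samples[i, j] != x[j]]
        match = model.NewBoolVar(f"m_{i}")
        if disagree:
            model.AddBoolAnd([v.Not() for v in disagree]).OnlyEnforceIf(match)
            model.AddBoolOr(disagree).OnlyEnforceIf(match.Not())
        else:
            model.Add(match == 1)  # identical to x on every feature
        errors.append(match)

    model.Minimize(sum(errors))                         # minimum error
    solver = cp_model.CpSolver()
    solver.Solve(model)
    return [j for j in range(d) if solver.Value(s[j])]

if __name__ == "__main__":
    # Toy usage: feature 0 alone separates x's label from the rest.
    x = np.array([1, 0, 1])
    samples = np.array([[1, 0, 0], [0, 1, 1], [1, 0, 1], [0, 0, 1]])
    labels = np.array([1, 0, 1, 0])
    print(explain(x, y_x=1, samples=samples, labels=labels, k=2))  # -> [0]
```

Any exact solver over Boolean selection variables would serve here; the point of the sketch is only that "minimum error, bounded size" is directly expressible as a constraint optimization objective rather than a heuristic search.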