Beyond Explaining Predictions: Logic-Based Explanations for Confidence in Machine Learning Models

Machine learning is increasingly used in critical domains, where both predictions and their associated confidence levels influence important decisions. To enhance transparency in such scenarios, it is important to understand why a model is confident or uncertain about its predictions. Recent logic-based approaches provide abductive explanations, minimal subsets of features sufficient to preserve the predicted class, with correctness guarantees. However, these methods focus solely on classification behavior and may produce explanations that cover instances with low predictive confidence. In this work, we introduce the concept of Minimum Confidence Threshold (MCT), which quantifies the weakest confidence guarantee provided by an abductive explanation. Building upon this concept, we propose confidence-aware abductive explanations, which preserve not only the predicted class but also a user-specified confidence guarantee. We formulate MCT computation as an optimization problem and introduce an algorithm for generating minimal explanations that satisfy a desired confidence threshold. We evaluate the proposed framework on boosted trees for binary classification, although the approach is applicable to other machine learning models that provide confidence scores. Experimental results show that traditional abductive explanations often provide substantially weaker confidence guarantees than the confidence associated with the explained instance itself. In contrast, confidence-aware explanations consistently improve the minimum confidence guaranteed by an explanation while requiring only a modest increase in explanation length. These properties make the proposed approach particularly suitable for applications where both predictive correctness and confidence are essential for trustworthy decision making.
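
The two central notions of the abstract can be illustrated on a toy model. The sketch below is not the paper's method: the abstract formulates MCT computation as an optimization problem over boosted trees, whereas this example assumes three binary features, a hand-written logistic scorer (`predict_proba`, hypothetical), and brute-force enumeration of all completions. The function names `mct` and `explain`, the weights, and the greedy deletion loop are likewise assumptions made for illustration.

```python
from itertools import product
import math


def predict_proba(x):
    """Hypothetical toy model: probability of class 1 for a binary feature vector."""
    weights = (2.0, 1.5, 0.5)  # assumed weights, chosen only for illustration
    bias = -1.0
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def confidence(x, target):
    """Confidence the toy model assigns to class `target` on input x."""
    p = predict_proba(x)
    return p if target == 1 else 1.0 - p


def mct(instance, fixed, target):
    """Minimum confidence over all inputs that agree with `instance` on `fixed`.

    Brute-force stand-in for the optimization problem mentioned in the abstract.
    """
    free = [i for i in range(len(instance)) if i not in fixed]
    worst = 1.0
    for values in product((0, 1), repeat=len(free)):
        x = list(instance)
        for i, v in zip(free, values):
            x[i] = v
        worst = min(worst, confidence(x, target))
    return worst


def explain(instance, tau=0.5):
    """Greedy deletion: drop a feature if the class and threshold tau still hold.

    With tau = 0.5 this yields a plain abductive explanation (class preserved);
    with a larger tau it yields a confidence-aware one. The result is
    subset-minimal, not necessarily of minimum size.
    """
    target = 1 if predict_proba(instance) >= 0.5 else 0
    fixed = set(range(len(instance)))
    for i in range(len(instance)):
        trial = fixed - {i}
        worst = mct(instance, trial, target)
        if worst > 0.5 and worst >= tau:
            fixed = trial
    return sorted(fixed), mct(instance, fixed, target)


if __name__ == "__main__":
    x = (1, 1, 1)
    print("instance confidence:", round(predict_proba(x), 3))
    print("plain abductive:    ", explain(x, tau=0.5))
    print("confidence-aware:   ", explain(x, tau=0.8))
```

In this toy setting the plain explanation keeps only feature 1 and guarantees a confidence of roughly 0.62, well below the instance's own confidence of roughly 0.95. Requiring a threshold of 0.8 yields an explanation with features 0 and 2, one feature longer, with a guarantee of roughly 0.82.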
