For many use cases, it is important to explain the prediction of a black-box model by identifying the most influential training samples. Existing approaches lack customization for user intent and often provide a homogeneous set of explanation samples, failing to reveal the model's reasoning from different angles. In this paper, we propose AIDE, an approach for providing antithetical (i.e., contrastive), intent-based, diverse explanations for opaque and complex models. AIDE distinguishes three types of explainability intents: interpreting a correct prediction, investigating a wrong one, and clarifying an ambiguous one. For each intent, AIDE selects an appropriate set of influential training samples that support or oppose the prediction, either directly or by contrast. To provide a succinct summary, AIDE uses diversity-aware sampling to avoid redundancy and increase coverage of the training data. We demonstrate the effectiveness of AIDE on image and text classification tasks in three ways: quantitatively, assessing correctness and continuity; qualitatively, comparing anecdotal evidence from AIDE and other example-based approaches; and via a user study, evaluating multiple aspects of AIDE. The results show that AIDE addresses the limitations of existing methods and exhibits desirable traits for an explainability method.
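To make the selection pipeline described above concrete, the following is a minimal sketch of that style of procedure: score training samples by their influence on a test prediction, split them into supporting and opposing sets, and summarize each set with diversity-aware selection. The influence estimator here (a TracIn-style gradient dot product) and the greedy maximal-marginal-relevance selection are stand-in assumptions for illustration, not AIDE's actual algorithm; all names, shapes, and the `lam` trade-off parameter are hypothetical.

```python
import numpy as np

def influence_scores(train_grads, test_grad):
    """TracIn-style proxy for influence: the dot product between each
    training sample's gradient and the test-point gradient. Positive
    scores suggest the sample supports the prediction; negative scores
    suggest it opposes it. (A simplified stand-in, not AIDE's estimator.)"""
    return train_grads @ test_grad

def diverse_top_k(candidates, embeddings, scores, k, lam=0.5):
    """Greedy maximal-marginal-relevance selection: at each step pick
    the candidate that balances influence magnitude against similarity
    to already-selected samples, so the summary avoids redundancy."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        def mmr(i):
            redundancy = max(
                (float(embeddings[i] @ embeddings[j]) for j in selected),
                default=0.0,
            )
            return lam * abs(scores[i]) - (1.0 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage with random stand-in data (all shapes hypothetical).
rng = np.random.default_rng(0)
train_grads = rng.normal(size=(100, 16))  # per-sample gradient sketches
test_grad = rng.normal(size=16)           # gradient at the test point
embeddings = rng.normal(size=(100, 16))   # representations used for diversity
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

scores = influence_scores(train_grads, test_grad)
supporting = [i for i in scores.argsort()[::-1] if scores[i] > 0]
opposing = [i for i in scores.argsort() if scores[i] < 0]

# For an "interpret a correct prediction" intent, one might present diverse
# supporters alongside diverse opposers, giving the contrastive view.
print("supporting:", diverse_top_k(supporting, embeddings, scores, k=3))
print("opposing:  ", diverse_top_k(opposing, embeddings, scores, k=3))
```

In this sketch, `lam` trades off influence magnitude against diversity; different intents could reweight or recombine the supporting and opposing pools, though the paper's actual per-intent selection rules are not specified in the abstract.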