NoMatterXAI: Generating "No Matter What" Alterfactual Examples for Explaining Black-Box Text Classification Models

In Explainable AI (XAI), counterfactual explanations (CEs) are a well-studied method for communicating feature relevance: they explain an AI model's predictions through contrastive "what if" reasoning. However, they focus only on important (i.e., relevant) features and largely disregard less important (i.e., irrelevant) ones. Such irrelevant features can be crucial in many applications, especially when users need to ensure that an AI model's decisions are not affected by, or biased against, specific attributes such as gender, race, religion, or political affiliation. To address this gap, the concept of alterfactual explanations (AEs) has been proposed. AEs explore an alternative reality of "no matter what", in which irrelevant features are substituted with alternative features (e.g., "republicans" -> "democrats") within the same attribute (e.g., "politics") while the prediction output stays similar. This makes it possible to check whether an AI model's predictions are influenced by the specified attributes. Despite the promise of AEs, there is a lack of computational approaches for generating them systematically, particularly in the text domain, where creating AEs for AI text classifiers presents unique challenges. This paper addresses this challenge by formulating AE generation as an optimization problem and introducing NoMatterXAI, a novel algorithm that generates AEs for text classification tasks. Our approach achieves fidelity of up to 95% while preserving context similarity of over 90% across multiple models and datasets. A human study further validates the effectiveness of AEs in explaining AI text classifiers to end users. All code will be made publicly available.
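To make the "no matter what" idea concrete, the following is a minimal Python sketch of the substitution check described above. It is not the NoMatterXAI algorithm, whose optimization formulation is not given in this abstract. The lexicon `ATTRIBUTE_LEXICON`, the stand-in model `toy_classifier`, the function `generate_alterfactuals`, and the `tolerance` threshold are all hypothetical names and values chosen for illustration.

```python
import math
from typing import Callable, Dict, List

# Hypothetical lexicon: words grouped by the attribute they belong to.
ATTRIBUTE_LEXICON: Dict[str, List[str]] = {
    "politics": ["republicans", "democrats"],
    "gender": ["he", "she"],
}


def toy_classifier(text: str) -> float:
    """Stand-in black-box model: P(positive) from simple keyword counts."""
    positive = {"great", "good", "excellent"}
    negative = {"bad", "terrible", "awful"}
    tokens = text.lower().split()
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def generate_alterfactuals(
    text: str,
    attribute: str,
    predict: Callable[[str], float],
    tolerance: float = 0.05,
) -> List[str]:
    """Swap each word of `attribute` for the other words of that attribute.

    A variant is kept only if the model's output moves by at most
    `tolerance`, i.e. the prediction holds "no matter what" word is used.
    """
    tokens = text.split()
    base = predict(text)
    words = ATTRIBUTE_LEXICON[attribute]
    kept: List[str] = []
    for i, token in enumerate(tokens):
        if token.lower() not in words:
            continue
        for alternative in words:
            if alternative == token.lower():
                continue
            candidate = " ".join(tokens[:i] + [alternative] + tokens[i + 1:])
            if abs(predict(candidate) - base) <= tolerance:
                kept.append(candidate)
    return kept


if __name__ == "__main__":
    sentence = "the republicans gave a great speech"
    for variant in generate_alterfactuals(sentence, "politics", toy_classifier):
        print(variant)
```

If a substitution within the attribute pushes the output beyond the tolerance, no variant is returned for it, which would suggest that the model is sensitive to that attribute for this input.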

