In Explainable AI (XAI), counterfactual explanations (CEs) are a well-studied method for communicating feature relevance: they explain an AI model's predictions through contrastive "what if" reasoning. However, they focus only on important (i.e., relevant) features and largely disregard less important (i.e., irrelevant) ones. Such irrelevant features can be crucial in many applications, especially when users need to ensure that an AI model's decisions are not affected by, or biased against, specific attributes such as gender, race, religion, or political affiliation. To address this gap, the concept of alterfactual explanations (AEs) has been proposed. AEs explore an alternative reality of "no matter what", in which irrelevant features are substituted with alternative features (e.g., "republicans" -> "democrats") within the same attribute (e.g., "politics") while the prediction output remains similar. This makes it possible to check whether an AI model's predictions are influenced by the specified attributes. Despite the promise of AEs, computational approaches for generating them systematically are lacking, particularly in the text domain, where creating AEs for AI text classifiers presents unique challenges. This paper addresses this challenge by formulating AE generation as an optimization problem and introducing NoMatterXAI, a novel algorithm that generates AEs for text classification tasks. Our approach achieves fidelity of up to 95% while preserving context similarity of over 90% across multiple models and datasets. A human study further validates the effectiveness of AEs in explaining AI text classifiers to end users. All code will be made publicly available.
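To make the "no matter what" idea concrete, the following is a minimal sketch of within-attribute substitution, and it is not the NoMatterXAI algorithm itself. It assumes a hand-written attribute lexicon and a toy keyword classifier standing in for a black-box model; the names `ATTRIBUTE_LEXICON`, `toy_classifier`, `alterfactual_candidates`, and `alterfactuals` are hypothetical and chosen for illustration only. The sketch swaps words within one attribute and keeps the rewritten texts whose predicted label is unchanged.

```python
from typing import Callable, Dict, List

# Hypothetical attribute lexicon (an assumption for this sketch):
# words grouped by the attribute they belong to.
ATTRIBUTE_LEXICON: Dict[str, List[str]] = {
    "politics": ["republicans", "democrats"],
    "gender": ["he", "she"],
}


def toy_classifier(text: str) -> str:
    """Stand-in for a black-box model: 'negative' if the word 'terrible' occurs."""
    return "negative" if "terrible" in text.lower().split() else "positive"


def alterfactual_candidates(text: str, attribute: str) -> List[str]:
    """Replace each word of the attribute with every other word of that attribute."""
    words = ATTRIBUTE_LEXICON[attribute]
    tokens = text.split()
    candidates = []
    for i, token in enumerate(tokens):
        if token.lower() in words:
            for alt in words:
                if alt != token.lower():
                    swapped = tokens[:i] + [alt] + tokens[i + 1:]
                    candidates.append(" ".join(swapped))
    return candidates


def alterfactuals(
    text: str, attribute: str, predict: Callable[[str], str]
) -> List[str]:
    """Keep only the candidates whose predicted label matches the original one."""
    original_label = predict(text)
    return [
        candidate
        for candidate in alterfactual_candidates(text, attribute)
        if predict(candidate) == original_label
    ]


if __name__ == "__main__":
    sentence = "The republicans gave a terrible speech"
    for candidate in alterfactuals(sentence, "politics", toy_classifier):
        print(candidate)
```

If a model's prediction depended on the political term itself, `alterfactuals` would return an empty list for this sentence, which is the signal that the attribute is not irrelevant to that model. A real system would also need to score how well each substitution preserves the surrounding context, which this sketch leaves out.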