Transferable XAI: Relating Understanding Across Domains with Explanation Transfer

Current Explainable AI (XAI) focuses on explaining a single application, but when encountering related applications, users may rely on their prior understanding from previous explanations. This can lead either to overgeneralization and overreliance on AI, or to burdensome independent memorization. Indeed, related decision tasks can share explanatory factors, but with some notable differences; e.g., body mass index (BMI) affects the risks for heart disease and diabetes at the same rate, but chest pain is more indicative of heart disease. Similarly, models using different attributes for the same task still share signals; e.g., temperature and pressure affect air pollution but in opposite directions due to the ideal gas law. Leveraging transfer of learning, we propose Transferable XAI to enable users to transfer understanding across related domains by explaining the relationship between domain explanations using a general affine transformation framework applied to linear factor explanations. The framework supports explanation transfer across various domain types: translation for data subspace (subsuming prior work on Incremental XAI), scaling for decision task, and mapping for attributes. Focusing on the task and attribute domain types, we conducted formative and summative user studies to investigate how well participants could transfer their understanding of AI decisions from one domain to another. Compared to single-domain and domain-independent explanations, Transferable XAI was the most helpful for understanding the second domain, leading to the best decision faithfulness, factor recall, and ability to relate explanations between domains. This framework contributes to improving the reusability of explanations across related AI applications by explaining factor relationships between subspaces, tasks, and attributes.
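As a rough illustration of the affine view described above, the sketch below treats a linear factor explanation as a weight vector and derives a second domain's weights as `A @ w_source + t`. This is only one possible reading of the abstract, not the authors' implementation: the function name `transfer_explanation`, the factor names, and every numeric value are hypothetical and chosen purely for illustration.

```python
import numpy as np

def transfer_explanation(w_source, A=None, t=None):
    """Hypothetical helper: affine transfer of linear factor weights.

    Computes w_target = A @ w_source + t. With A = identity this is a pure
    translation; with t = 0 and a diagonal A it is a per-factor scaling;
    with a general A it maps source attributes onto target attributes.
    """
    w_source = np.asarray(w_source, dtype=float)
    if A is None:
        A = np.eye(w_source.size)      # default: no scaling or mapping
    A = np.asarray(A, dtype=float)
    if t is None:
        t = np.zeros(A.shape[0])       # default: no translation
    return A @ w_source + np.asarray(t, dtype=float)

# Assumed source-domain weights for the factors [bmi, chest_pain]
# (illustrative numbers, not taken from the paper).
w_heart = np.array([0.5, 2.0])

# Scaling for a related decision task: BMI keeps the same rate,
# chest pain is assumed to matter less for the second task.
w_diabetes = transfer_explanation(w_heart, A=np.diag([1.0, 0.25]))

# Translation for a data subspace: a per-factor offset to the weights.
w_subgroup = transfer_explanation(w_heart, t=np.array([0.1, -0.5]))

# Mapping between attributes: a temperature weight mapped to a pressure
# weight with the opposite sign (assumed 1x1 mapping matrix).
w_temperature = np.array([0.8])
w_pressure = transfer_explanation(w_temperature, A=np.array([[-1.0]]))

print(w_diabetes, w_subgroup, w_pressure)
```

Expressing all three domain types through a single `A` and `t` mirrors the abstract's claim that translation, scaling, and mapping are special cases of one affine framework; how the paper actually parameterizes each case is not stated here.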
