The trustworthiness of Machine Learning (ML) models can be difficult to assess, but is critical in high-risk or ethically sensitive applications. Many models are treated as a `black-box' where the reasoning or criteria for a final decision are opaque to the user. To address this, some existing Explainable AI (XAI) approaches approximate model behaviour using perturbed data. However, such methods have been criticised for ignoring feature dependencies, so their explanations may be based on unrealistic data. We propose a novel framework, CHILLI, which incorporates data context into XAI by generating contextually aware perturbations that are faithful to the training data of the base model being explained. We show that this improves both the soundness and accuracy of the explanations.