The use of eXplainable Artificial Intelligence (XAI) systems has introduced a set of challenges that need resolution. Herein, we focus on how to correctly select an XAI method, an open question within the field. The inherent difficulty of this task is due to the lack of a ground truth. Several authors have proposed metrics to approximate the fidelity of different XAI methods. These metrics lack verification and show concerning disagreements. In this study, we proposed a novel methodology to verify fidelity metrics, using a well-known transparent model, namely a decision tree. This model allowed us to obtain explanations with perfect fidelity. Our proposal constitutes the first objective benchmark for these metrics, facilitating a comparison of existing proposals, and surpassing existing methods. We applied our benchmark to assess the existing fidelity metrics in two different experiments, each using public datasets comprising 52,000 images. The images in these datasets had a size of 128 by 128 pixels and were synthetic data that simplified the training process. All metric values indicated a lack of fidelity, with the best one showing a 30% deviation from the expected values for a perfect explanation. Our experimentation led us to conclude that the current fidelity metrics are not reliable enough to be used in real scenarios. From this finding, we deemed it necessary to develop new metrics to avoid the detected problems, and we recommend the usage of our proposal as a benchmark within the scientific community to address these limitations.