AI models evolve rapidly and are frequently released to open repositories such as HuggingFace. Before such models are integrated into a production development lifecycle, they need quality assurance validation. Efficiency, measured by balanced accuracy and computing cost, is only part of that validation: adversarial attacks also threaten the robustness and explainability of AI models. Explainable AI (XAI) methods apply post-hoc algorithms that approximate the mapping from inputs to outputs in order to identify the contributing features. Adversarial perturbations may also degrade the utility of XAI explanations, an effect that requires further investigation. In this paper, we present an integrated process for downstream evaluation tasks: validating AI model accuracy, evaluating robustness under benchmark perturbations, comparing explanation utility, and assessing overhead. We demonstrate an evaluation scenario with six computer vision models (CNN-based, Transformer-based, and hybrid architectures), three types of perturbations, and five XAI methods, giving ninety unique combinations (6 × 3 × 5). The process reveals the explanation utility of each XAI method in terms of how the key areas it identifies respond to adversarial perturbation, and it produces aggregated results that illustrate multiple attributes of each AI model.
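The count of ninety follows from taking every combination of six models, three perturbation types, and five XAI methods. A minimal sketch of that enumeration is given below. The abstract does not name the individual models, perturbations, or methods, so every label and the helper `evaluation_grid` are hypothetical placeholders rather than the authors' implementation.

```python
from itertools import product

# Placeholder labels: the abstract gives only the counts, so all names
# below are hypothetical and stand in for the real models and methods.
MODELS = [f"model_{i}" for i in range(1, 7)]                # six vision models
PERTURBATIONS = [f"perturbation_{i}" for i in range(1, 4)]  # three perturbation types
XAI_METHODS = [f"xai_method_{i}" for i in range(1, 6)]      # five XAI methods


def evaluation_grid(models, perturbations, xai_methods):
    """Return every (model, perturbation, XAI method) triple to evaluate."""
    return list(product(models, perturbations, xai_methods))


grid = evaluation_grid(MODELS, PERTURBATIONS, XAI_METHODS)
print(len(grid))  # 6 * 3 * 5 = 90
```

Each triple would then be passed to whatever accuracy, robustness, explanation, and overhead measurements the process defines, and the per-triple results aggregated by model.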