Scalable Explainability-as-a-Service (XaaS) for Edge AI Systems

Although Explainable AI (XAI) has advanced significantly, its integration into edge and IoT systems is typically ad hoc and inefficient. Most current methods are "coupled": they generate explanations at the same time as model inferences. As a result, they incur redundant computation, high latency, and poor scalability when deployed across heterogeneous sets of edge devices. In this work we propose Explainability-as-a-Service (XaaS), a distributed architecture that treats explainability as a first-class system service rather than a model-specific feature. The key idea of XaaS is to decouple inference from explanation generation, allowing edge devices to request, cache, and verify explanations subject to resource and latency constraints. To achieve this, we introduce three main mechanisms: (1) a distributed explanation cache with semantic-similarity-based retrieval, which significantly reduces redundant computation; (2) a lightweight verification protocol that ensures the fidelity of both cached and newly generated explanations; and (3) an adaptive explanation engine that chooses explanation methods based on device capability and user requirements. We evaluate XaaS on three real-world edge-AI use cases: (i) manufacturing quality control; (ii) autonomous vehicle perception; and (iii) healthcare diagnostics. Experimental results show that XaaS reduces latency by 38% while maintaining high explanation quality across all three deployments. Overall, this work enables the deployment of transparent and accountable AI across large-scale, heterogeneous IoT systems and bridges the gap between XAI research and practical edge deployment.

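To make the idea of a semantic-similarity explanation cache more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each input is already represented by an embedding vector, that cosine similarity is the similarity measure, and that a fixed threshold (0.9 here) decides whether a cached explanation may be reused. The names `ExplanationCache`, `lookup`, `get_or_compute`, and `explain_fn` are hypothetical and chosen only for illustration. The sketch is single-node and omits the verification protocol.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class ExplanationCache:
    """Hypothetical in-memory cache keyed by input embeddings."""
    threshold: float = 0.9  # assumed reuse threshold, not from the paper
    entries: list = field(default_factory=list)  # (embedding, explanation) pairs

    def lookup(self, embedding):
        """Return the most similar cached explanation, or None on a miss."""
        best_score, best_explanation = 0.0, None
        for cached_embedding, explanation in self.entries:
            score = cosine_similarity(embedding, cached_embedding)
            if score > best_score:
                best_score, best_explanation = score, explanation
        return best_explanation if best_score >= self.threshold else None

    def get_or_compute(self, embedding, explain_fn):
        """Reuse a cached explanation if one is close enough, else compute it.

        Returns (explanation, cache_hit).
        """
        cached = self.lookup(embedding)
        if cached is not None:
            return cached, True
        explanation = explain_fn(embedding)  # the expensive explainer call
        self.entries.append((list(embedding), explanation))
        return explanation, False
```

In this sketch, a request whose embedding is close to a previously explained input returns the stored explanation without calling `explain_fn` again, which is the redundant computation the cache is meant to avoid. A real deployment would likely replace the linear scan with an approximate nearest-neighbor index and verify the reused explanation before returning it.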