Explainability of Complex AI Models with Correlation Impact Ratio

Complex AI systems make better predictions but often lack transparency, which limits trustworthiness, interpretability, and safe deployment. Common post hoc explainers such as LIME, SHAP, HSIC, and SAGE are model-agnostic but have two significant limitations: they tend to misrank correlated features, and they rely on costly perturbations that do not scale to high-dimensional data. We introduce ExCIR (Explainability through Correlation Impact Ratio), a simple, theoretically grounded metric for the contribution of input features to model outputs that remains stable and consistent under noise and sampling variation. We show that ExCIR captures dependencies arising from correlated features through a lightweight single-pass formulation. Experiments on diverse datasets, including EEG, synthetic vehicular data, Digits, and Cats-Dogs, support the effectiveness and stability of ExCIR across domains: it yields more interpretable feature explanations than existing methods while remaining computationally efficient. We further extend ExCIR with an information-theoretic foundation that unifies the correlation ratio with Canonical Correlation Analysis under mutual information bounds, enabling multi-output and class-conditioned explainability at scale.
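The abstract does not state ExCIR's formula, so the following is only an illustrative stand-in, not the authors' metric. It sketches what a perturbation-free, correlation-based feature score can look like: the squared Pearson correlation between each feature and the model's outputs, computed directly from one batch of inputs and predictions with no perturbed re-evaluation of the model. The function name `correlation_scores` and the synthetic data are assumptions made for this example.

```python
import numpy as np

def correlation_scores(X, y_pred):
    """Squared Pearson correlation between each feature column and the model output.

    Illustrative stand-in only: this is NOT the ExCIR definition, which the
    abstract does not give.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y_pred, dtype=float)
    Xc = X - X.mean(axis=0)          # center each feature
    yc = y - y.mean()                # center the model output
    cov = Xc.T @ yc                  # unnormalized covariance per feature
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant features (zero variance) get a score of 0 instead of NaN.
    r = np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)
    return r ** 2

# Synthetic data (hypothetical): feature 1 is strongly correlated with feature 0,
# and feature 2 is independent noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=500)
y_pred = 2.0 * X[:, 0] + 0.1 * rng.normal(size=500)  # stand-in for model outputs

scores = correlation_scores(X, y_pred)
print(scores)
```

In this toy setup the two correlated features both score high and the independent one scores near zero. The score needs only the inputs and the stored predictions, which is the property that keeps such metrics cheap on high-dimensional data.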
