Interpreting Multi-Branch Anti-Spoofing Architectures: Correlating Internal Strategy with Empirical Performance

Multi-branch deep neural networks such as AASIST3 achieve performance comparable to the state of the art in audio anti-spoofing, yet their internal decision dynamics remain opaque to traditional input-level saliency methods. Existing interpretability efforts largely focus on visualizing input artifacts, so the way individual architectural branches cooperate or compete under different spoofing attacks is not well characterized. This paper develops a framework for interpreting AASIST3 at the component level. Intermediate activations from fourteen branches and global attention modules are modeled with covariance operators whose leading eigenvalues form low-dimensional spectral signatures. These signatures are used to train a CatBoost meta-classifier, from which we derive TreeSHAP-based branch attributions; we convert the attributions into normalized contribution shares and confidence scores (Cb) that quantify the model's operational strategy. Analyzing 13 spoofing attacks from the ASVspoof 2019 benchmark, we identify four operational archetypes, ranging from Effective Specialization (e.g., A09, Equal Error Rate (EER) 0.04%, C=1.56) to Ineffective Consensus (e.g., A08, EER 3.14%, C=0.33). Crucially, our analysis exposes a Flawed Specialization mode in which the model places high confidence in an incorrect branch, leading to severe performance degradation for attacks A17 and A18 (EER 14.26% and 28.63%, respectively). These quantitative findings link internal architectural strategy directly to empirical reliability and highlight specific structural dependencies that standard performance metrics overlook.
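
As a rough illustration of two steps named in the abstract, the following sketch computes a spectral signature from one branch's activations and turns per-feature attributions into normalized per-branch contribution shares. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the choice of k = 4 eigenvalues, the use of mean absolute attribution as the aggregation rule, and the contiguous per-branch feature layout are all hypothetical. The CatBoost and TreeSHAP stages are omitted; any array of per-feature attributions can stand in for their output. The confidence score is not sketched because the abstract does not define it.

```python
import numpy as np


def spectral_signature(activations: np.ndarray, k: int = 4) -> np.ndarray:
    """Top-k eigenvalues of the empirical covariance of one branch.

    activations: array of shape (n_frames, n_features), n_frames >= 2.
    k is an illustrative choice, not a value taken from the paper.
    """
    x = np.asarray(activations, dtype=float)
    # Empirical covariance across frames; columns are features.
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
    eigvals = np.linalg.eigvalsh(cov)
    top = np.clip(eigvals[::-1][:k], 0.0, None)  # descending, non-negative
    signature = np.zeros(k)
    signature[: top.size] = top  # zero-pad if fewer than k features
    return signature


def contribution_shares(attributions: np.ndarray, n_branches: int) -> np.ndarray:
    """Normalized per-branch shares from per-feature attributions.

    attributions: array of shape (n_samples, n_branches * k), with the
    features of each branch stored contiguously (an assumed layout).
    Aggregation by mean absolute attribution is an assumption.
    """
    a = np.abs(np.asarray(attributions, dtype=float))
    # Sum within each branch, then average over samples.
    per_branch = a.reshape(a.shape[0], n_branches, -1).sum(axis=2).mean(axis=0)
    total = per_branch.sum()
    if total == 0.0:
        # No attribution mass at all: fall back to uniform shares.
        return np.full(n_branches, 1.0 / n_branches)
    return per_branch / total
```

In this sketch, concatenating the signatures of all branches would give the feature vector for the meta-classifier, and the shares always sum to one, so a share near 1 for a single branch would correspond to a specialization-like pattern and near-uniform shares to a consensus-like one.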
