This paper investigates the fragility of post-hoc explanation methods in audio deepfake detection. Whereas previous work on explanation manipulation focused on images and measured perturbations with standard $L_p$ metrics, we introduce a psychoacoustic framework that optimizes inaudible perturbations to decouple model attributions from final classifications. We evaluate this vulnerability across state-of-the-art architectures under strict prediction-preserving constraints. Measuring the manipulation cost with domain-specific perceptual audio quality metrics alongside explanation alignment criteria, we show that an adversary can systematically distort automated explanation heatmaps while preserving the predicted deepfake label. Full code is available at https://github.com/cncPomper/Audio-XAI
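As a rough illustration of the kind of objective the abstract describes, a prediction-preserving explanation attack can be written as a constrained optimization. This is a sketch under stated assumptions, not the paper's formulation: the detector $f$, the explanation method $g$, the input waveform $x$, the perturbation $\delta$, the target attribution map $m_t$, the explanation distance $\mathcal{D}$, the psychoacoustic cost $\mathcal{P}$, and the budget $\epsilon$ are all hypothetical symbols introduced here.

$$
\begin{aligned}
\min_{\delta}\quad & \mathcal{D}\big(g(f,\, x+\delta),\; m_t\big) \\
\text{s.t.}\quad & \arg\max_{c} f_c(x+\delta) = \arg\max_{c} f_c(x), \\
& \mathcal{P}(x,\, x+\delta) \le \epsilon
\end{aligned}
$$

The first constraint expresses that the predicted deepfake label is unchanged, and the second replaces the usual $\lVert \delta \rVert_p \le \epsilon$ bound from image-domain attacks with a perceptual audibility budget. In practice such constraints are often relaxed into penalty terms added to the loss, though whether the released code does so is not stated here.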