Deepfake speech detectors often output a single score without explaining why an audio sample is flagged, where in the signal the evidence lies, or which cues drive the decision. We propose an audio-native explainability pipeline that applies Integrated Gradients to time-aligned self-supervised representations to localize decision evidence over time. We apply the method to three WavLM-based detectors (AASIST, CA-MHFA, SLS) on ASVspoof 5 and manually annotate the highest-attribution regions to assign semantic meaning to the most important cues. Despite similar performance, the detectors rely on different cues: AASIST emphasizes non-speech/environmental cues, CA-MHFA focuses on localized phoneme artifacts, and SLS relies on word boundaries and spectral integrity. Rather than stopping at speculative interpretation, we validate these findings by causally masking each detector's primary cues; the resulting performance degradation supports the identified detector semantics.
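To make the localization step concrete, the following is a minimal sketch of Integrated Gradients over frame-level features, not the authors' implementation. Everything in it is an assumption for illustration: the toy `score` function (a sigmoid over mean-pooled linear projections) stands in for a real WavLM-based detector, the all-zeros baseline is one common choice among several, and the names `integrated_gradients` and `per_frame_attribution` are hypothetical. Summing attributions across the feature dimension yields one value per time-aligned frame, which is the quantity one would rank to find the highest-attribution regions.

```python
import math

def score(frames, weights):
    """Toy stand-in for a detector: sigmoid of the mean-pooled linear projection."""
    z = sum(sum(w * v for w, v in zip(weights, f)) for f in frames) / len(frames)
    return 1.0 / (1.0 + math.exp(-z))

def grad(frames, weights):
    """Analytic gradient of score() with respect to every frame feature."""
    s = score(frames, weights)
    scale = s * (1.0 - s) / len(frames)
    return [[scale * w for w in weights] for _ in frames]

def integrated_gradients(frames, baseline, weights, steps=64):
    """Midpoint Riemann approximation of IG along the straight path baseline -> frames."""
    T, D = len(frames), len(frames[0])
    total = [[0.0] * D for _ in range(T)]
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Interpolated input at this point on the path.
        point = [[b + alpha * (x - b) for x, b in zip(fx, fb)]
                 for fx, fb in zip(frames, baseline)]
        g = grad(point, weights)
        for t in range(T):
            for d in range(D):
                total[t][d] += g[t][d]
    # Scale the averaged gradient by the input-minus-baseline difference.
    return [[(frames[t][d] - baseline[t][d]) * total[t][d] / steps
             for d in range(D)] for t in range(T)]

def per_frame_attribution(attr):
    """Collapse the feature dimension so each time frame gets one attribution value."""
    return [sum(row) for row in attr]

# Hypothetical data: 4 frames of 3-dimensional features and an all-zeros baseline.
weights = [0.8, -0.5, 1.2]
frames = [[0.1, 0.0, 0.2], [2.0, -1.0, 1.5], [0.0, 0.3, -0.1], [0.2, 0.1, 0.0]]
baseline = [[0.0] * 3 for _ in frames]

attr = integrated_gradients(frames, baseline, weights)
frame_scores = per_frame_attribution(attr)
top_frame = max(range(len(frame_scores)), key=lambda t: frame_scores[t])
```

A useful sanity check for any such implementation is the completeness property: the attributions should sum, up to discretization error, to the difference between the score on the input and the score on the baseline. With a real detector, the analytic `grad` would be replaced by automatic differentiation, and the top-ranked frames would be the candidates for manual annotation or masking.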