We propose a novel approach to spoofed speech characterization through explainable probabilistic attribute embeddings. In contrast to the high-dimensional raw embeddings extracted from a spoofing countermeasure (CM), whose dimensions are not easy to interpret, the probabilistic attributes are designed to gauge the presence or absence of the sub-components that make up a specific spoofing attack. These attributes are then applied to two downstream tasks: spoofing detection and attack attribution. To extend interpretability to the back-end as well, we adopt a decision tree classifier. Our experiments on the ASVspoof2019 dataset, with spoof CM embeddings extracted from three models (AASIST, Rawboost-AASIST, SSL-AASIST), suggest that the performance of the attribute embeddings is on par with that of the original raw spoof CM embeddings for both tasks. The best accuracies achieved with the proposed approach for spoofing detection and attack attribution are 99.7% and 99.2%, respectively, compared to 99.7% and 94.7% using the raw CM embeddings. To analyze the relative contribution of each attribute, we estimate their Shapley values. Attributes related to acoustic feature prediction, waveform generation (vocoder), and speaker modeling are found to be important for spoofing detection, while duration modeling, vocoder, and input type play a role in spoofing attack attribution.
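To make the attribute-contribution analysis concrete, the following is a minimal sketch of exact Shapley value computation over a handful of probabilistic attributes scored by a tiny hand-written decision tree. Everything in it is an illustrative assumption rather than the setup used in the experiments: the attribute names (`acoustic_model`, `vocoder`, `speaker_model`), the thresholds and leaf scores in `toy_tree`, the all-zero baseline, and the input values are hypothetical, and exact enumeration over subsets is only practical for a few attributes.

```python
from itertools import combinations
from math import factorial


def toy_tree(attrs):
    """Hypothetical hand-written decision tree returning a spoof score.

    The thresholds and leaf values are made up for illustration only.
    """
    if attrs["vocoder"] >= 0.5:
        return 0.9 if attrs["acoustic_model"] >= 0.5 else 0.6
    return 0.4 if attrs["speaker_model"] >= 0.5 else 0.1


def shapley_values(model, x, baseline):
    """Exact Shapley values of each attribute for a single input x.

    Attributes outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of attributes.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for i in names:
        others = [f for f in names if f != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                # Input where only the coalition members take their real values.
                without_i = {f: (x[f] if f in subset else baseline[f]) for f in names}
                # Same input with attribute i also switched to its real value.
                with_i = dict(without_i)
                with_i[i] = x[i]
                total += weight * (model(with_i) - model(without_i))
        phi[i] = total
    return phi


# Hypothetical attribute probabilities for one utterance, and a zero baseline.
x = {"acoustic_model": 0.8, "vocoder": 0.8, "speaker_model": 0.8}
baseline = {"acoustic_model": 0.0, "vocoder": 0.0, "speaker_model": 0.0}

phi = shapley_values(toy_tree, x, baseline)
for name, value in phi.items():
    print(f"{name}: {value:.3f}")
```

In this toy setting the `vocoder` attribute receives a contribution of about 0.5 and the other two about 0.15 each, and the contributions sum to the difference between the score at `x` and the score at the baseline (0.9 - 0.1 = 0.8). For a trained decision tree over many attributes, a sampling-based or tree-specific estimator would be a more realistic choice than full enumeration.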