Countermeasures that combine wav2vec 2.0 with the integrated spectro-temporal graph attention network (AASIST) achieve strong performance in speech anti-spoofing. However, current spoofed speech detection systems are trained and evaluated at fixed durations, and their performance degrades significantly on short utterances. To address this problem, we improve AASIST to AASIST2 by replacing its residual blocks with Res2Net blocks. The modified Res2Net blocks extract multi-scale features and improve detection across speech of different durations, which in turn improves short utterance evaluation. In addition, adaptive large margin fine-tuning (ALMFT) has been shown to improve short utterance speaker verification. We therefore apply Dynamic Chunk Size (DCS) and ALMFT training strategies to speech anti-spoofing to further improve short utterance performance. Experiments on different datasets demonstrate that the proposed AASIST2 improves short utterance evaluation performance while maintaining performance on regular evaluation.
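As a rough illustration of the Dynamic Chunk Size idea named in the abstract, the sketch below draws a random chunk duration for each training batch and crops every utterance to that length, so the model sees several durations during training instead of one fixed duration. The function names, the 1 to 4 second range, the 16 kHz sample rate, and the repeat-padding of short utterances are all assumptions made for illustration; the abstract does not specify them, and this is not presented as the authors' implementation.

```python
import random


def sample_chunk_len(rng, sample_rate=16000, min_sec=1.0, max_sec=4.0):
    """Draw a chunk length in samples (assumed range: 1 to 4 seconds)."""
    return int(rng.uniform(min_sec, max_sec) * sample_rate)


def crop_to_chunk(waveform, chunk_len, rng):
    """Return a random crop of exactly chunk_len samples.

    Utterances shorter than chunk_len are tiled (repeated) first,
    which is an assumed padding choice, not one taken from the paper.
    """
    if not waveform:
        raise ValueError("waveform must contain at least one sample")
    if len(waveform) < chunk_len:
        repeats = -(-chunk_len // len(waveform))  # ceiling division
        waveform = waveform * repeats
    start = rng.randint(0, len(waveform) - chunk_len)  # inclusive bounds
    return waveform[start:start + chunk_len]


def make_batch(waveforms, rng, sample_rate=16000):
    """Crop every utterance in a batch to one shared, randomly drawn length."""
    chunk_len = sample_chunk_len(rng, sample_rate)
    return [crop_to_chunk(w, chunk_len, rng) for w in waveforms]
```

Calling `make_batch` once per training step with a shared `random.Random` instance gives batches whose duration varies from step to step while staying uniform within a batch, which keeps the crops stackable into a single tensor.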