This paper describes the UZH-CL system submitted to the SASV track of the WildSpoof 2026 challenge. The challenge targets integrated defense against generative spoofing attacks by requiring simultaneous verification of speaker identity and audio authenticity. We propose a cascaded Spoofing-Aware Speaker Verification (SASV) framework that combines a Wavelet Prompt-Tuned XLSR-AASIST countermeasure (CM) with a multi-model automatic speaker verification (ASV) ensemble. The ASV ensemble comprises ResNet34, ResNet293, and WavLM-ECAPA-TDNN models, whose scores are Z-score normalized and then averaged. Trained on VoxCeleb2 and SpoofCeleb, the system achieves a Macro a-DCF of 0.2017 and a SASV EER of 2.08%. Although spoof detection reaches a 0.16% EER on in-domain data, results on unseen datasets such as ASVspoof 5 show that cross-domain generalization remains a critical challenge.
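The ASV fusion step (per-model Z-score normalization followed by score averaging) and its cascade with the countermeasure can be illustrated with a minimal sketch. This is not the UZH-CL implementation. The following are all assumptions made for illustration: normalization statistics are estimated over the trial list itself, the model keys are hypothetical, the CM threshold is a free parameter, and CM-rejected trials receive a fixed rejection score.

```python
from statistics import mean, pstdev
from typing import Dict, List


def z_normalize(scores: List[float]) -> List[float]:
    """Standardize one model's scores to zero mean and unit variance.

    Assumption: mean and standard deviation are computed over the same
    trial list; the paper does not state which set the statistics use.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0.0:
        # Constant scores carry no ranking information; map them to zero.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_asv_scores(model_scores: Dict[str, List[float]]) -> List[float]:
    """Z-normalize each model's scores, then average them trial by trial."""
    lengths = {len(s) for s in model_scores.values()}
    if len(lengths) != 1:
        raise ValueError("all models must score the same trials")
    normalized = [z_normalize(s) for s in model_scores.values()]
    # zip(*normalized) yields one tuple per trial, holding every model's score.
    return [mean(trial) for trial in zip(*normalized)]


def cascade_sasv(cm_scores: List[float], asv_scores: List[float],
                 cm_threshold: float,
                 reject_score: float = float("-inf")) -> List[float]:
    """Hypothetical cascade: a trial the CM scores below cm_threshold
    (treated as spoofed) gets reject_score; otherwise it keeps its fused
    ASV score."""
    return [asv if cm >= cm_threshold else reject_score
            for cm, asv in zip(cm_scores, asv_scores)]


# Usage with toy scores on very different scales (hypothetical model keys).
asv = fuse_asv_scores({
    "resnet34": [1.0, 2.0, 3.0],
    "resnet293": [10.0, 20.0, 30.0],
    "wavlm_ecapa_tdnn": [0.1, 0.2, 0.3],
})
sasv = cascade_sasv([0.9, 0.1, 0.8], asv, cm_threshold=0.5)
```

Normalizing before averaging matters because the three models can produce scores on different scales. A plain average of raw scores would be dominated by whichever model has the widest score range. In the toy example, all three models agree after normalization, and the second trial is rejected by the CM regardless of its ASV score.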