From Self-Supervised Speech Models to Mixture-of-Experts for Robust Anti-Spoofing

Recent advances in speech generation have significantly improved the naturalness of synthetic speech, making spoofing detection increasingly challenging. A key weakness of current anti-spoofing systems is their limited robustness to unseen synthesis methods. In this work, we transform a self-supervised speech representation model into a Mixture-of-Experts (MoE) architecture to improve generalization. The feed-forward blocks in selected encoder layers are replaced by multiple expert networks controlled by a layer-wise gating mechanism, which allows the experts to capture complementary acoustic patterns while preserving the representations learned during self-supervised pretraining. We further analyze the architectural choices that affect the performance of this MoE conversion and investigate the activation behavior of the experts. The proposed approach is evaluated on 14 spoofing datasets and reduces the macro-averaged equal error rate (EER) from 5.46% to 4.81%, an 11.9% relative improvement over the baseline.
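The conversion described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the experts are initialized or whether routing is dense or sparse. The sketch assumes that every expert starts as a copy of the pretrained feed-forward block and that a per-layer softmax gate mixes all experts densely. The names `FeedForward`, `MoEFeedForward`, and `random_ffn`, and all dimensions, are hypothetical.

```python
import math
import random


def linear(x, weight, bias):
    """Affine map for a single frame: returns weight @ x + bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def gelu(v):
    """Exact GELU activation."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class FeedForward:
    """Transformer feed-forward block: Linear -> GELU -> Linear."""

    def __init__(self, w1, b1, w2, b2):
        self.w1, self.b1, self.w2, self.b2 = w1, b1, w2, b2

    def __call__(self, x):
        hidden = [gelu(h) for h in linear(x, self.w1, self.b1)]
        return linear(hidden, self.w2, self.b2)

    def copy(self):
        return FeedForward([r[:] for r in self.w1], self.b1[:],
                           [r[:] for r in self.w2], self.b2[:])


class MoEFeedForward:
    """Experts mixed by a gate that belongs to this layer only."""

    def __init__(self, pretrained_ffn, num_experts, dim, rng):
        # Assumption: each expert is initialized as a copy of the pretrained block.
        self.experts = [pretrained_ffn.copy() for _ in range(num_experts)]
        # Layer-wise gate: one linear score per expert, small random init.
        self.gate_w = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                       for _ in range(num_experts)]
        self.gate_b = [0.0] * num_experts

    def gate(self, x):
        return softmax(linear(x, self.gate_w, self.gate_b))

    def __call__(self, x):
        weights = self.gate(x)
        outputs = [expert(x) for expert in self.experts]
        # Dense mixture: gate-weighted sum of all expert outputs.
        return [sum(w * out[i] for w, out in zip(weights, outputs))
                for i in range(len(outputs[0]))]


def random_ffn(dim, hidden, rng):
    """Stand-in for a pretrained feed-forward block (random weights)."""
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(dim)]
    return FeedForward(w1, [0.0] * hidden, w2, [0.0] * dim)


rng = random.Random(0)
ffn = random_ffn(dim=4, hidden=8, rng=rng)
moe = MoEFeedForward(ffn, num_experts=4, dim=4, rng=rng)

frame = [0.3, -1.2, 0.8, 0.05]  # one hypothetical feature frame
baseline_out = ffn(frame)
moe_out = moe(frame)
max_diff = max(abs(a - b) for a, b in zip(baseline_out, moe_out))
```

Under these assumptions the gate weights sum to one and all experts are identical at conversion time, so the MoE layer initially reproduces the pretrained block's output up to floating-point error (`max_diff` is close to zero). The experts can then diverge during fine-tuning. This is one plausible way to keep the pretrained representations intact, not necessarily the mechanism used in the paper.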

