GMM-ResNet2: Ensemble of Group ResNet Networks for Synthetic Speech Detection

Deep learning models are widely used for speaker recognition and spoofed speech detection. We propose GMM-ResNet2 for synthetic speech detection. Compared with the previous GMM-ResNet model, GMM-ResNet2 has four improvements. First, GMMs of different orders differ in their ability to form smooth approximations of the feature distribution, so multiple GMMs are used to extract multi-scale Log Gaussian Probability features. Second, a grouping technique is used to improve classification accuracy by exposing the group cardinality while reducing both the number of parameters and the training time. The final score is obtained by averaging the outputs of all group classifiers. Third, the residual block is improved by including one activation function and one batch normalization layer. Finally, an ensemble-aware loss function is proposed to integrate the independent loss functions of all ensemble members. On the ASVspoof 2019 LA task, GMM-ResNet2 achieves a minimum t-DCF of 0.0227 and an EER of 0.79%. On the ASVspoof 2021 LA task, GMM-ResNet2 achieves a minimum t-DCF of 0.2362 and an EER of 2.19%, corresponding to relative reductions of 31.4% and 76.3%, respectively, compared with the LFCC-LCNN baseline.
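
The multi-scale feature extraction and the score averaging described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not give the exact feature formula, so the sketch assumes a Log Gaussian Probability feature of the form log(w_k N(x; mu_k, Sigma_k)) per component with diagonal covariances. The GMM orders (8, 16, 32), the feature dimension, and the helper names `lgp_features`, `random_diag_gmm`, and `average_scores` are hypothetical, and the GMM parameters are random placeholders rather than trained models.

```python
import numpy as np

def lgp_features(frames, weights, means, variances):
    """Per-component log(w_k * N(x_t; mu_k, diag(var_k))) for each frame.

    Assumed feature definition (not confirmed by the abstract).
    frames: (T, D), weights: (K,), means: (K, D), variances: (K, D).
    Returns an array of shape (T, K).
    """
    diff = frames[:, None, :] - means[None, :, :]               # (T, K, D)
    mahalanobis = np.sum(diff ** 2 / variances[None], axis=2)   # (T, K)
    dim = means.shape[1]
    log_norm = -0.5 * (dim * np.log(2 * np.pi)
                       + np.sum(np.log(variances), axis=1))     # (K,)
    return np.log(weights)[None, :] + log_norm[None, :] - 0.5 * mahalanobis

def random_diag_gmm(order, dim, rng):
    """Placeholder diagonal GMM; a real system would train it on speech features."""
    weights = rng.dirichlet(np.ones(order))
    means = rng.normal(size=(order, dim))
    variances = rng.uniform(0.5, 1.5, size=(order, dim))
    return weights, means, variances

def average_scores(group_scores):
    """Final score as the plain mean of the group classifier outputs."""
    return float(np.mean(group_scores))

rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 20))   # 100 frames of 20-dim features (assumed sizes)
orders = (8, 16, 32)                  # hypothetical GMM orders

# One feature map per GMM order: the "multi-scale" view of the same utterance.
multi_scale = [lgp_features(frames, *random_diag_gmm(k, 20, rng)) for k in orders]
shapes = [f.shape for f in multi_scale]
```

Each GMM order yields a feature map whose width equals the number of Gaussian components, so coarser and finer approximations of the feature distribution are seen side by side. In the model described above, each group classifier would produce a score, and `average_scores` stands in for the averaging step that combines them.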

