The aim of this project is to design and implement a robust synthetic speech classifier for the IEEE Signal Processing Cup 2022 challenge. We learn a synthetic speech attribution model using speech generated by various text-to-speech (TTS) algorithms as well as by unknown TTS algorithms. We experiment with both classical machine learning methods, such as support vector machines and Gaussian mixture models, and deep learning based methods, such as ResNet, VGG16, and two shallow end-to-end networks. We observe that the deep learning based methods operating on raw data demonstrate the best performance.
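To make the attribution task concrete, the following is a minimal sketch of a classical likelihood-based attributor, not the method used in this project. It fits a single diagonal Gaussian per TTS algorithm (a one-component stand-in for a Gaussian mixture model) and assigns a sample to the best-scoring algorithm. The rejection threshold `reject_below`, the labels `tts_a` and `tts_b`, and the two-dimensional toy features are all assumptions made for illustration; in particular, handling unknown algorithms by thresholding the log-likelihood is one plausible choice and is not described in the text above.

```python
import math
import random


def fit_diag_gaussian(rows):
    """Fit one diagonal Gaussian to a list of feature vectors.

    Returns (mean, variance), each a list with one entry per dimension.
    """
    n = len(rows)
    dim = len(rows[0])
    mean = [sum(r[d] for r in rows) / n for d in range(dim)]
    # A small floor keeps every variance strictly positive.
    var = [
        max(sum((r[d] - mean[d]) ** 2 for r in rows) / n, 1e-6)
        for d in range(dim)
    ]
    return mean, var


def log_likelihood(x, model):
    """Log-density of feature vector x under a diagonal Gaussian."""
    mean, var = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def attribute(x, models, reject_below):
    """Return the best-scoring label, or "unknown" if no model fits well.

    `reject_below` is a hypothetical log-likelihood threshold that would
    need tuning on held-out data.
    """
    label, score = max(
        ((name, log_likelihood(x, model)) for name, model in models.items()),
        key=lambda pair: pair[1],
    )
    return label if score >= reject_below else "unknown"


# Toy stand-in for per-utterance features; real features are not specified here.
rng = random.Random(0)


def toy_features(center, n=200):
    return [[rng.gauss(c, 0.5) for c in center] for _ in range(n)]


models = {
    "tts_a": fit_diag_gaussian(toy_features([0.0, 0.0])),
    "tts_b": fit_diag_gaussian(toy_features([4.0, 4.0])),
}
```

Calling `attribute(features, models, reject_below=-10.0)` returns one of the known labels when the features lie close to a fitted class and `"unknown"` when every class assigns a log-likelihood below the threshold. A full mixture model would replace each single Gaussian with several weighted components.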