The Effect of Spoken Language on Speech Enhancement using Self-Supervised Speech Representation Loss Functions

Recent work in speech enhancement (SE) has used self-supervised speech representations (SSSRs) as feature transformations in loss functions. However, prior work has paid very little attention to the relationship between the language of the audio used to train the self-supervised representation and the language of the audio used to train the SE system. Enhancement models trained with a loss function incorporating a self-supervised representation whose training language exactly matches that of the noisy data used to train the SE system perform better than those where the languages do not match exactly. This may lead to enhancement systems which are language-specific and as such do not generalise well to unseen languages, unlike models trained using traditional spectrogram or time-domain loss functions. In this work, SE models are trained and tested on a number of different languages, using as loss function representations self-supervised representations that are themselves trained on different language combinations and with differing network structures. These models are then tested on unseen languages and their performance is analysed. It is found that the training language of the self-supervised representation appears to have a minor effect on enhancement performance; the amount of training data of a particular language, however, greatly affects performance.
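To make the idea of a representation used "as a feature transformation in a loss function" concrete, the following is a minimal sketch and not the authors' implementation. It assumes the loss is a mean squared distance between the representations of the enhanced and clean signals. The names `toy_representation` and `representation_loss` are hypothetical, and the toy feature extractor merely stands in for a frozen, pretrained self-supervised model.

```python
from typing import Callable, List, Sequence

Frames = List[List[float]]


def toy_representation(wave: Sequence[float], frame_len: int = 4) -> Frames:
    """Hypothetical stand-in for a frozen self-supervised encoder.

    Maps a waveform to one feature vector per non-overlapping frame:
    [mean, mean absolute value]. A real system would call a pretrained
    model here instead.
    """
    frames: Frames = []
    for start in range(0, len(wave) - frame_len + 1, frame_len):
        chunk = wave[start:start + frame_len]
        mean = sum(chunk) / frame_len
        mean_abs = sum(abs(x) for x in chunk) / frame_len
        frames.append([mean, mean_abs])
    return frames


def representation_loss(
    enhanced: Sequence[float],
    clean: Sequence[float],
    represent: Callable[[Sequence[float]], Frames] = toy_representation,
) -> float:
    """Mean squared distance between the representations of two signals."""
    feats_enh = represent(enhanced)
    feats_clean = represent(clean)
    if len(feats_enh) != len(feats_clean):
        raise ValueError("signals must yield the same number of frames")
    total, count = 0.0, 0
    for frame_enh, frame_clean in zip(feats_enh, feats_clean):
        for a, b in zip(frame_enh, frame_clean):
            total += (a - b) ** 2
            count += 1
    # An empty representation (signal shorter than one frame) gives zero loss.
    return total / count if count else 0.0


if __name__ == "__main__":
    clean = [0.0, 1.0, 0.0, -1.0, 0.5, 0.5, 0.5, 0.5]
    noisy = [x + 0.1 for x in clean]
    print(representation_loss(clean, clean))  # identical signals
    print(representation_loss(noisy, clean))  # degraded signal
```

Under this framing, the question the abstract raises is whether the language of the data used to train the encoder passed as `represent` needs to match the language of the SE training data.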


