Deploying a single, universal neural network for speech enhancement to improve noise robustness across diverse speech processing tasks is challenging, because static speech enhancement frameworks have no awareness of the speech that downstream modules expect. This limitation prevents static approaches from achieving optimal performance across a range of speech processing tasks and calls their universal applicability into question. The fundamental issue in universal speech enhancement is how to effectively inform the speech enhancement module about the features of downstream modules. In this study, we present a novel weight prediction approach that explicitly learns task relationships from downstream training information to address this core challenge. We find that the decision of whether to employ data augmentation techniques is a crucial piece of downstream training information: it significantly affects both the expected speech and the performance of the speech enhancement module. Moreover, we introduce a novel speech enhancement network, Plugin Speech Enhancement (Plugin-SE), a dynamic neural network comprising a speech enhancement module, a gate module, and a weight prediction module. Experimental results demonstrate that the proposed Plugin-SE approach is competitive with or superior to other joint training methods across various downstream tasks.