The past decade has witnessed substantial growth in data-driven speech enhancement (SE) techniques, driven by deep learning. While existing approaches have shown impressive performance on some common datasets, most of them are designed for a single condition (e.g., single-channel, multi-channel, or a fixed sampling frequency) or consider only a single task (e.g., denoising or dereverberation). Currently, there is no universal SE approach that can effectively handle diverse input conditions with a single model. In this paper, we make the first attempt to investigate this line of research. First, we devise a single SE model that is independent of the number of microphone channels, the signal length, and the sampling frequency. Second, we design a universal SE benchmark by combining existing public corpora that cover multiple conditions. Our experiments on a wide range of datasets show that the proposed single model can successfully handle diverse conditions with strong performance.