The past decade has witnessed substantial growth in data-driven speech enhancement (SE) techniques, thanks to deep learning. While existing approaches have shown impressive performance on some common datasets, most of them are designed for only a single input condition (e.g., single-channel, multi-channel, or a fixed sampling frequency) or consider only a single task (e.g., denoising or dereverberation). Currently, there is no universal SE approach that can effectively handle diverse input conditions with a single model. In this paper, we make the first attempt to investigate this line of research. First, we devise a single SE model that is independent of the number of microphone channels, the signal length, and the sampling frequency. Second, we design a universal SE benchmark by combining existing public corpora that cover multiple conditions. Our experiments on a wide range of datasets show that the proposed single model can successfully handle diverse conditions with strong performance.