In this paper, we present a scheme for extending deep neural network-based multiplicative maskers to deep subband filters for speech restoration in the time-frequency domain. The resulting method can be generically applied to any deep neural network providing masks in the time-frequency domain, while requiring only a few more trainable parameters and a computational overhead that is negligible for state-of-the-art neural networks. We demonstrate that the resulting deep subband filtering scheme outperforms multiplicative masking for dereverberation, while leaving the denoising performance virtually the same. We argue that this is because deep subband filtering in the time-frequency domain fits the subband approximation often assumed in the dereverberation literature, whereas multiplicative masking corresponds to the narrowband approximation generally employed for denoising.
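To make the contrast between the two operations concrete, the following is a minimal NumPy sketch and not the implementation used in this work. It assumes a complex STFT of shape (frames, frequencies), a causal filter that spans only the current and past frames, and no complex conjugation of the filter coefficients. The function names `apply_mask` and `apply_subband_filter` and the array layout are hypothetical choices made for illustration.

```python
import numpy as np


def apply_mask(mask, noisy):
    """Multiplicative masking: one complex gain per time-frequency bin."""
    return mask * noisy


def apply_subband_filter(filters, noisy):
    """Subband filtering: a short FIR filter along time within each frequency band.

    filters: complex array of shape (taps, frames, freqs); filters[k, t, f]
             weights the noisy frame t - k in band f (assumed layout).
    noisy:   complex STFT of shape (frames, freqs).
    """
    taps, frames, _ = filters.shape
    out = np.zeros(noisy.shape, dtype=complex)
    for k in range(taps):
        # Delay the spectrogram by k frames, zero-padding the first k frames.
        delayed = np.zeros_like(noisy)
        delayed[k:] = noisy[: max(frames - k, 0)]
        out += filters[k] * delayed
    return out
```

With a single tap, the subband filter in this sketch reduces to the multiplicative mask, which matches the view of masking as the narrowband special case. Each additional tap lets the estimate in a band draw on earlier frames of that band, which is the extra freedom the subband approximation provides for dereverberation.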