In this paper, we present a method that further improves the speech enhancement achieved by recently introduced Deep Neural Network (DNN) models. We propose a multi-channel refinement method for time-frequency masks estimated by single-channel DNNs. It consists of an iterative algorithm based on a Complex Gaussian Mixture Model (CGMM), followed by optimal spatial filtering. We validate our approach on time-frequency masks estimated with three recent deep learning models, namely DCUnet, DCCRN, and FullSubNet. We show that the proposed mask refinement procedure improves the accuracy of the estimated masks, as measured by the Area Under the ROC Curve (AUC). As a consequence, it also improves the overall quality of the enhanced speech signal, as measured by the PESQ improvement, and the gain is consistent across all three DNN models.
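The refinement pipeline can be illustrated with a minimal NumPy sketch. It assumes the common two-class CGMM formulation, in which each multi-channel time-frequency observation y is modelled as a zero-mean complex Gaussian with covariance φ R. Here φ is a time-varying scale and R is a per-frequency spatial covariance, and the class posteriors are initialised from the single-channel DNN mask. The abstract does not specify which "optimal spatial filter" is used, so a mask-based MVDR beamformer in the Souden form is assumed here. The function names `refine_mask_cgmm` and `mvdr_souden`, the iteration count, the reference microphone, the absence of mixture weights, and the constant `EPS` are hypothetical choices for illustration. They do not describe the authors' implementation.

```python
import numpy as np

EPS = 1e-8  # hypothetical small constant guarding logs, divisions and matrix inversion


def refine_mask_cgmm(Y, mask, n_iter=10):
    """Refine a DNN speech mask with a two-class CGMM (class 0 = speech, 1 = noise).

    Sketch under assumptions: no mixture weights, posteriors initialised from the mask.
    Y:    multi-channel STFT, complex array of shape (F, T, M)
    mask: single-channel DNN speech mask in [0, 1], shape (F, T)
    Returns the refined speech posterior, shape (F, T).
    """
    F, T, M = Y.shape
    mask = np.asarray(mask, dtype=float)
    YY = np.einsum("ftm,ftn->ftmn", Y, Y.conj())          # y y^H for every TF bin
    lam = np.stack([mask, 1.0 - mask]).clip(EPS, 1.0)      # class posteriors (2, F, T)
    R = np.tile(np.eye(M, dtype=complex), (2, F, 1, 1))   # spatial covariances (2, F, M, M)
    for _ in range(n_iter):
        # M-step (1): time-varying scale phi = y^H R^-1 y / M
        quad = np.einsum("ftm,kfmn,ftn->kft", Y.conj(), np.linalg.inv(R), Y).real
        phi = np.maximum(quad / M, EPS)
        # M-step (2): R = sum_t (lam / phi) y y^H / sum_t lam, plus diagonal loading
        R = np.einsum("kft,ftmn->kfmn", lam / phi, YY)
        R /= lam.sum(axis=2)[..., None, None] + EPS
        R += EPS * np.eye(M)
        # E-step: log-likelihood of y under N_c(0, phi * R), constant terms dropped
        quad = np.einsum("ftm,kfmn,ftn->kft", Y.conj(), np.linalg.inv(R), Y).real
        logdet = np.linalg.slogdet(R)[1]                   # log|det R|, shape (2, F)
        loglik = -M * np.log(phi) - logdet[..., None] - quad / phi
        lam = np.exp(loglik - loglik.max(axis=0, keepdims=True))  # stable softmax
        lam /= lam.sum(axis=0, keepdims=True)
    return lam[0]


def mvdr_souden(Y, speech_post, ref_mic=0):
    """Mask-based MVDR in the Souden form (assumed stand-in for the optimal filter).

    Returns the enhanced single-channel STFT s_hat = w^H y, shape (F, T).
    """
    M = Y.shape[2]
    YY = np.einsum("ftm,ftn->ftmn", Y, Y.conj())
    noise_post = 1.0 - speech_post
    # Posterior-weighted spatial covariance matrices of speech and noise, (F, M, M)
    Phi_s = np.einsum("ft,ftmn->fmn", speech_post, YY)
    Phi_s /= speech_post.sum(axis=1)[:, None, None] + EPS
    Phi_n = np.einsum("ft,ftmn->fmn", noise_post, YY)
    Phi_n /= noise_post.sum(axis=1)[:, None, None] + EPS
    Phi_n += EPS * np.eye(M)
    # w = Phi_n^-1 Phi_s u_ref / tr(Phi_n^-1 Phi_s)
    num = np.linalg.solve(Phi_n, Phi_s)
    trace = np.trace(num, axis1=1, axis2=2).real
    w = num[:, :, ref_mic] / (trace[:, None] + EPS)        # (F, M)
    return np.einsum("fm,ftm->ft", w.conj(), Y)
```

Typical use would be `post = refine_mask_cgmm(Y, dnn_mask)` followed by `S_hat = mvdr_souden(Y, post)`. Here `Y` is the multi-channel STFT and `dnn_mask` is a mask estimated on one channel, for example by DCUnet, DCCRN, or FullSubNet. Initialising the posteriors from the DNN mask, rather than randomly, also fixes which CGMM component plays the role of speech, so no permutation step is needed in this sketch.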