Spatial filters can exploit deep-learning-based speech enhancement models to increase their reliability in scenarios with multiple speech sources. To further improve speech quality, it is common to apply postfiltering to the target speech estimate obtained by spatial filtering. In this work, a Minimum Variance Distortionless Response (MVDR) beamformer is employed to provide an estimate of the interference alongside the estimate of the target speech, and both are then passed to the postfilter. This improves enhancement performance over a single-input baseline far more than increasing the model's complexity does. Results suggest that postfiltering requires fewer computing resources when it is provided with both the target and interference signals, which is a step toward an online speech enhancement system for multi-speech scenarios.
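As an illustration of the idea, the following is a minimal NumPy sketch of an MVDR beamformer producing both a target estimate and an interference estimate for a single frequency bin. It is not the authors' implementation. The array geometry, the steering vectors, the oracle covariance matrices, and the choice to obtain the interference estimate with a second MVDR beamformer whose roles are swapped are all assumptions made for this example; in practice the covariances would typically be estimated, for instance from masks predicted by a speech enhancement model.

```python
import numpy as np

def mvdr_weights(undesired_cov, steering):
    """MVDR weights w = R^{-1} d / (d^H R^{-1} d) for one frequency bin."""
    rinv_d = np.linalg.solve(undesired_cov, steering)
    return rinv_d / (steering.conj() @ rinv_d)

rng = np.random.default_rng(0)
n_mics, n_frames = 4, 200
mic_idx = np.arange(n_mics)

# Hypothetical steering vectors (half-wavelength linear array, assumption):
# target at broadside (0 rad), interferer at 45 degrees.
d_target = np.exp(-1j * np.pi * mic_idx * np.sin(0.0))
d_interf = np.exp(-1j * np.pi * mic_idx * np.sin(np.pi / 4))

def complex_noise(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

s = complex_noise(n_frames)                 # target source (variance 2)
v = complex_noise(n_frames)                 # interfering source (variance 2)
n = 0.1 * complex_noise(n_mics, n_frames)   # sensor noise (variance 0.02)
x = np.outer(d_target, s) + np.outer(d_interf, v) + n  # mixture, (mics, frames)

# Oracle covariances of the "undesired" part for each beamformer (assumption).
noise_cov = 0.02 * np.eye(n_mics)
cov_without_target = 2.0 * np.outer(d_interf, d_interf.conj()) + noise_cov
cov_without_interf = 2.0 * np.outer(d_target, d_target.conj()) + noise_cov

w_target = mvdr_weights(cov_without_target, d_target)
w_interf = mvdr_weights(cov_without_interf, d_interf)

target_est = w_target.conj() @ x   # y = w^H x, shape (frames,)
interf_est = w_interf.conj() @ x   # second output intended for the postfilter
```

Both `target_est` and `interf_est` could then be fed to a postfilter as a two-input signal pair; the postfilter itself is outside the scope of this sketch.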