On The Compensation Between Magnitude and Phase in Speech Separation

Deep neural network (DNN) based end-to-end optimization in the complex time-frequency (T-F) domain or the time domain has shown considerable potential in monaural speech separation. Many recent studies optimize loss functions defined solely in the time or complex domain, without including a loss on magnitude. Although such loss functions typically produce better scores on objective time-domain metrics, they produce worse scores on speech quality and intelligibility metrics and usually lead to worse speech recognition performance than loss functions that include a loss on magnitude. While this phenomenon has been observed experimentally in many studies, it is often not accurately explained, and a thorough understanding of its fundamental cause is lacking. This paper provides a novel view from the perspective of the implicit compensation between estimated magnitude and phase. Analytical results based on monaural speech separation and robust automatic speech recognition (ASR) tasks in noisy-reverberant conditions support the validity of our view.
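As a rough illustration of what a complex-domain loss with and without a magnitude term might look like, and of one way magnitude can compensate for a phase error, consider the following minimal NumPy sketch. The abstract does not give the loss definitions, so the L1 forms, the squared-error criterion used for the compensation check, and all function names here are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def complex_l1_loss(est, ref):
    """Assumed form: L1 distance on the real and imaginary parts of T-F coefficients."""
    diff = est - ref
    return np.mean(np.abs(diff.real) + np.abs(diff.imag))

def magnitude_l1_loss(est, ref):
    """Assumed form: L1 distance between estimated and reference magnitudes."""
    return np.mean(np.abs(np.abs(est) - np.abs(ref)))

def best_magnitude_for_fixed_phase(ref_mag, phase_error):
    """Magnitude a >= 0 minimizing |a*exp(j*(theta + phase_error)) - ref_mag*exp(j*theta)|^2.

    Setting the derivative with respect to a to zero gives a = ref_mag * cos(phase_error),
    clipped at zero when the phase error exceeds 90 degrees.
    """
    return np.maximum(ref_mag * np.cos(phase_error), 0.0)

# One T-F unit with unit magnitude and an estimate whose phase is off by 60 degrees.
ref = 1.0 * np.exp(1j * 0.3)
phase_error = np.pi / 3
est_phase = np.angle(ref) + phase_error

# Closed-form minimizer: cos(pi/3) = 0.5, so a_star is about half the true magnitude.
a_star = best_magnitude_for_fixed_phase(np.abs(ref), phase_error)

# Numerical check of the closed form by a grid search over candidate magnitudes.
grid = np.linspace(0.0, 2.0, 2001)
errors = np.abs(grid * np.exp(1j * est_phase) - ref) ** 2
a_grid = grid[np.argmin(errors)]

# Two candidate estimates that share the same (wrong) phase.
est_compensated = a_star * np.exp(1j * est_phase)    # shrunken magnitude
est_true_mag = np.abs(ref) * np.exp(1j * est_phase)  # correct magnitude

sq_err_compensated = np.abs(est_compensated - ref) ** 2
sq_err_true_mag = np.abs(est_true_mag - ref) ** 2
```

In this toy case the estimate with the shrunken magnitude has the lower complex-domain squared error even though its magnitude is further from the reference, which is the kind of trade-off a separate magnitude term such as `magnitude_l1_loss` would penalize. How the terms are actually weighted and combined in the paper is not stated in the abstract.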

