The advent of deep neural networks (DNNs) has significantly improved the performance of monaural speech enhancement (SE). Most DNN-based methods attempt to capture the structural features of speech implicitly through distribution approximation. However, existing methods remain susceptible to speech degradation and residual noise. This letter uses the Information Bottleneck principle as an anchor to rethink SE. By defining the incremental convergence of mutual information between speech characteristics, we show that the acoustic characteristic of speech is crucial in alleviating these issues, because introducing it explicitly helps the optimization further approach its optimal information-theoretic upper bound. Drawing on the chain rule of entropy, we also propose a framework that reconstructs the information composition of the optimization objective, aiming to integrate and refine this underlying characteristic without loss of generality. Visualizations are consistent with the information-theoretic analysis. Experimental results show that, with only 1.18 M additional parameters, the refined CRN achieves substantial improvements over a number of advanced methods. The source code is available at https://github.com/caoruitju/RUI_SE.
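For orientation, the standard information-theoretic identities that this line of argument builds on can be sketched as follows. The notation is an assumption for illustration and is not taken from the letter: X is the noisy input, Y the clean speech target, Z the learned representation, A a hypothetical variable standing for the acoustic characteristic, and \beta a trade-off weight. The last line is one natural reading of the "upper bound" mentioned above, not necessarily the bound the letter derives.

```latex
% Requires amsmath. Symbols X, Y, Z, A, \beta are illustrative assumptions.
\begin{align}
  % Information Bottleneck objective: compress X into Z while keeping information about Y
  \min_{p(z \mid x)} \; & I(X;Z) - \beta\, I(Z;Y) \\
  % Chain rule of entropy
  H(Y, A) &= H(A) + H(Y \mid A) \\
  % Chain rule of mutual information: information about A plus the increment about Y given A
  I(Z; Y, A) &= I(Z; A) + I(Z; Y \mid A) \\
  % Data processing inequality under the Markov chain Y \to X \to Z
  I(Z;Y) &\le I(X;Y)
\end{align}
```

Read this way, explicitly supplying A splits the information the representation must carry into a part about A and an incremental part about Y given A, which is the kind of decomposition the chain rule makes available.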