Attention mechanisms have been widely used in speech enhancement (SE) because, in theory, they can effectively model the inherent connections within a signal in both the time domain and the frequency domain. Usually, the attention span is limited in the time domain, whereas attention in the frequency domain covers the whole frequency range. In this paper, we observe that attention over the whole frequency range hampers inference in full-band SE and can lead to excessive residual noise. To alleviate this problem, we introduce local spectral attention (LSA) into a full-band SE model by limiting the span of attention along the frequency axis. An ablation test on a state-of-the-art (SOTA) full-band SE model shows that LSA effectively improves overall performance. The improved model achieves the best objective score on the full-band VoiceBank+DEMAND set.
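To make the idea of limiting the attention span concrete, the following is a minimal NumPy sketch of single-head attention along the frequency axis in which each frequency bin attends only to bins within a fixed distance. It is an illustration under stated assumptions, not the model evaluated in the paper: the function names `local_band_mask` and `local_spectral_attention`, the `span` parameter, the hard band mask, and the single-head, single-frame setup are all hypothetical choices made for this example.

```python
import numpy as np


def local_band_mask(num_bins, span):
    """Boolean (num_bins, num_bins) mask: True where |i - j| <= span.

    `span` is a hypothetical parameter for this sketch: the number of
    neighbouring frequency bins visible on each side of a query bin.
    """
    idx = np.arange(num_bins)
    return np.abs(idx[:, None] - idx[None, :]) <= span


def local_spectral_attention(q, k, v, span):
    """Scaled dot-product attention over frequency bins with a band mask.

    q, k, v: arrays of shape (num_bins, dim) for one time frame.
    Returns the attended output (num_bins, dim) and the attention
    weights (num_bins, num_bins).
    """
    num_bins, dim = q.shape
    scores = q @ k.T / np.sqrt(dim)

    # Assumption: a hard mask. Bins outside the band get -inf so that
    # their softmax weight is exactly zero.
    mask = local_band_mask(num_bins, span)
    scores = np.where(mask, scores, -np.inf)

    # Numerically stable softmax over the key axis. The diagonal is
    # always unmasked, so every row has at least one finite score.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
    out, w = local_spectral_attention(q, k, v, span=2)
    print(out.shape, w.shape)
```

Setting `span` to `num_bins - 1` or larger leaves every bin visible, which recovers ordinary attention over the whole frequency range; smaller values restrict each bin to its spectral neighbourhood.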