Recent speech enhancement methods based on convolutional neural networks (CNNs) and transformers have been shown to effectively capture time-frequency (T-F) information from spectrograms. However, the correlations among the channels of speech features remain unexplored. In principle, each channel map of the speech features is produced by a different convolution kernel and therefore carries information at a different scale, and these channel maps are strongly correlated. To fill this gap, we propose a novel dual-branch architecture named channel-aware dual-branch conformer (CADB-Conformer), which explores the long-range time and frequency correlations among channels in its two branches, respectively, to extract channel-relation-aware T-F information. Ablation studies on the DNS-Challenge 2020 dataset demonstrate the importance of leveraging channel features and the value of channel-relation-aware T-F information for speech enhancement. Extensive experiments also show that the proposed model outperforms recent methods at an attractive computational cost.
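To make "time and frequency correlations among channels" concrete, the following is a minimal NumPy sketch of channel-wise self-attention applied along each axis of a (C, T, F) feature map. It is an illustrative assumption, not the CADB-Conformer implementation: the function names `channel_attention` and `dual_branch`, the scaled dot-product affinity, and the averaging fusion are all hypothetical choices made here for clarity. In the time branch, each channel's length-T sequence (per frequency bin) acts as its token, so the C-by-C attention weights reflect long-range temporal similarity between channels; the frequency branch does the same with each channel's length-F spectrum per frame.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(x, axis):
    """Hypothetical channel self-attention on a (C, T, F) feature map.

    axis="time": each channel's token is its length-T time sequence,
                 computed independently for every frequency bin.
    axis="freq": each channel's token is its length-F spectrum,
                 computed independently for every time frame.
    Every output channel is a weighted mix of all input channels.
    """
    if axis == "time":
        seq = np.transpose(x, (2, 0, 1))   # (F, C, T): batch over frequency bins
    elif axis == "freq":
        seq = np.transpose(x, (1, 0, 2))   # (T, C, F): batch over time frames
    else:
        raise ValueError("axis must be 'time' or 'freq'")
    d = seq.shape[-1]
    # (batch, C, C) channel-to-channel affinities via scaled dot product
    scores = seq @ np.swapaxes(seq, -1, -2) / np.sqrt(d)
    weights = softmax(scores, axis=-1)     # each row sums to 1
    out = weights @ seq                    # (batch, C, d)
    if axis == "time":
        return np.transpose(out, (1, 2, 0))  # (F, C, T) -> (C, T, F)
    return np.transpose(out, (1, 0, 2))      # (T, C, F) -> (C, T, F)

def dual_branch(x):
    # Hypothetical fusion: average the time-branch and frequency-branch outputs
    return 0.5 * (channel_attention(x, "time") + channel_attention(x, "freq"))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((4, 10, 8))  # C=4 channels, T=10 frames, F=8 bins
    print(dual_branch(feats).shape)          # (4, 10, 8)
```

Because the attention matrix is C by C rather than T by T or F by F, its cost grows with the number of channels, and each branch still sees the full time or frequency extent of every channel, which is one simple way to obtain long-range, channel-relation-aware features. A real model would add learned projections, normalization, and residual connections.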