Multi-channel speech enhancement uses spatial information to separate target speech from interfering signals. Deep learning approaches such as the dual-path convolutional recurrent network (DPCRN) have made progress, but they still struggle to model inter-channel correlations effectively and to fuse information from multiple levels. To address this, we introduce the Parallel Dual-Path Convolutional Recurrent Network (PDPCRN), an acoustic modeling architecture with two key innovations. First, a parallel design with separate branches extracts complementary features. Second, bi-directional modules enable communication between the branches. Together, these allow diverse representations to be fused and strengthen the model's capacity. Experiments on TIMIT datasets confirm the effectiveness of PDPCRN. Compared with baseline models such as the standard DPCRN, PDPCRN achieves higher PESQ and STOI scores while having a smaller computational footprint and fewer parameters.
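The two ideas named above, parallel branches and bi-directional exchange between them, can be illustrated with a toy sketch. This is not the PDPCRN implementation: the abstract does not specify layer types, the exchange rule, or the fusion step, so `scale`, `exchange`, `parallel_forward`, the additive exchange with a fixed `weight`, and the final sum are all hypothetical stand-ins chosen only to show the data flow.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Block = Callable[[Vector], Vector]


def scale(factor: float) -> Block:
    """Hypothetical stand-in for a learned block; a real branch would
    use convolutional and recurrent layers rather than a fixed scaling."""
    return lambda x: [factor * v for v in x]


def exchange(a: Vector, b: Vector, weight: float = 0.5) -> Tuple[Vector, Vector]:
    """Assumed bi-directional exchange: each branch adds a weighted copy
    of the other branch's features, both computed from pre-exchange values."""
    new_a = [ai + weight * bi for ai, bi in zip(a, b)]
    new_b = [bi + weight * ai for ai, bi in zip(a, b)]
    return new_a, new_b


def parallel_forward(x: Vector,
                     blocks_a: Sequence[Block],
                     blocks_b: Sequence[Block]) -> Vector:
    """Run two branches side by side, exchanging features after each stage."""
    a, b = list(x), list(x)
    for block_a, block_b in zip(blocks_a, blocks_b):
        a, b = block_a(a), block_b(b)   # branches process independently
        a, b = exchange(a, b)           # then share information both ways
    # Assumed fusion: element-wise sum of the two branch outputs.
    return [ai + bi for ai, bi in zip(a, b)]


features = parallel_forward([1.0, 2.0], [scale(2.0)], [scale(4.0)])
print(features)
```

Because each branch sees the other's intermediate features at every stage, neither has to learn the complementary representation on its own, which is the intuition behind the cross-branch modules described in the abstract.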