While deep learning has advanced speech enhancement (SE), effective phase modeling remains challenging: conventional networks typically operate in a flat Euclidean feature space, which makes it difficult to model the underlying circular topology of phase. To address this, we propose a manifold-aware magnitude-phase dual-stream framework that aligns the phase stream with its intrinsic circular geometry by enforcing Global Rotation Equivariance (GRE). Specifically, we introduce a Magnitude-Phase Interactive Convolutional Module (MPICM) for modulus-based information exchange and a Hybrid-Attention Dual-FFN (HADF) bottleneck for unified feature fusion, both designed to preserve GRE in the phase stream. We evaluate the method on phase retrieval, denoising, dereverberation, and bandwidth extension against multiple advanced baselines. Notably, the proposed architecture reduces Phase Distance by over 20% in phase retrieval and improves PESQ by more than 0.1 in zero-shot cross-corpus denoising. It also outperforms the baselines on universal SE tasks involving mixed distortions. Qualitative analysis further reveals that the learned phase features exhibit distinct periodic patterns, consistent with the intrinsic circular nature of phase. The source code is available at https://github.com/wangchengzhong/RENet.
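As a rough illustration of what Global Rotation Equivariance means, the toy check below rotates complex-valued features by a global phase factor and compares the output of a layer on the rotated input with the rotated output of the same layer. It is a minimal sketch and not the MPICM or HADF from the paper: the complex-valued representation of the phase stream, the function names `modulus_gated_layer` and `biased_layer`, and the tanh gate are all assumptions made for illustration. The first layer exchanges information only through the modulus, so it commutes with the rotation; the second adds a bias, which breaks the property.

```python
import numpy as np

rng = np.random.default_rng(0)

def modulus_gated_layer(z, w):
    """Hypothetical phase-stream layer (illustrative, not the paper's module)."""
    h = z @ w                     # complex linear map without bias
    gate = np.tanh(np.abs(h))     # gate depends on the modulus only
    return gate * h               # real-valued gate leaves the phase of h unchanged

def biased_layer(z, w, b):
    """Counter-example: an additive bias does not rotate with the input."""
    return z @ w + b

# Random complex features (4 frames, 8 channels), weights, and bias.
z = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
w = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
b = rng.standard_normal(8) + 1j * rng.standard_normal(8)

# Global rotation: every element is multiplied by the same unit phasor.
theta = 0.7
rot = np.exp(1j * theta)

# Equivariance error: max |f(rot * z) - rot * f(z)|.
gre_error = np.max(np.abs(modulus_gated_layer(rot * z, w)
                          - rot * modulus_gated_layer(z, w)))
bias_error = np.max(np.abs(biased_layer(rot * z, w, b)
                           - rot * biased_layer(z, w, b)))

print(f"modulus-gated layer error: {gre_error:.2e}")
print(f"biased layer error:        {bias_error:.2e}")
```

The modulus-gated layer's error stays at floating-point round-off level, while the biased layer's error is of the same order as the bias itself, which is the kind of mismatch with circular geometry that the abstract attributes to conventional Euclidean feature spaces.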