Binaural speech enhancement faces a severe trade-off: state-of-the-art performance is achieved by computationally intensive architectures, whereas lightweight solutions often suffer significant performance degradation. To bridge this gap, we propose the Global Adaptive Fourier Network (GAF-Net), a lightweight deep complex network that balances performance and computational efficiency. GAF-Net consists of three components. First, a dual-feature encoder combining short-time Fourier transform (STFT) and gammatone features improves the robustness of the acoustic representation. Second, a channel-independent globally adaptive Fourier modulator efficiently captures long-term temporal dependencies while preserving spatial cues. Finally, a dynamic gating mechanism reduces processing artifacts. Experimental results show that GAF-Net achieves competitive performance, particularly in preserving binaural cues (interaural level difference (ILD) and interaural phase difference (IPD) errors) and in objective intelligibility (MBSTOI), with fewer parameters and lower computational cost. These results indicate that GAF-Net offers a feasible route to high-fidelity binaural processing on resource-constrained devices.
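For readers unfamiliar with the binaural cue metrics named above, the following is a minimal sketch of how ILD and IPD errors are commonly computed from left and right STFT bins. It is an illustration under stated assumptions and not the evaluation code used for GAF-Net: the function names (`ild_db`, `ipd_rad`, `mean_ild_error`, `mean_ipd_error`) are hypothetical, inputs are assumed to be flat sequences of complex STFT bins, and the errors are averaged over all bins without the speech-activity masking that published evaluations often apply.

```python
import cmath
import math

EPS = 1e-12  # assumption: small constant guarding against log(0) and division by zero


def ild_db(left, right):
    """Per-bin interaural level difference in dB from complex STFT bins."""
    return [20.0 * math.log10((abs(l) + EPS) / (abs(r) + EPS))
            for l, r in zip(left, right)]


def ipd_rad(left, right):
    """Per-bin interaural phase difference in radians, in [-pi, pi]."""
    return [cmath.phase(l * r.conjugate()) for l, r in zip(left, right)]


def mean_ild_error(ref_l, ref_r, est_l, est_r):
    """Mean absolute ILD difference (dB) between reference and estimate."""
    ref = ild_db(ref_l, ref_r)
    est = ild_db(est_l, est_r)
    return sum(abs(a - b) for a, b in zip(ref, est)) / len(ref)


def mean_ipd_error(ref_l, ref_r, est_l, est_r):
    """Mean absolute IPD difference (rad), with the difference phase-wrapped."""
    ref = ipd_rad(ref_l, ref_r)
    est = ipd_rad(est_l, est_r)
    # Wrap each difference back into [-pi, pi] before taking its magnitude.
    wrapped = [abs(cmath.phase(cmath.exp(1j * (a - b)))) for a, b in zip(ref, est)]
    return sum(wrapped) / len(ref)
```

Lower values of both errors indicate that the enhanced signal retains the level and phase relationships between the two ears, which is the sense in which spatial cues are said to be preserved.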