Binaural speech enhancement (BSE) aims to jointly improve the speech quality and intelligibility of noisy signals received by hearing devices while preserving the spatial cues of the target for natural listening. Existing methods often suffer from a trade-off between noise reduction (NR) capacity and spatial cue preservation (SCP) accuracy, as well as a high computational demand in complex acoustic scenes. In this work, we present a learning-based lightweight binaural complex convolutional network (LBCCN), which excels in NR by filtering the low-frequency bands and keeping the remaining bands. Additionally, our approach explicitly incorporates the estimation of the interchannel relative acoustic transfer function to ensure spatial cue fidelity and speech clarity. Results show that the proposed LBCCN achieves NR performance comparable to state-of-the-art methods under various noise conditions, but with a much lower computational cost and better SCP. The reproducible code and audio examples are available at https://github.com/jywanng/LBCCN.
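To make the two ideas in the abstract concrete, the following is a minimal sketch in plain Python, not the LBCCN implementation. It assumes STFT-domain frames stored as lists of complex bins, reads "keeping the rest" as passing the higher bins through unchanged, and replaces the network's outputs with hand-picked values. The function names (`filter_low_bands`, `estimate_rtf`, `apply_rtf`), the gain values, and the least-squares RTF estimator are all illustrative assumptions rather than details taken from the paper.

```python
def filter_low_bands(frame, low_band_gains):
    """Scale the lowest bins by the given gains; pass higher bins through unchanged.

    `low_band_gains` is a hypothetical stand-in for a learned filter.
    """
    cutoff = len(low_band_gains)
    return [g * x for g, x in zip(low_band_gains, frame)] + list(frame[cutoff:])


def estimate_rtf(ref_frames, other_frames, eps=1e-12):
    """Per-bin least-squares estimate of the transfer function ref -> other.

    This closed-form estimator is an assumption for illustration only.
    """
    n_bins = len(ref_frames[0])
    rtf = []
    for k in range(n_bins):
        cross = sum(o[k] * r[k].conjugate() for r, o in zip(ref_frames, other_frames))
        power = sum(abs(r[k]) ** 2 for r in ref_frames)
        rtf.append(cross / (power + eps))
    return rtf


def apply_rtf(ref_frame, rtf):
    """Map a reference-channel frame to the other channel, bin by bin."""
    return [h * x for h, x in zip(rtf, ref_frame)]


# Toy target signal: two frames of four bins on the left (reference) channel.
true_rtf = [0.8 + 0.1j, 0.5 - 0.2j, 1.0 + 0.0j, 0.3 + 0.4j]
left = [[1 + 1j, 2 - 1j, 0.5j, -1 + 0j], [0.5 - 2j, 1 + 0j, 1 + 1j, 2j]]
right = [apply_rtf(frame, true_rtf) for frame in left]

# Recover the interchannel RTF from the two channels.
est_rtf = estimate_rtf(left, right)

# Filter only the two lowest bins of the reference channel, then rebuild
# the right channel through the estimated RTF so both ears share one filter.
enhanced_left = [filter_low_bands(frame, [0.5, 0.5]) for frame in left]
enhanced_right = [apply_rtf(frame, est_rtf) for frame in enhanced_left]
```

Because the right channel is derived from the filtered left channel through the RTF, the interchannel level and phase relationship of the target is retained by construction in this toy setting, which is the intuition behind coupling NR with explicit RTF estimation.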