Direction-of-arrival (DOA) estimation is crucial for extracting a target speaker's voice in binaural hearing aids operating in noisy, multi-speaker environments. Among the solutions developed for this task, a deep-learning convolutional recurrent neural network (CRNN) model that leverages spectral phase differences and magnitude ratios between microphone signals is a popular option. In this paper, we explore adding source-count information to multi-source DOA estimation. We first consider dual-task training with joint multi-source DOA estimation and source counting. We then consider using the source count as an auxiliary feature in a standalone DOA estimation system, where the number of active sources (0, 1, or 2+) is integrated into the CRNN architecture through early, mid, and late fusion strategies. Experiments are performed on real binaural recordings. Results show that dual-task training does not improve DOA estimation performance, although it benefits source-count prediction. However, using a ground-truth (oracle) source count as an auxiliary feature significantly enhances standalone DOA estimation performance, with late fusion yielding up to 14% higher average F1-scores than the baseline CRNN. This highlights the potential of source-count estimation for robust DOA estimation in binaural hearing aids.
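To make the difference between the fusion points concrete, the following is a minimal sketch of how a three-class source count (0, 1, or 2+) could be attached either to the input features (early fusion) or to the network's final embedding (late fusion). The one-hot encoding, the use of plain concatenation, and the function names `count_to_one_hot`, `early_fusion`, and `late_fusion` are assumptions made for illustration; the abstract does not specify how the count is encoded or merged. Mid fusion would apply the same idea at an intermediate layer.

```python
from typing import List

NUM_COUNT_CLASSES = 3  # classes named in the abstract: 0, 1, or 2+ active sources


def count_to_one_hot(num_sources: int) -> List[float]:
    """Map a raw source count to a 3-way one-hot vector (assumed encoding)."""
    if num_sources < 0:
        raise ValueError("source count must be non-negative")
    # Counts of 2, 3, 4, ... all share the last class, "2+".
    cls = min(num_sources, NUM_COUNT_CLASSES - 1)
    return [1.0 if i == cls else 0.0 for i in range(NUM_COUNT_CLASSES)]


def early_fusion(frames: List[List[float]], num_sources: int) -> List[List[float]]:
    """Append the count vector to every input feature frame, before the network."""
    one_hot = count_to_one_hot(num_sources)
    return [frame + one_hot for frame in frames]


def late_fusion(embedding: List[float], num_sources: int) -> List[float]:
    """Append the count vector to the final embedding, before the output layer."""
    return embedding + count_to_one_hot(num_sources)


if __name__ == "__main__":
    # Hypothetical toy values standing in for phase-difference / magnitude-ratio features.
    frames = [[0.1, 0.2], [0.3, 0.4]]
    print(early_fusion(frames, 1))
    print(late_fusion([0.5, -0.5], 4))
```

In a trained system the fused vector would feed the subsequent layers, and the oracle count used here would be replaced by a predicted one.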