We present BiEAR, a human auditory-inspired adaptive binaural front-end for multi-speaker localisation and distance estimation. Inspired by medial olivocochlear (MOC) feedback in human hearing, BiEAR uses a neural controller to adapt the frequency selectivity of a binaural auditory filterbank at inference time. This yields time-frequency adaptive binaural representations that let the model respond to changing acoustic conditions. We evaluate BiEAR on multi-speaker localisation and distance estimation in anechoic and real-room environments. Compared with commonly used fixed binaural front-ends, the adaptive front-end improves localisation accuracy and robustness to unseen speakers and rooms. Visualisation and analysis of the learned filter adaptations show that BiEAR emphasises informative frequency bands over time. These findings suggest that adaptive, biologically inspired binaural front-ends can improve the robustness of machine hearing in complex acoustic scenes.
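To make the idea of a controller adjusting filterbank selectivity during inference concrete, the following is a minimal NumPy sketch under stated assumptions. It is not BiEAR's implementation. It assumes 4th-order gammatone filters with ERB-based bandwidths (Glasberg and Moore), frame-wise bandwidth scaling shared by both ears, and overlap-add of per-frame filter outputs. The learned neural controller is replaced by a hypothetical hand-written rule (`toy_controller`) that sharpens tuning in high-energy bands. All function names and parameters (`IR_DUR`, `frame_len`, the (0.5, 1.5) scale range) are illustrative.

```python
import numpy as np

IR_DUR = 0.025  # gammatone impulse-response length in seconds (assumed)


def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def gammatone_ir(fc, fs, bw_scale=1.0, order=4):
    """Unit-energy gammatone impulse response.

    bw_scale < 1 narrows the filter (sharper frequency selectivity);
    bw_scale > 1 broadens it.
    """
    t = np.arange(int(IR_DUR * fs)) / fs
    b = 1.019 * erb(fc) * bw_scale
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))


def toy_controller(frame_lr, centre_freqs, fs):
    """HYPOTHETICAL stand-in for BiEAR's learned neural controller.

    Measures per-band energy (summed over both ears) with fixed filters and
    returns one bandwidth scale per band in (0.5, 1.5): bands with more
    energy than average get narrower filters.
    """
    energies = np.array([
        sum(np.sum(np.convolve(ear_sig, gammatone_ir(fc, fs)) ** 2) for ear_sig in frame_lr)
        for fc in centre_freqs
    ])
    log_e = np.log(energies + 1e-12)
    z = (log_e - log_e.mean()) / (log_e.std() + 1e-8)
    return 1.5 - 1.0 / (1.0 + np.exp(-z))


def adaptive_binaural_filterbank(x_lr, fs, centre_freqs, frame_len, controller):
    """Frame-wise adaptive filterbank for a (2, T) binaural signal.

    Each input frame is filtered with that frame's filters and the full
    convolution tails are overlap-added, so the filterbank varies over time.
    The same scales are applied to both ears (an assumption) so that
    interaural cues are computed through matched filters.
    Returns outputs of shape (2, K, T) and scales of shape (n_frames, K).
    """
    n_ears, T = x_lr.shape
    K = len(centre_freqs)
    ir_len = int(IR_DUR * fs)
    y = np.zeros((n_ears, K, T + ir_len - 1))
    scales_per_frame = []
    for start in range(0, T, frame_len):
        frame = x_lr[:, start:start + frame_len]
        scales = controller(frame, centre_freqs, fs)  # one scale per band
        scales_per_frame.append(scales)
        for k, fc in enumerate(centre_freqs):
            h = gammatone_ir(fc, fs, scales[k])
            for ear in range(n_ears):
                seg = np.convolve(frame[ear], h)  # tail spills into later samples
                y[ear, k, start:start + len(seg)] += seg
    return y[:, :, :T], np.array(scales_per_frame)


# Usage: a 1 kHz tone in light noise, with a crude interaural delay and level difference.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs // 2) / fs
src = np.sin(2 * np.pi * 1000 * t) + 0.1 * rng.standard_normal(t.size)
x_lr = np.stack([src, 0.8 * np.roll(src, 8)])
cfs = np.array([250.0, 500.0, 1000.0, 2000.0, 4000.0])
y, scales = adaptive_binaural_filterbank(x_lr, fs, cfs, frame_len=512, controller=toy_controller)
print(y.shape, scales.shape)  # → (2, 5, 8000) (16, 5)
```

In this toy run the 1 kHz band carries most of the energy, so the rule assigns it the narrowest filter in every frame. A learned controller would instead choose the per-band, per-frame scales from data; the sketch only illustrates the interface of such a control loop (binaural frame in, bandwidth scales out).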