Sound source localization (SSL) plays a crucial role in application areas such as fault diagnosis, speech separation, and vibration noise reduction. Although beamforming algorithms are widely used for SSL, their resolution at low frequencies is limited. In recent years, deep learning-based SSL methods have significantly improved localization accuracy by employing large microphone arrays and training case-specific neural networks; however, this can narrow their applicability. To address these issues, this paper proposes a convolutional neural network-based method for high-precision SSL that adapts, in the low-frequency range below 1 kHz, to varying numbers of sound sources and microphone array-to-scanning-grid distances. The method takes the pressure distribution on a relatively small microphone array as the network input and employs customized training labels and a customized loss function to train the model. The prediction accuracy, adaptability, and robustness of the trained model at a given signal-to-noise ratio (SNR) are evaluated on randomly generated test datasets and compared with the classical beamforming algorithms CLEAN-SC and DAMAS. Results for both planar and spatial sound source distributions show that the proposed neural network model significantly improves low-frequency localization accuracy, demonstrating its effectiveness and potential for SSL.
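For context on the classical baseline the abstract compares against, the following is a minimal sketch of conventional frequency-domain delay-and-sum beamforming over a planar scanning grid. It is illustrative only, not the paper's method: the 4x4 array geometry, 800 Hz analysis frequency, single-snapshot cross-spectral matrix, and grid spacing are all assumptions chosen for a self-contained example.

```python
import numpy as np

# Illustrative conventional beamforming sketch (assumed parameters, not from the paper).
c = 343.0                      # speed of sound, m/s
f = 800.0                      # analysis frequency below 1 kHz, Hz
k = 2 * np.pi * f / c          # wavenumber, rad/m

# Small planar microphone array in the z = 0 plane (assumed 4x4 layout).
mx, my = np.meshgrid(np.linspace(-0.3, 0.3, 4), np.linspace(-0.3, 0.3, 4))
mics = np.column_stack([mx.ravel(), my.ravel(), np.zeros(mx.size)])  # (16, 3)

# One monopole source on a scanning plane 1 m from the array.
src = np.array([0.1, -0.1, 1.0])
r_src = np.linalg.norm(mics - src, axis=1)
p = np.exp(-1j * k * r_src) / r_src          # complex pressures at the mics

# Cross-spectral matrix from a single noise-free snapshot.
C = np.outer(p, p.conj())

# Scan a grid on the source plane; evaluate beamformer power per grid point.
gx, gy = np.meshgrid(np.linspace(-0.5, 0.5, 21), np.linspace(-0.5, 0.5, 21))
power = np.zeros(gx.shape)
for a in range(gx.shape[0]):
    for b in range(gx.shape[1]):
        g = np.array([gx[a, b], gy[a, b], 1.0])
        r = np.linalg.norm(mics - g, axis=1)
        v = np.exp(-1j * k * r) / r          # steering vector to this grid point
        v /= np.linalg.norm(v)
        power[a, b] = np.real(v.conj() @ C @ v)  # delay-and-sum output power

ii, jj = np.unravel_index(np.argmax(power), power.shape)
print(f"estimated source at ({gx[ii, jj]:.2f}, {gy[ii, jj]:.2f}) m")
```

At low frequencies the main lobe of this beamformer widens (resolution scales with wavelength over aperture), which is exactly the limitation the proposed CNN-based method targets.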