Deep learning-based sound event localization and classification is an emerging research area within wireless acoustic sensor networks. However, current methods for sound event localization and classification typically rely on a single microphone array. This makes them susceptible to signal attenuation and environmental noise, which limits their monitoring range. Moreover, methods that use multiple microphone arrays often focus solely on source localization and neglect sound event classification. In this paper, we propose a deep learning-based method that uses multiple features and attention mechanisms to estimate the location and class of a sound source. We introduce a Soundmap feature to capture spatial information across multiple frequency bands. We also use a Gammatone filterbank to generate acoustic features that are better suited to outdoor environments. Furthermore, we integrate attention mechanisms to learn channel-wise relationships and temporal dependencies within the acoustic features. To evaluate the proposed method, we conduct experiments on simulated datasets with varying noise levels and monitoring-area sizes, as well as different array and source positions. The experimental results show that our method outperforms state-of-the-art methods in both sound event classification and sound source localization. We also provide further analysis to explain the causes of the observed errors.
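The abstract does not specify how the Gammatone features are computed. As a rough illustration only, the pure-Python sketch below builds a bank of 4th-order Gammatone filters whose center frequencies are equally spaced on the ERB-rate scale. It then turns a mono signal into per-frame, per-band log energies. The ERB formula of Glasberg and Moore and the bandwidth factor b = 1.019 are common textbook choices assumed here, not taken from the paper. The function names (`erb`, `erb_space`, `gammatone_ir`, `gammatone_features`), the frame length, the hop size, and the impulse-response truncation rule are all hypothetical. The paper's actual feature pipeline, including how the Soundmap feature is formed, is not reproduced.

```python
import cmath
import math


def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore formula, assumed)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def erb_space(f_low, f_high, num_bands):
    """Center frequencies equally spaced on the ERB-rate scale, endpoints included."""
    if num_bands < 2:
        raise ValueError("num_bands must be at least 2")

    def hz_to_erb_rate(f):
        return 21.4 * math.log10(4.37 * f / 1000.0 + 1.0)

    def erb_rate_to_hz(e):
        return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

    lo, hi = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (hi - lo) / (num_bands - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(num_bands)]


def gammatone_ir(fc_hz, fs_hz, order=4, b=1.019, decay=20.0):
    """Sampled Gammatone impulse response t^(n-1) exp(-2*pi*b*ERB*t) cos(2*pi*fc*t).

    The response is truncated once 2*pi*b*ERB*t reaches `decay` (for order 4 the
    envelope is then about 1e-5 of its peak) and scaled to unit gain at fc.
    """
    bw = b * erb(fc_hz)
    n_taps = int(math.ceil(decay / (2.0 * math.pi * bw) * fs_hz))
    ir = [(k / fs_hz) ** (order - 1)
          * math.exp(-2.0 * math.pi * bw * k / fs_hz)
          * math.cos(2.0 * math.pi * fc_hz * k / fs_hz)
          for k in range(n_taps)]
    # Magnitude of the filter's frequency response at its own center frequency.
    gain = abs(sum(v * cmath.exp(-2j * math.pi * fc_hz * k / fs_hz)
                   for k, v in enumerate(ir)))
    return [v / gain for v in ir]


def fir_filter(x, h):
    """Direct-form causal FIR convolution, output truncated to len(x)."""
    y = [0.0] * len(x)
    for i in range(len(x)):
        acc = 0.0
        for j in range(min(len(h), i + 1)):
            acc += h[j] * x[i - j]
        y[i] = acc
    return y


def gammatone_features(signal, fs_hz, num_bands=8, f_low=100.0, f_high=None,
                       frame_len=256, hop=128):
    """Return (center_frequencies, features); features[frame][band] is a log energy."""
    if f_high is None:
        f_high = 0.4 * fs_hz  # stay safely below the Nyquist frequency
    centers = erb_space(f_low, f_high, num_bands)
    band_outputs = [fir_filter(signal, gammatone_ir(fc, fs_hz)) for fc in centers]
    num_frames = max(0, 1 + (len(signal) - frame_len) // hop)
    features = []
    for f in range(num_frames):
        start = f * hop
        row = []
        for y in band_outputs:
            energy = sum(v * v for v in y[start:start + frame_len]) / frame_len
            row.append(math.log(energy + 1e-10))  # small floor avoids log(0)
        features.append(row)
    return centers, features
```

Each filter is normalized to unit gain at its center frequency, so a pure tone at one band's center frequency yields the largest average log energy in that band. Direct convolution in pure Python is slow. A practical pipeline would more likely use a vectorized library and compute such features for every microphone channel before passing them to the network.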