While deep-learning-based speaker localization has shown advantages in challenging acoustic environments, it often yields only direction-of-arrival (DOA) cues rather than precise two-dimensional (2D) coordinates. To address this, we propose a novel deep-learning-based 2D speaker localization method leveraging ad-hoc microphone arrays, where an ad-hoc microphone array is composed of randomly distributed microphone nodes, each of which is equipped with a conventional microphone array. Specifically, we first employ convolutional neural networks at each node to estimate speaker directions. Then, we integrate these DOA estimates using triangulation and clustering techniques to obtain 2D speaker locations. To further boost the estimation accuracy, we introduce a node selection algorithm that retains only the most reliable nodes. Extensive experiments on both simulated and real-world data demonstrate that our approach significantly outperforms conventional methods, and that the proposed node selection further improves performance. The real-world dataset used in the experiments, named Libri-adhoc-node10, is newly recorded and described for the first time in this paper; it is available online at https://github.com/Liu-sp/Libri-adhoc-nodes10.
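The triangulation step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each node reports a world-frame DOA angle and a known 2D position, and it intersects a single pair of bearing rays by solving a small linear system (in practice, intersections from many node pairs would be pooled and clustered). All function and variable names are illustrative.

```python
import numpy as np

def triangulate_pair(p1, theta1, p2, theta2):
    """Intersect two DOA bearing rays to estimate a 2D speaker position.

    p1, p2      : 2D positions of the two microphone nodes
    theta1/2    : world-frame DOA angles (radians) estimated at each node
    Returns the intersection point of the two rays.
    """
    d1 = np.array([np.cos(theta1), np.sin(theta1)])  # unit bearing from node 1
    d2 = np.array([np.cos(theta2), np.sin(theta2)])  # unit bearing from node 2
    # Solve p1 + t1*d1 = p2 + t2*d2 for the ray parameters (t1, t2).
    A = np.column_stack([d1, -d2])
    b = np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float)
    t = np.linalg.solve(A, b)  # raises LinAlgError for (near-)parallel rays
    return np.asarray(p1, dtype=float) + t[0] * d1

# Example: a node at the origin sees the speaker at 45 degrees, and a node
# at (4, 0) sees it at 135 degrees; the rays meet at (2, 2).
est = triangulate_pair((0.0, 0.0), np.pi / 4, (4.0, 0.0), 3 * np.pi / 4)
```

With more than two nodes, every reliable node pair yields one such intersection, and a clustering step over the resulting points (as the abstract describes) rejects outliers caused by erroneous DOA estimates.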