Classical methods for acoustic scene mapping require estimating the time difference of arrival (TDOA) between microphones. Unfortunately, TDOA estimation is very sensitive to reverberation and additive noise. We introduce an unsupervised, data-driven approach that exploits the natural structure of the data. Our method builds upon local conformal autoencoders (LOCA), an offline deep learning scheme for learning standardized data coordinates from measurements. Our experimental setup includes a microphone array that measures the transmitted sound source at multiple locations across the acoustic enclosure. We demonstrate that LOCA learns a representation that is isometric to the spatial locations of the microphones. We evaluate the performance of our method in a series of realistic simulations and compare it with other dimensionality-reduction schemes. We further assess the influence of reverberation on the results of LOCA and show that the method is considerably robust to it.
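For context on the classical step mentioned above, the following is a minimal sketch of TDOA estimation between two microphone signals. It assumes the GCC-PHAT estimator (generalized cross-correlation with phase transform), a common choice that the text itself does not name. The function name `gcc_phat_tdoa` and its parameters are hypothetical and chosen for illustration only.

```python
import numpy as np


def gcc_phat_tdoa(x, y, fs, max_tau=None):
    """Estimate the delay of y relative to x, in seconds.

    A positive result means y lags x. Illustrative sketch only:
    GCC-PHAT is an assumed estimator, not one taken from the source.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Zero-pad so the circular correlation equals the linear one.
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)

    # Cross-power spectrum, whitened so that only phase is kept (PHAT).
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags if requested.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs
```

Calling `gcc_phat_tdoa(x, y, fs)` on two recordings sampled at `fs` Hz returns the lag of the correlation peak in seconds. In a reverberant room, reflections add competing peaks to the correlation, which is the sensitivity that motivates the data-driven alternative.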