Ambisonics, a popular spatial audio format, is the spherical harmonic (SH) representation of the plane-wave density function of a sound field. Many algorithms operate in the SH domain and take Ambisonics signals as their input. Encoding Ambisonics from a spherical microphone array involves dividing by the radial functions, which may amplify noise at low frequencies. Regularization can overcome this, at the cost of introducing errors into the Ambisonics encoding. This paper investigates the impact of different regularization methods on Deep Neural Network (DNN) training and performance. Ideally, such networks should be robust to the choice of regularization method. Simulated data of a single speaker in a room and experimental data from the LOCATA challenge were used to evaluate this robustness, taking as an example a speaker localization algorithm based on the direct-path dominance (DPD) test. Results show that performance may be sensitive to the regularization method; an informed approach is proposed and investigated, highlighting the importance of regularization information.
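To make the encoding issue concrete, the following is a minimal sketch of how dividing by the radial functions amplifies noise at low frequencies, and how two regularized inverses bound that gain. It is an illustration only: the rigid-sphere radial function convention, the two regularizers (Tikhonov and hard gain limiting), the function names, and all parameter values are assumptions made here, not the specific methods or settings evaluated in the paper.

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn


def radial_rigid(n, kr):
    """Radial function b_n(kr) of a rigid sphere.

    Assumption: one common convention, using the spherical Hankel
    function of the second kind, h_n = j_n - i*y_n.
    """
    jn = spherical_jn(n, kr)
    djn = spherical_jn(n, kr, derivative=True)
    hn = jn - 1j * spherical_yn(n, kr)
    dhn = djn - 1j * spherical_yn(n, kr, derivative=True)
    return 4 * np.pi * (1j ** n) * (jn - (djn / dhn) * hn)


def inverse_plain(b):
    """Unregularized inverse: the gain is unbounded as |b| -> 0."""
    return 1.0 / b


def inverse_tikhonov(b, lam):
    """Tikhonov-regularized inverse: the gain never exceeds 1 / (2 * lam)."""
    return np.conj(b) / (np.abs(b) ** 2 + lam ** 2)


def inverse_limited(b, max_gain):
    """Plain inverse with its magnitude clipped to max_gain (phase kept)."""
    inv = 1.0 / b
    return inv * np.minimum(1.0, max_gain / np.abs(inv))


# Hypothetical values: SH order 3 at three values of kr
# (wavenumber times array radius), from low to higher frequency.
order = 3
kr = np.array([0.05, 0.5, 2.0])
b = radial_rigid(order, kr)

gain_plain = np.abs(inverse_plain(b))
gain_tik = np.abs(inverse_tikhonov(b, lam=0.1))
gain_lim = np.abs(inverse_limited(b, max_gain=10.0))

for x, gp, gt, gl in zip(kr, gain_plain, gain_tik, gain_lim):
    print(f"kr={x:4.2f}  plain={gp:10.3e}  tikhonov={gt:8.3e}  limited={gl:8.3e}")
```

At small kr the plain inverse gain becomes very large for higher orders, while both regularized inverses stay bounded. The two regularizers distort the encoded signal differently, which is the kind of mismatch a network trained on one regularization and tested on another would encounter.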