SLAM-based Joint Calibration of Multiple Asynchronous Microphone Arrays and Sound Source Localization

Robot audition systems with multiple microphone arrays have many practical applications. However, accurately calibrating multiple microphone arrays remains challenging because many unknown parameters must be identified, including the relative transforms (i.e., orientation and translation) and the asynchronous factors (i.e., initial time offset and sampling clock difference) between microphone arrays. To tackle these challenges, we adopt batch simultaneous localization and mapping (SLAM) for joint calibration of multiple asynchronous microphone arrays and sound source localization. Using the Fisher information matrix (FIM) approach, we first conduct an observability analysis (i.e., a parameter identifiability analysis) of this calibration problem and establish necessary/sufficient conditions under which the FIM and the Jacobian matrix have full column rank, which implies that the unknown parameters are identifiable. We also identify several scenarios in which the unknown parameters are not uniquely identifiable. Subsequently, we propose an effective framework for initializing the unknown parameters; its output serves as the initial guess in batch SLAM for multiple microphone array calibration, with the aim of further improving optimization accuracy and convergence. Extensive numerical simulations and real-world experiments have been conducted to verify the performance of the proposed method. The experimental results show that the proposed pipeline achieves higher accuracy and converges quickly, compared both with methods that initialize the optimization from the noise-corrupted ground truth of the unknown parameters and with other existing frameworks.
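The link between full column rank and identifiability can be illustrated with a minimal sketch. It does not reproduce the paper's measurement model: it assumes a toy 2D setting in which a reference array sits at the origin, the sound source positions are known, and only the second array's position `p` and a clock offset expressed in metres are unknown. The functions `jacobian`, `fim`, and `is_identifiable`, and the noise level `sigma`, are hypothetical names chosen for this illustration.

```python
import numpy as np

def jacobian(sources, p):
    """Jacobian of the toy model z_k = ||s_k - p|| - ||s_k|| + delta
    with respect to the unknowns (p_x, p_y, delta)."""
    diff = sources - p                                   # (K, 2) vectors from array to sources
    dist = np.linalg.norm(diff, axis=1, keepdims=True)   # (K, 1) ranges
    # d z_k / d p = -(s_k - p) / ||s_k - p||,  d z_k / d delta = 1
    return np.hstack([-diff / dist, np.ones((len(sources), 1))])

def fim(sources, p, sigma=0.01):
    """FIM under i.i.d. Gaussian noise (assumed): J^T J / sigma^2."""
    J = jacobian(sources, p)
    return J.T @ J / sigma**2

def is_identifiable(sources, p):
    """True when the Jacobian has full column rank (3 unknowns).
    The FIM J^T J / sigma^2 has the same rank as J."""
    return np.linalg.matrix_rank(jacobian(sources, p)) == 3

p = np.array([1.0, 0.0])  # assumed position of the second array

# Sources spread around the array: the three unknowns can be separated.
spread = np.array([[2.0, 0.0], [0.0, 3.0], [-2.0, -1.0], [1.0, -4.0]])

# Sources on a single ray from the array: every bearing is identical, so a
# shift along the ray cannot be told apart from a change in clock offset.
ray = np.array([[2.0, 0.0], [3.0, 0.0], [5.0, 0.0]])

print(is_identifiable(spread, p))
print(is_identifiable(ray, p))
```

In this toy setting the second configuration is a degenerate geometry in which the Jacobian loses column rank. It is meant only as an analogue of the non-identifiable scenarios mentioned above, not as one of the specific cases derived in the paper.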
