Ambisonics encoding of microphone array signals enables spatial audio applications such as virtual reality and telepresence, but conventional encoders are typically designed for uniformly spaced spherical microphone arrays. This paper proposes an Ambisonics encoding method that uses a deep neural network (DNN) to estimate a signal transform from microphone inputs to Ambisonics signals. The DNN consists of a U-Net structure with a learnable preprocessing stage, and it is trained with a loss function that combines mean absolute error, spatial correlation, and energy preservation components. The method is validated on two four-microphone arrays, one regular and one irregular in shape, using simulated reverberant scenes with multiple sources. The results show that the proposed method can meet or exceed the performance of a conventional signal-independent Ambisonics encoder on a number of error metrics.
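The abstract names three loss components but does not give their definitions. The following is a minimal sketch of how such a composite loss could be assembled, not the authors' formulation. The function name `composite_loss`, the weights `w_mae`, `w_corr`, and `w_energy`, and the specific forms of the spatial correlation term (a normalized correlation between inter-channel covariance matrices) and the energy term (a relative difference in total energy) are all assumptions made for illustration.

```python
import numpy as np


def composite_loss(est, ref, w_mae=1.0, w_corr=1.0, w_energy=1.0, eps=1e-8):
    """Illustrative three-part loss between estimated and reference signals.

    est, ref: arrays of shape (channels, samples), e.g. Ambisonics channels.
    All term definitions and weights here are assumptions, not the paper's.
    """
    est = np.asarray(est, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n_samples = est.shape[-1]

    # Term 1: mean absolute error over all channels and samples.
    mae = np.mean(np.abs(est - ref))

    # Term 2 (assumed form): compare the inter-channel covariance matrices.
    # The normalized Frobenius inner product equals 1 when the two matrices
    # are proportional, so the term is 0 when the spatial structure matches.
    cov_est = est @ est.T / n_samples
    cov_ref = ref @ ref.T / n_samples
    num = np.sum(cov_est * cov_ref)
    den = np.linalg.norm(cov_est) * np.linalg.norm(cov_ref) + eps
    corr_term = 1.0 - num / den

    # Term 3 (assumed form): relative difference in total signal energy.
    e_est = np.sum(est ** 2)
    e_ref = np.sum(ref ** 2)
    energy_term = np.abs(e_est - e_ref) / (e_ref + eps)

    return w_mae * mae + w_corr * corr_term + w_energy * energy_term
```

Under these assumed definitions, an estimate that is a scaled copy of the reference leaves the covariance term at zero and is penalized only by the error and energy terms, which shows why the three components are not redundant.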