This paper describes an efficient unsupervised learning method for a neural source separation model that uses a probabilistic generative model of observed multichannel mixtures originally proposed for blind source separation (BSS). For this purpose, amortized variational inference (AVI) has been used to directly solve the inverse problem of BSS with full-rank spatial covariance analysis (FCA). Although this unsupervised technique, called neural FCA, is in principle free from the domain mismatch problem, it is computationally demanding because its spatial model is full-rank, a property adopted in exchange for robustness against relatively short reverberations. To reduce the model complexity without sacrificing performance, we propose neural FastFCA, which is based on a jointly-diagonalizable yet full-rank spatial model. The neural separation model introduced for AVI alternates between neural network blocks and single steps of an efficient iterative algorithm called iterative source steering (ISS). This alternating architecture lets the separation model separate the mixture spectrogram quickly by leveraging both the deep neural network and the multichannel optimization algorithm. The AVI training objective is derived so as to maximize the marginal likelihood of the observed mixtures. Experiments using mixtures of two to four sound sources show that neural FastFCA outperforms conventional BSS methods and reduces the computational time to about 2% of that of neural FCA.
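Why a jointly-diagonalizable model is cheaper, and what a single ISS step does, can be illustrated with a minimal NumPy sketch for one frequency bin. It assumes the common parameterization R_n = Q^{-1} diag(g_n) Q^{-H}, under which the mixture x_t ~ N(0, sum_n lam_{nt} R_n) has a diagonal covariance after the transform y_t = Q x_t. The negative log-likelihood then reduces to sum_{m,t} (|y_{mt}|^2 / r_{mt} + log r_{mt}) - 2T log|det Q| with r_{mt} = sum_n lam_{nt} g_{nm}, so no per-frame matrix inversion is needed. `iss_step` is one sweep of iterative source steering applied to Q with Gaussian weights 1/r_{mt}, holding lam and g fixed. All function names and the random data are hypothetical, and the sketch does not reproduce the paper's network or how it interleaves DNN blocks with ISS steps.

```python
import numpy as np

# Hypothetical single-frequency-bin sketch; shapes: X (M, T) mixture STFT frames,
# Q (M, M) joint diagonalizer, lam (N, T) source power spectra,
# g (N, M) nonnegative diagonal spatial gains.

def jd_weights(lam, g):
    """Per-channel variances r[m, t] = sum_n lam[n, t] * g[n, m]."""
    return np.einsum("nt,nm->mt", lam, g)


def neg_loglik_full(X, Q, lam, g):
    """-log p(X) (up to M*T*log(pi)) using explicit full-rank covariances per frame."""
    Qinv = np.linalg.inv(Q)
    total = 0.0
    for t in range(X.shape[1]):
        # sum_n lam[n, t] * Q^{-1} diag(g[n]) Q^{-H} = Q^{-1} diag(r[:, t]) Q^{-H}
        r_t = jd_weights(lam[:, t:t + 1], g)[:, 0]
        cov = Qinv @ np.diag(r_t) @ Qinv.conj().T
        x = X[:, t]
        _, logdet = np.linalg.slogdet(cov)  # O(M^3) work in every frame
        total += np.real(x.conj() @ np.linalg.solve(cov, x)) + logdet
    return total


def neg_loglik_jd(X, Q, lam, g):
    """Same quantity via joint diagonalization: elementwise terms plus one log|det Q|."""
    T = X.shape[1]
    Y = Q @ X
    r = jd_weights(lam, g)
    _, logabsdet = np.linalg.slogdet(Q)
    return np.sum(np.abs(Y) ** 2 / r + np.log(r)) - 2 * T * logabsdet


def iss_step(Q, X, r):
    """One sweep of iterative source steering over the rows of Q, with r held fixed."""
    M, T = X.shape
    Q = Q.astype(complex)  # astype returns a copy, so the input is not modified
    Y = Q @ X
    phi = 1.0 / r  # Gaussian weights
    for k in range(M):
        yk, qk = Y[k].copy(), Q[k].copy()
        denom = np.sum(phi * np.abs(yk) ** 2, axis=1)  # sum_t phi[n, t] |y_k(t)|^2
        v = np.sum(phi * Y * yk.conj(), axis=1) / denom  # least-squares steering for n != k
        v[k] = 1.0 - np.sqrt(T / denom[k])  # rescales y_k so mean(phi_k |y_k|^2) = 1
        Y -= np.outer(v, yk)  # rank-1 update of the outputs
        Q -= np.outer(v, qk)  # matching rank-1 update of Q
    return Q


rng = np.random.default_rng(0)
M, N, T = 3, 2, 50
X = rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))
Q = np.eye(M) + 0.3 * (rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M)))
lam = rng.uniform(0.5, 2.0, size=(N, T))
g = rng.uniform(0.1, 1.0, size=(N, M))

print(neg_loglik_full(X, Q, lam, g), neg_loglik_jd(X, Q, lam, g))
Q_new = iss_step(Q, X, jd_weights(lam, g))
print(neg_loglik_jd(X, Q_new, lam, g), "after one ISS sweep")
```

The first print should show two matching values up to floating-point error, while the full-rank version pays for an M x M solve and determinant in every frame. With lam and g fixed, each row update in the ISS sweep is an exact minimizer of the objective over its rank-1 update, so the value after the sweep should be no larger than before.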