Multichannel convolutive blind speech source separation is the problem of separating individual speech sources from observed multichannel mixtures with little a priori information about the mixing system. Multichannel nonnegative matrix factorization (MNMF) has proven to be one of the most powerful separation frameworks, and representative algorithms such as MNMF and independent low-rank matrix analysis (ILRMA) have demonstrated strong performance. However, this framework does not fully account for the sparsity of speech source signals. Speech signals are well known to be sparse in nature, and this work exploits that property to improve separation performance. Specifically, we use the Bingham and Laplace distributions to formulate a disjoint constraint regularizer, which is then incorporated into both MNMF and ILRMA. We then derive majorization-minimization rules for updating the parameters of the source model, which yields two enhanced algorithms: s-MNMF and s-ILRMA. Comprehensive simulations are conducted, and the results demonstrate the efficacy of the proposed methods.
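To make the idea of a sparsity-regularized, majorization-minimization update concrete, the following is a minimal single-channel sketch. It is not the s-MNMF or s-ILRMA algorithm described above: it assumes a squared Euclidean cost and a plain L1 penalty on the activations (the negative log of a Laplace prior restricted to nonnegative values) in place of the multichannel model and the Bingham/Laplace disjoint constraint regularizer. The function name `sparse_nmf` and the parameters `lam`, `rank`, `n_iter`, and `eps` are hypothetical and chosen for illustration only.

```python
import numpy as np


def sparse_nmf(V, rank, lam=0.1, n_iter=200, eps=1e-12, seed=0):
    """Factor a nonnegative matrix V ~= W @ H with an L1 penalty on H.

    Illustrative cost (an assumption, not the paper's objective):
        0.5 * ||V - W H||_F^2 + lam * sum(H)
    Both updates are the multiplicative rules obtained by minimizing a
    separable quadratic majorizer of this cost, so each step does not
    increase it.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    # Strictly positive initialization keeps the multiplicative updates valid.
    W = rng.random((n_rows, rank)) + 0.1
    H = rng.random((rank, n_cols)) + 0.1

    costs = []
    for _ in range(n_iter):
        # Activation update: the penalty weight lam enters the denominator
        # and shrinks H toward zero, which promotes sparsity.
        H *= (W.T @ V) / np.maximum(W.T @ W @ H + lam, eps)
        # Basis update: standard multiplicative rule for the Euclidean cost.
        W *= (V @ H.T) / np.maximum(W @ H @ H.T, eps)
        costs.append(0.5 * np.sum((V - W @ H) ** 2) + lam * np.sum(H))
    return W, H, costs


if __name__ == "__main__":
    # Toy nonnegative "spectrogram" standing in for real data.
    V = np.random.default_rng(1).random((20, 30))
    W, H, costs = sparse_nmf(V, rank=4)
    print(f"cost: first iteration {costs[0]:.4f}, last iteration {costs[-1]:.4f}")
```

In this sketch the basis matrix `W` is left unnormalized for brevity. Practical sparse NMF variants usually normalize its columns so that the penalty cannot be sidestepped by rescaling `W` up and `H` down.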