Dual-path is a popular architecture for speech separation models (e.g. Sepformer). It splits long sequences into overlapping chunks and processes them with intra- and inter-blocks, which separately model intra-chunk local features and inter-chunk global relationships. However, it has been found that inter-blocks, which comprise half of a dual-path model's parameters, contribute minimally to performance. We therefore propose the Single-Path Global Modulation (SPGM) block to replace inter-blocks. SPGM is named after its structure: a parameter-free global pooling module followed by a modulation module comprising only 2% of the model's total parameters. The SPGM block allows all transformer layers in the model to be dedicated to local feature modelling, making the overall model single-path. SPGM achieves 22.1 dB SI-SDRi on WSJ0-2Mix and 20.4 dB SI-SDRi on Libri2Mix, exceeding the performance of Sepformer by 0.5 dB and 0.3 dB respectively, and matches the performance of recent SOTA models with up to 8 times fewer parameters.
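The chunk, pool, modulate data flow described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not specify the pooling or modulation operators, so this sketch assumes mean pooling over the time axis of each chunk and a simple feature-wise scale-and-shift modulation. The function names (`chunk`, `global_pool`, `modulate`) and the parameters `w_scale` and `w_shift` are hypothetical, and the sketch assumes the sequence length minus the chunk size is divisible by the hop.

```python
def chunk(frames, size, hop):
    """Split a [T][F] sequence into overlapping [size][F] chunks.

    Assumes (T - size) is divisible by hop, so no padding is needed.
    """
    last_start = max(len(frames) - size, 0)
    return [frames[s:s + size] for s in range(0, last_start + 1, hop)]


def global_pool(chunks):
    """Parameter-free pooling (assumed: mean over time within each chunk).

    Returns one [F] vector per chunk.
    """
    return [[sum(col) / len(c) for col in zip(*c)] for c in chunks]


def modulate(chunks, pooled, w_scale, w_shift):
    """Assumed scale-and-shift modulation of local features by global context.

    The global context is the mean of the pooled chunk vectors; w_scale and
    w_shift are hypothetical per-feature parameters standing in for the
    small modulation module.
    """
    n = len(pooled)
    context = [sum(col) / n for col in zip(*pooled)]
    scale = [1.0 + w * g for w, g in zip(w_scale, context)]
    shift = [w * g for w, g in zip(w_shift, context)]
    return [
        [[x * a + b for x, a, b in zip(frame, scale, shift)] for frame in c]
        for c in chunks
    ]


# Toy input: T = 8 frames, F = 2 features per frame.
frames = [[float(t), 1.0] for t in range(8)]
chunks = chunk(frames, size=4, hop=2)   # 3 chunks with 50% overlap
pooled = global_pool(chunks)            # one 2-dim vector per chunk
out = modulate(chunks, pooled, w_scale=[0.0, 0.0], w_shift=[1.0, 0.0])
```

In this sketch only `modulate` carries parameters, which mirrors the claim that the pooling module is parameter-free while the modulation module accounts for a small share of the model. With `w_scale` and `w_shift` set to zero, the modulation reduces to the identity and the chunks pass through unchanged.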