Dual-path is a popular architecture for speech separation models (e.g. Sepformer). It splits long sequences into overlapping chunks and processes them with intra-blocks and inter-blocks, which separately model intra-chunk local features and inter-chunk global relationships. However, inter-blocks, which comprise half of a dual-path model's parameters, have been found to contribute minimally to performance. We therefore propose the Single-Path Global Modulation (SPGM) block to replace inter-blocks. SPGM is named after its structure: a parameter-free global pooling module followed by a modulation module comprising only 2% of the model's total parameters. The SPGM block allows all transformer layers in the model to be dedicated to local feature modelling, making the overall model single-path. SPGM achieves 22.1 dB SI-SDRi on WSJ0-2Mix and 20.4 dB SI-SDRi on Libri2Mix, exceeding the performance of Sepformer by 0.5 dB and 0.3 dB respectively, and matches the performance of recent SOTA models with up to 8 times fewer parameters. The model and weights are available at huggingface.co/yipjiaqi/spgm
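As a rough illustration of the structure described above (overlapping chunks, a parameter-free global pooling step, then a lightweight modulation step), the following is a minimal pure-Python sketch. It is not the released implementation. The function names, the mean-over-all-frames pooling, and the FiLM-style scale-and-shift modulation with per-feature weights `scale_w` and `shift_w` are all assumptions made for illustration; the abstract does not specify the exact form of either module.

```python
from typing import List

Vector = List[float]
Chunk = List[Vector]  # K frames, each a D-dimensional feature vector


def chunk_sequence(frames: List[Vector], chunk_size: int, hop: int) -> List[Chunk]:
    """Split a frame sequence into overlapping chunks, zero-padding the tail."""
    if hop <= 0 or chunk_size <= 0 or not frames:
        raise ValueError("need a non-empty sequence and positive chunk_size/hop")
    dim = len(frames[0])
    chunks: List[Chunk] = []
    start = 0
    while True:
        chunk = [list(f) for f in frames[start:start + chunk_size]]
        # Pad the last chunk with zero frames so every chunk has chunk_size frames.
        chunk += [[0.0] * dim for _ in range(chunk_size - len(chunk))]
        chunks.append(chunk)
        if start + chunk_size >= len(frames):
            break
        start += hop
    return chunks


def global_pool(chunks: List[Chunk]) -> Vector:
    """Parameter-free pooling (assumed form): mean over every frame of every chunk."""
    dim = len(chunks[0][0])
    count = sum(len(c) for c in chunks)
    return [sum(f[d] for c in chunks for f in c) / count for d in range(dim)]


def modulate(chunks: List[Chunk], g: Vector,
             scale_w: Vector, shift_w: Vector) -> List[Chunk]:
    """FiLM-style modulation (assumed form): y = x * (1 + scale_w * g) + shift_w * g."""
    gamma = [1.0 + w * gi for w, gi in zip(scale_w, g)]
    beta = [w * gi for w, gi in zip(shift_w, g)]
    return [[[x * ga + be for x, ga, be in zip(f, gamma, beta)] for f in c]
            for c in chunks]


if __name__ == "__main__":
    frames = [[float(t), 1.0] for t in range(7)]         # 7 frames, 2 features
    chunks = chunk_sequence(frames, chunk_size=4, hop=2)  # 50% overlap (assumption)
    g = global_pool(chunks)
    out = modulate(chunks, g, scale_w=[0.1, 0.1], shift_w=[0.0, 0.0])
    print(len(chunks), len(out[0]), len(out[0][0]))
```

In this sketch the only learnable quantities are the two per-feature weight vectors, which mirrors the idea that the global path can be far smaller than a stack of inter-chunk transformer layers. The real modulation module may be structured differently.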