Music source separation (MSS) aims to separate a music mixture into its constituent tracks, such as vocals, bass, and drums. MSS is considered a challenging audio separation task because of the complexity of music signals. RNN and Transformer architectures are commonly used to model the music sequence in MSS, although both have limitations. Mamba-2 has recently demonstrated high efficiency in various sequence modeling tasks, but its potential for MSS has not been investigated. This paper applies Mamba-2 to MSS with a two-stage strategy that adds a residual mapping on top of the mask-based method, compensating for details that the mask misses and further improving separation performance. Experiments confirm the superiority of bidirectional Mamba-2 and the effectiveness of the two-stage network in MSS. The source code is publicly available at https://github.com/baijinglin/TS-BSmamba2.
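The mask-plus-residual idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names `apply_mask` and `add_residual` are hypothetical, the toy spectrograms are random, and both "networks" are replaced by oracle stand-ins computed from the known target. The sketch assumes the first stage multiplies the mixture spectrogram by a mask and the second stage adds a correction term to that estimate.

```python
import numpy as np


def apply_mask(mixture_spec, mask):
    """Stage 1 (assumed form): element-wise masking of the mixture spectrogram."""
    return mask * mixture_spec


def add_residual(stage1_spec, residual):
    """Stage 2 (assumed form): additive correction on top of the masked estimate."""
    return stage1_spec + residual


rng = np.random.default_rng(0)
shape = (4, 6)  # (frequency bins, time frames); toy size for illustration

# Toy complex spectrograms: one target source plus everything else.
target = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
other = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mixture = target + other

# Oracle stand-in for a stage-1 network: a real-valued ratio mask in [0, 1].
# A real mask rescales magnitudes but keeps the mixture phase, so some
# detail of the target cannot be recovered by masking alone.
mask = np.abs(target) / (np.abs(target) + np.abs(other))
stage1 = apply_mask(mixture, mask)

# Oracle stand-in for a stage-2 network: the exact residual the mask missed.
# A trained model would only approximate this term.
residual = target - stage1
stage2 = add_residual(stage1, residual)

err_stage1 = np.mean(np.abs(target - stage1) ** 2)
err_stage2 = np.mean(np.abs(target - stage2) ** 2)
print(f"stage-1 error: {err_stage1:.4f}, stage-2 error: {err_stage2:.2e}")
```

Because the residual here is an oracle, the second-stage error only shows the best case of the additive formulation. It says nothing about how well a learned residual performs in practice.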