We propose Schr\"odinger Bridge Mamba (SBM), a new training-inference framework concept motivated by the inherent compatibility between the Schr\"odinger Bridge (SB) training paradigm and the selective state-space model Mamba. We exemplify the SBM concept with an implementation for generative speech enhancement. Experiments on a joint denoising and dereverberation task across four benchmark datasets demonstrate that SBM, with only one-step inference, outperforms strong baselines using one-step or iterative inference and achieves the best real-time factor (RTF). Beyond speech enhancement, we discuss the integration of the SB paradigm and the selective state-space model architecture based on their underlying alignment, which points to a promising direction for new deep generative models potentially applicable to a broad range of generative tasks. Demo page: https://sbmse.github.io