Generative models have shown impressive results in speech enhancement but often require slow multi-step inference. We propose SB-RF, a one-step generative framework that integrates Rectified Flow (RF) with Schrödinger Bridge (SB) theory. SB-RF constructs a conditional bridge between the clean and noisy speech distributions via entropy-regularized optimal transport. By aligning the SB trajectories with the optimal-transport geodesic through RF's velocity-matching objective, SB-RF achieves high-quality enhancement in a single generation step. Experiments show that SB-RF achieves leading performance among generative methods on the VoiceBank-DEMAND benchmark. Furthermore, to assess performance in challenging real-world scenarios, we train SB-RF on an expanded dataset and evaluate it on a simulated low signal-to-noise ratio (SNR) test set. Under these conditions, SB-RF shows strong, competitive robustness at high efficiency, validating its potential for real-world applications.
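The abstract does not give SB-RF's equations, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes (1) the bridge between a paired noisy sample `y` (at t = 0) and clean sample `x` (at t = 1) is a Brownian bridge, which is the form a Schrödinger bridge with a Brownian reference takes between fixed endpoints, with a hypothetical noise scale `sigma` standing in for the entropy regularization; (2) RF-style velocity matching regresses the velocity field onto the straight-line displacement `x - y`; and (3) a hypothetical time-conditioned linear model replaces the neural network, fitted by least squares on toy scalar Gaussian data instead of speech spectrograms. All function names are illustrative.

```python
import numpy as np


def bridge_sample(x_clean, y_noisy, t, sigma, rng):
    """Draw x_t on a Brownian bridge from y_noisy (t=0) to x_clean (t=1).

    The mean is the straight line between the endpoints; the extra variance
    sigma^2 * t * (1 - t) vanishes at both ends (assumed form, not from the paper).
    """
    z = rng.standard_normal(x_clean.shape)
    return (1.0 - t) * y_noisy + t * x_clean + sigma * np.sqrt(t * (1.0 - t)) * z


def features(x_t, t):
    # Hypothetical linear "velocity network": v(x_t, t) = w0*x_t + w1*t*x_t + b.
    return np.stack([x_t, t * x_t, np.ones_like(x_t)], axis=1)


def fit_velocity(x_clean, y_noisy, sigma, rng):
    """Velocity matching: least-squares fit of v(x_t, t) to x_clean - y_noisy."""
    t = rng.uniform(0.0, 1.0, size=x_clean.shape)
    x_t = bridge_sample(x_clean, y_noisy, t, sigma, rng)
    target = x_clean - y_noisy  # constant velocity of the straight (rectified) path
    theta, *_ = np.linalg.lstsq(features(x_t, t), target, rcond=None)
    return theta


def enhance_one_step(y_noisy, theta):
    """One Euler step of size 1 from t=0 (where x_0 = y_noisy) to t=1."""
    t0 = np.zeros_like(y_noisy)
    return y_noisy + features(y_noisy, t0) @ theta


rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(0.0, 1.0, n)               # toy "clean" samples, variance 1
y = x + rng.normal(0.0, np.sqrt(0.5), n)  # toy "noisy" samples, noise variance 0.5
theta = fit_velocity(x, y, sigma=0.1, rng=rng)

x_test = rng.normal(0.0, 1.0, n)
y_test = x_test + rng.normal(0.0, np.sqrt(0.5), n)
x_hat = enhance_one_step(y_test, theta)
mse_noisy = np.mean((y_test - x_test) ** 2)
mse_enhanced = np.mean((x_hat - x_test) ** 2)
print(f"MSE noisy: {mse_noisy:.3f}  MSE one-step: {mse_enhanced:.3f}")
```

Because the regression target is constant along each straight path, a single Euler step from t = 0 can reach the clean end; the learned field only approximates this. On this toy Gaussian problem the one-step error should land close to the linear MMSE (Wiener) error of about 0.33, versus about 0.5 for the unprocessed input.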