The Interspeech 2025 URGENT Challenge aimed to advance universal, robust, and generalizable speech enhancement by unifying speech enhancement tasks across a wide variety of conditions, including seven different distortion types and five languages. We present Universal Speech Enhancement Mamba (USEMamba), a state-space speech enhancement model designed to handle long-range sequence modeling, time-frequency structured processing, and sampling frequency-independent feature extraction. Our approach primarily relies on regression-based modeling, which performs well across most distortions. However, for packet loss and bandwidth extension, where missing content must be inferred, a generative variant of the proposed USEMamba proves more effective. Despite being trained on only a subset of the full training data, USEMamba achieved 2nd place in Track 1 during the blind test phase, demonstrating strong generalization across diverse conditions.