The Interspeech 2025 URGENT Challenge aimed to advance universal, robust, and generalizable speech enhancement by unifying speech enhancement tasks across a wide variety of conditions, including seven different distortion types and five languages. We present Universal Speech Enhancement Mamba (USEMamba), a state-space speech enhancement model designed to handle long-range sequence modeling, time-frequency structured processing, and sampling frequency-independent feature extraction. Our approach primarily relies on regression-based modeling, which performs well across most distortions. However, for packet loss and bandwidth extension, where missing content must be inferred, a generative variant of the proposed USEMamba proves more effective. Despite being trained on only a subset of the full training data, USEMamba achieved 2nd place in Track 1 during the blind test phase, demonstrating strong generalization across diverse conditions.