We introduce GAP-URGENet, a generative-predictive fusion framework developed for Track 1 of the ICASSP 2026 URGENT Challenge. The system combines two branches. A generative branch performs full-stack speech restoration in a self-supervised representation domain and reconstructs the waveform with a neural vocoder. A predictive branch performs spectrogram-domain enhancement and supplies complementary cues. A post-processing module fuses the outputs of both branches and also performs bandwidth extension, producing an enhanced 48 kHz waveform that is then downsampled to the original sampling rate. This generative-predictive fusion improves robustness and perceptual quality. The system achieved top performance in the blind-test phase, ranking 1st in the objective evaluation. Audio examples are available at https://xiaobin-rong.github.io/gap-urgenet_demo.
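The final step of the pipeline maps the 48 kHz fused output back to whatever sampling rate the input had. The abstract does not say which resampler is used. The sketch below assumes polyphase resampling via SciPy's `resample_poly`, and the function name `downsample_from_48k` is hypothetical. The rational factor `up/down` is reduced with the greatest common divisor, so common rates such as 16 kHz or 44.1 kHz map exactly.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def downsample_from_48k(wav_48k: np.ndarray, target_sr: int, source_sr: int = 48_000) -> np.ndarray:
    """Resample a 48 kHz enhanced waveform to the input's original sampling rate.

    Hypothetical helper: assumes polyphase resampling (scipy.signal.resample_poly);
    the resampler actually used by GAP-URGENet is not specified in the abstract.
    """
    if target_sr == source_sr:
        # Input was already 48 kHz: nothing to do.
        return wav_48k
    # Reduce target_sr / source_sr to the smallest integer ratio up / down.
    g = gcd(target_sr, source_sr)
    up, down = target_sr // g, source_sr // g
    # resample_poly upsamples by `up`, applies an anti-aliasing FIR low-pass
    # filter, and downsamples by `down`.
    return resample_poly(wav_48k, up, down)
```

For example, one second of 48 kHz audio (48,000 samples) becomes 16,000 samples when the original input was 16 kHz, and 44,100 samples when it was 44.1 kHz.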