Audio restoration has become increasingly important, both because advanced playback devices have raised the demand for high-quality listening experiences and because increasingly capable generative audio models require high-fidelity audio. Audio restoration is typically defined as the task of predicting undistorted audio from a degraded input, and restoration models are often trained within a GAN framework to balance perception and distortion. Because audio degradation, especially that introduced by codecs, is concentrated primarily in the mid- and high-frequency ranges, a key challenge lies in designing a generator that preserves low-frequency information while accurately reconstructing high-quality mid- and high-frequency content. Inspired by recent advances in high-sample-rate music separation, speech enhancement, and audio codec models, we propose Apollo, a generative model designed for high-sample-rate audio restoration. Apollo employs an explicit frequency band split module to model the relationships between different frequency bands, allowing for more coherent and higher-quality restored audio. Evaluated on the MUSDB18-HQ and MoisesDB datasets, Apollo consistently outperforms existing SR-GAN models across various bit rates and music genres, and it excels in particular in complex scenarios involving mixtures of multiple instruments and vocals. Apollo significantly improves music restoration quality while maintaining computational efficiency. The source code for Apollo is publicly available at https://github.com/JusperLee/Apollo.
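To make the idea of an explicit frequency band split concrete, the following is a minimal NumPy sketch and not Apollo's actual module. The FFT size, hop length, band cutoffs in Hz, and the helper names `stft_frames`, `band_edges`, and `split_bands` are all assumptions chosen for illustration; the abstract does not specify how the bands are defined.

```python
import numpy as np


def stft_frames(signal, n_fft=2048, hop=512):
    """Hann-windowed STFT; returns a complex array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)


def band_edges(sample_rate, n_fft, cutoffs_hz):
    """Convert band boundaries in Hz into STFT bin indices.

    The cutoffs are hypothetical; a real model would choose its own layout.
    """
    n_bins = n_fft // 2 + 1
    inner = [int(round(f * n_fft / sample_rate)) for f in cutoffs_hz]
    return [0] + inner + [n_bins]


def split_bands(spec, edges):
    """Slice the spectrogram along frequency into one sub-array per band."""
    return [spec[:, lo:hi] for lo, hi in zip(edges[:-1], edges[1:])]


# Example: one second of noise at 44.1 kHz, split into five assumed bands.
sample_rate = 44100
n_fft = 2048
rng = np.random.default_rng(0)
audio = rng.standard_normal(sample_rate)

spec = stft_frames(audio, n_fft=n_fft, hop=512)
edges = band_edges(sample_rate, n_fft, cutoffs_hz=[1000, 4000, 8000, 16000])
bands = split_bands(spec, edges)
```

Each element of `bands` can then be processed by its own sub-network before the per-band features are combined to model relationships across bands. Because the bands partition the frequency axis, concatenating them along that axis recovers the original spectrogram, so the split itself loses no information.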