Deep Neural Networks (DNNs) are generally expected to perform better as their model size grows. In music Super-Resolution (SR), however, large models have failed to produce results commensurate with their scale. We attribute this to the fact that DNNs cannot learn information commensurate with their size from standard mean squared error losses. To unleash the potential of large DNN models in music SR, we propose BigWavGAN, which combines Demucs, a large-scale wave-to-wave model, with State-Of-The-Art (SOTA) discriminators and adversarial training strategies. Our discriminator consists of a Multi-Scale Discriminator (MSD) and a Multi-Resolution Discriminator (MRD). Because only the generator is used during inference, BigWavGAN requires no additional parameters or computational resources compared to the baseline model, Demucs. Objective evaluation affirms the effectiveness of BigWavGAN in music SR. Subjective evaluations indicate that BigWavGAN generates music with significantly higher perceptual quality than the baseline model. Notably, BigWavGAN surpasses the SOTA music SR model in both simulated and real-world scenarios. Moreover, BigWavGAN demonstrates superior generalization ability on out-of-distribution data. The ablation study reveals the importance of our discriminators and training strategies. Samples are available on the demo page.
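The contrast between training on a mean squared error loss alone and adding adversarial terms from several discriminators can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not state which adversarial loss is used, so the least-squares form here is an assumption, as are the function names, the `adv_weight` parameter, and the use of plain lists of floats in place of waveforms and discriminator outputs.

```python
from statistics import fmean


def mse_loss(pred, target):
    """Mean squared error between two equal-length waveforms (lists of floats)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return fmean([(p - t) ** 2 for p, t in zip(pred, target)])


def generator_adv_loss(fake_scores):
    """Least-squares generator term (assumed form): push scores on generated audio toward 1."""
    return fmean([(1.0 - s) ** 2 for s in fake_scores])


def discriminator_loss(real_scores, fake_scores):
    """Least-squares discriminator term (assumed form): real scores toward 1, fake toward 0."""
    real_term = fmean([(1.0 - r) ** 2 for r in real_scores])
    fake_term = fmean([f ** 2 for f in fake_scores])
    return real_term + fake_term


def generator_objective(pred, target, fake_scores_per_disc, adv_weight=1.0):
    """Reconstruction loss plus adversarial terms summed over all discriminators.

    fake_scores_per_disc holds one list of scores per discriminator,
    e.g. one entry standing in for the MSD and one for the MRD.
    adv_weight is a hypothetical weighting, not a value from the paper.
    """
    adv = sum(generator_adv_loss(scores) for scores in fake_scores_per_disc)
    return mse_loss(pred, target) + adv_weight * adv


if __name__ == "__main__":
    pred, target = [0.0, 0.5], [0.0, 1.0]
    msd_scores, mrd_scores = [0.5], [0.5]  # placeholder discriminator outputs
    print(generator_objective(pred, target, [msd_scores, mrd_scores]))
```

Only `generator_objective` involves the discriminators, and it is used during training alone. This matches the abstract's point that inference runs the generator by itself and so costs the same as the baseline.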