Neural audio super-resolution models are typically trained on pairs of low- and high-resolution audio signals. Although these methods achieve highly accurate super-resolution when the acoustic characteristics of the input data are similar to those of the training data, two challenges remain: the models suffer from quality degradation on out-of-domain data, and paired data are required for training. To address these problems, we propose Dual-CycleGAN, a high-quality audio super-resolution method that can utilize unpaired data, based on two connected cycle-consistent generative adversarial networks (CycleGANs). Our method decomposes the super-resolution task into domain adaptation and resampling processes to handle the acoustic mismatch between unpaired low- and high-resolution signals. The two processes are then jointly optimized within the CycleGAN framework. Experimental results verify that the proposed method significantly outperforms conventional methods when paired data are not available. Code and audio samples are available from https://chomeyama.github.io/DualCycleGAN-Demo/.
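To make the idea of cycle consistency on unpaired signals concrete, the following is a minimal sketch of a cycle-consistency term only. It is not the Dual-CycleGAN implementation: the actual generators are neural networks trained with adversarial losses, whereas `upsample2` and `downsample2` here are hypothetical toy stand-ins (2x linear interpolation and decimation), and `cycle_loss` and `l1` are illustrative names that do not come from the paper.

```python
from typing import Callable, List

Signal = List[float]


def upsample2(x: Signal) -> Signal:
    """Toy stand-in for a low-to-high generator: 2x linear interpolation."""
    out: Signal = []
    for i, v in enumerate(x):
        # The last sample has no right neighbour, so it is simply repeated.
        nxt = x[i + 1] if i + 1 < len(x) else v
        out.append(v)
        out.append(0.5 * (v + nxt))
    return out


def downsample2(y: Signal) -> Signal:
    """Toy stand-in for a high-to-low generator: keep every second sample."""
    return y[::2]


def l1(a: Signal, b: Signal) -> float:
    """Mean absolute error between two equal-length signals."""
    assert len(a) == len(b), "signals must have the same length"
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_loss(
    x_low: Signal,
    y_high: Signal,
    g_low_to_high: Callable[[Signal], Signal],
    g_high_to_low: Callable[[Signal], Signal],
) -> float:
    """Cycle-consistency term: each signal should survive a round trip.

    x_low and y_high are unpaired; they need not come from the same
    recording or have related lengths.
    """
    low_cycle = g_high_to_low(g_low_to_high(x_low))    # low -> high -> low
    high_cycle = g_low_to_high(g_high_to_low(y_high))  # high -> low -> high
    return l1(low_cycle, x_low) + l1(high_cycle, y_high)


if __name__ == "__main__":
    x_low = [0.0, 1.0]              # unpaired low-resolution example
    y_high = [1.0, 0.0, 1.0, 0.0]   # unpaired high-resolution example
    print(cycle_loss(x_low, y_high, upsample2, downsample2))
```

In this toy setting the low-resolution round trip is exact, while the high-resolution round trip loses the detail removed by decimation, so the loss is nonzero. In a trained model, this kind of term is what lets the two mappings be learned without paired examples.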