We propose Relativistic Adversarial Feedback (RAF), a novel training objective for GAN vocoders that improves both in-domain fidelity and generalization to unseen scenarios. Although modern GAN vocoders employ advanced architectures, their training objectives often fail to promote generalizable representations. RAF addresses this problem by using speech self-supervised learning (SSL) models to help the discriminators evaluate sample quality, which encourages the generator to learn richer representations. In addition, we use relativistic pairing of real and fake waveforms to better model the training data distribution. Experiments across multiple datasets show consistent gains for GAN-based vocoders in both objective and subjective metrics. Notably, the RAF-trained BigVGAN-base outperforms the LSGAN-trained BigVGAN in perceptual quality while using only 12% of the parameters. Comparative studies further confirm the effectiveness of RAF as a training framework for GAN vocoders.
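To make the idea of relativistic pairing concrete, the following is a minimal sketch of a generic pairwise relativistic GAN loss, in which the discriminator scores each real waveform relative to a paired fake one rather than scoring either in isolation. The logistic form, the function names, and the use of plain scalar scores are all assumptions made for illustration. This is not RAF's actual objective, and the SSL-assisted feedback described in the abstract is not modeled here.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def relativistic_d_loss(real_scores, fake_scores):
    """Discriminator loss: mean of -log sigmoid(d_real - d_fake) over pairs.

    Hypothetical helper: the i-th real score is paired with the i-th fake score.
    """
    pairs = list(zip(real_scores, fake_scores))
    if not pairs:
        raise ValueError("need at least one (real, fake) score pair")
    # -log sigmoid(x) equals softplus(-x).
    return sum(softplus(-(r - f)) for r, f in pairs) / len(pairs)


def relativistic_g_loss(real_scores, fake_scores):
    """Generator loss: mean of -log sigmoid(d_fake - d_real) over pairs."""
    pairs = list(zip(real_scores, fake_scores))
    if not pairs:
        raise ValueError("need at least one (real, fake) score pair")
    return sum(softplus(-(f - r)) for r, f in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up discriminator scores for three paired real/fake waveforms.
    d_real = [2.0, 1.5, 0.5]
    d_fake = [-1.0, 0.0, 0.5]
    print("D loss:", relativistic_d_loss(d_real, d_fake))
    print("G loss:", relativistic_g_loss(d_real, d_fake))
```

In this sketch the discriminator loss shrinks as real samples are scored above their paired fakes, while the generator loss shrinks as the fakes close that gap, so both networks are trained on the difference between paired scores rather than on absolute scores.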