Generative adversarial networks have recently demonstrated outstanding performance in neural vocoding, outperforming the best autoregressive and flow-based models. In this paper, we show that this success can be extended to other conditional audio generation tasks. In particular, building upon HiFi vocoders, we propose HiFi++, a novel general framework for bandwidth extension and speech enhancement. We show that, with an improved generator architecture, HiFi++ performs better than or comparably to the state of the art in these tasks while using significantly fewer computational resources. The effectiveness of our approach is validated through a series of extensive experiments.