Generative adversarial networks have recently demonstrated outstanding performance in neural vocoding, outperforming the best autoregressive and flow-based models. In this paper, we show that this success can be extended to other conditional audio generation tasks. In particular, building upon HiFi vocoders, we propose HiFi++, a novel general framework for bandwidth extension and speech enhancement. We show that, with the improved generator architecture, HiFi++ performs better than or comparably to the state of the art in these tasks while using significantly fewer computational resources. The effectiveness of our approach is validated through a series of extensive experiments.