Audio restoration consists of inverting the degradations of a digital audio signal to recover the pristine signal as it was before the degradation occurred. This is valuable for archives of music recordings, particularly those of historical value, for which a clean version may have been lost or may never have existed. Recent work has applied generative models to audio restoration, showing promising improvements over previous methods and enabling restoration operations that were not possible before. However, making these models finely controllable remains a challenge. In this paper, we propose an extension of FLowHigh and introduce the Dynamic Spectral Contour (DSC) as a control signal for bandwidth extension via classifier-free guidance. Our experiments show competitive model performance and indicate that DSC is a promising feature for fine-grained conditioning.