With the development of audio playback devices and fast data transmission, the demand for high sound quality is rising in both entertainment and communications. In this quest for better sound quality, challenges arise from distortions and interferences originating at the recording side or caused by an imperfect transmission pipeline. To address this problem, audio restoration methods aim to recover clean sound signals from corrupted input data. We present audio restoration algorithms based on diffusion models, with a focus on speech enhancement and music restoration tasks. Traditional approaches, often grounded in handcrafted rules and statistical heuristics, have shaped our understanding of audio signals. In recent decades, there has been a notable shift towards data-driven methods that exploit the modeling capabilities of deep neural networks (DNNs). Deep generative models, and among them diffusion models, have emerged as powerful techniques for learning complex data distributions. However, relying solely on DNN-based learning carries the risk of reduced interpretability, particularly when employing end-to-end models. Nonetheless, data-driven approaches allow more flexibility than statistical model-based frameworks, whose performance depends on distributional and statistical assumptions that can be difficult to guarantee. Here, we aim to show that diffusion models can combine the best of both worlds and offer the opportunity to design audio restoration algorithms with a good degree of interpretability and remarkable performance in terms of sound quality. We explain the diffusion formalism and its application to the conditional generation of clean audio signals. We believe that diffusion models open an exciting field of research with the potential to spawn new audio restoration algorithms that sound natural and remain robust in difficult acoustic conditions.
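To make the diffusion formalism mentioned above concrete, the following is a minimal sketch of a DDPM-style forward (noising) process and one reverse (denoising) step on a toy 1-D audio signal. The variance schedule, step count, and the "oracle" noise estimate are illustrative assumptions; in an actual restoration system, a trained DNN (possibly conditioned on the corrupted observation) would supply the noise estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear variance schedule (values chosen for the sketch).
T = 50                                    # number of diffusion steps
betas = np.linspace(1e-4, 0.05, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)           # cumulative products \bar{alpha}_t

def forward_diffuse(x0, t, eps):
    """Sample x_t ~ q(x_t | x_0): a progressively noisier version of x0."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def reverse_step(x_t, t, eps_hat, rng):
    """One DDPM reverse step given a noise estimate eps_hat for step t."""
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (x_t - coef * eps_hat) / np.sqrt(alphas[t])
    if t > 0:  # no noise is added at the final step
        mean = mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)
    return mean

# Toy "clean audio": a short sine wave.
x0 = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 256))
eps = rng.standard_normal(x0.shape)
x_T = forward_diffuse(x0, T - 1, eps)     # heavily corrupted signal

# Run the reverse chain with an oracle noise estimate (illustration only;
# a trained, possibly conditioned, DNN would predict eps_hat in practice).
x = x_T
for t in reversed(range(T)):
    eps_hat = eps
    x = reverse_step(x, t, eps_hat, rng)
```

The sketch shows the two ingredients the formalism rests on: a fixed Gaussian corruption process indexed by a variance schedule, and a learned reverse process that inverts it step by step. Conditioning the noise estimator on a degraded recording is what turns this generative machinery into a restoration algorithm.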