Bandwidth extension is the task of generating the missing high frequencies of an audio signal from its low-frequency content. It applies to problems such as audio coding and audio restoration. In this article, we focus on efficient bandwidth extension of monophonic and polyphonic musical signals using a differentiable digital signal processing (DDSP) model. Such a model consists of a neural network with relatively few parameters, trained to infer the parameters of a differentiable digital signal processing model, which in turn efficiently generates the full-band output audio signal. We first address bandwidth extension of monophonic signals, and then propose two methods to explicitly handle polyphonic signals. The benefits of the proposed models are first demonstrated on monophonic and polyphonic synthetic data against a baseline and a deep-learning-based ResNet model. The models are then evaluated on recorded monophonic and polyphonic data covering a wide variety of instruments and musical genres. We show that all proposed models surpass a higher-complexity deep learning model on an objective metric computed in the frequency domain. A MUSHRA listening test confirms the superiority of the proposed approach in terms of perceptual quality.
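As a rough illustration of the synthesis side of such a model, the sketch below assumes a simple harmonic additive synthesizer that adds only the harmonics lying above the cutoff of the band-limited input. The abstract does not specify the synthesizer, so the function names, the cutoff value, and the fixed arrays standing in for network-predicted parameters are all hypothetical. The code uses NumPy for readability and is not differentiable as written; an actual DDSP model would express the same operations in an automatic-differentiation framework.

```python
import numpy as np


def synthesize_high_band(f0_hz, amplitudes, sample_rate, cutoff_hz):
    """Sum the harmonics of f0 that lie between cutoff_hz and Nyquist.

    f0_hz:      (n_samples,) fundamental frequency per sample, in Hz
    amplitudes: (n_samples, n_harmonics) linear amplitude of each harmonic
                (hypothetical stand-in for what a network would predict)
    """
    n_harmonics = amplitudes.shape[1]
    harmonic_numbers = np.arange(1, n_harmonics + 1)
    # Instantaneous frequency of every harmonic: (n_samples, n_harmonics).
    freqs = f0_hz[:, None] * harmonic_numbers[None, :]
    # Keep only harmonics in the missing band; mute those at or above
    # Nyquist to avoid aliasing.
    in_band = (freqs >= cutoff_hz) & (freqs < sample_rate / 2)
    # Phase is the running sum of instantaneous frequency.
    phases = 2.0 * np.pi * np.cumsum(freqs / sample_rate, axis=0)
    return np.sum(amplitudes * in_band * np.sin(phases), axis=1)


def extend_bandwidth(low_band, f0_hz, amplitudes, sample_rate, cutoff_hz):
    """Add the synthesized high band to the band-limited input signal."""
    high_band = synthesize_high_band(f0_hz, amplitudes, sample_rate, cutoff_hz)
    return low_band + high_band
```

In this sketch the low band is passed through unchanged and only the missing band is synthesized, which is one plausible design among several. The fundamental frequency and harmonic amplitudes are supplied by hand here, whereas in the setting the abstract describes they would be inferred from the low-frequency input.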