Bandwidth extension is the task of generating the missing high frequencies of an audio signal from knowledge of its low-frequency content. It applies to various problems, such as audio coding and audio restoration. In this article, we focus on efficient bandwidth extension of monophonic and polyphonic musical signals using a differentiable digital signal processing (DDSP) model. Such a model consists of a neural network with relatively few parameters, trained to infer the parameters of a differentiable digital signal processing model, which in turn efficiently generates the full-band output audio signal. We first address bandwidth extension of monophonic signals, and then propose two methods to explicitly handle polyphonic signals. The benefits of the proposed models are first demonstrated on monophonic and polyphonic synthetic data against a baseline and a deep-learning-based ResNet model. The models are then evaluated on recorded monophonic and polyphonic data, covering a wide variety of instruments and musical genres. We show that all proposed models surpass a higher-complexity deep learning model on an objective metric computed in the frequency domain. A MUSHRA listening test confirms the superiority of the proposed approach in terms of perceptual quality.