Audio Super-Resolution (SR) is an important topic in the field of audio processing. Many models operate in the time domain because of the advantages of waveform processing, such as avoiding the phase problem. However, prior work has shown that Time-Domain Convolutional Neural Network (TD-CNN) approaches tend to produce annoying artifacts in their output. To confirm the source of these artifacts, we conduct an AB listening test and find phase to be the cause. We further propose Time-Domain Phase Repair (TD-PR), which improves the performance of TD-CNNs by repairing the phase of their output. In this paper, we focus on the music SR task, which is challenging due to the wide frequency response and dynamic range of music. Our proposed method can handle various narrow input bandwidths from 2.5 kHz to 4 kHz with a target bandwidth of 8 kHz. We conduct both objective and subjective evaluations to assess the proposed method. The objective evaluation results indicate that the proposed method performs the SR task effectively. Moreover, the proposed TD-PR obtains much higher mean opinion scores than all TD-CNN baselines, which indicates that it significantly improves perceptual quality. Samples are available on the demo page.