We address time-frequency audio inpainting, where the goal is to fill in missing portions of a spectrogram with reliable information. Despite recent advances, existing approaches remain limited in both reconstruction quality and computational efficiency. To bridge this gap, we propose a method built on a phase-aware signal prior that exploits estimates of the instantaneous frequency. The resulting optimization problem is solved using the generalized Chambolle-Pock algorithm. We evaluate the proposed method against other time-frequency inpainting methods, specifically a deep-prior audio inpainting neural network and the autoregression-based approach known as Janssen-TF. The proposed approach surpasses these methods by a large margin in both the objective evaluation and a subjective listening test, improving the state of the art. In addition, the reconstructions are obtained at a substantially lower computational cost than with the alternative methods.
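To make the notion of an instantaneous-frequency estimate concrete, the following is a minimal sketch of the classical phase-vocoder estimate, which compares the STFT phase of consecutive frames with the phase advance expected at each bin's centre frequency. The abstract does not state how the instantaneous frequency is estimated, so this estimator, the function name `estimate_if`, and the parameters `n_fft` and `hop` are assumptions chosen for illustration, not the authors' implementation.

```python
import numpy as np


def estimate_if(x, fs, n_fft=1024, hop=256):
    """Phase-vocoder style instantaneous-frequency estimate (illustrative only).

    Returns the estimated frequency in Hz for every bin and every pair of
    consecutive frames, together with the STFT magnitude.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    spec = np.fft.rfft(frames, axis=1)  # shape: (n_frames, n_fft // 2 + 1)
    phase = np.angle(spec)

    # Phase advance per hop expected for a sinusoid at each bin's centre.
    omega = 2 * np.pi * np.arange(spec.shape[1]) * hop / n_fft
    # Deviation of the measured advance from the expected one, wrapped
    # to the principal interval around zero.
    dphi = np.diff(phase, axis=0) - omega
    dphi = np.angle(np.exp(1j * dphi))

    # Convert the corrected phase advance back to a frequency in Hz.
    inst_freq = (omega + dphi) * fs / (2 * np.pi * hop)
    return inst_freq, np.abs(spec)


# Usage on a synthetic 440 Hz tone (test signal chosen for illustration).
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440.0 * t)

inst_freq, mag = estimate_if(x, fs)
peak = int(np.argmax(mag[0]))               # strongest bin in the first frame
f_est = float(np.median(inst_freq[:, peak]))
print(f"peak bin {peak}, estimated frequency {f_est:.2f} Hz")
```

In an inpainting setting, an estimate of this kind could be computed from the reliable part of the spectrogram and used to predict how the phase should evolve across the gap. How the proposed prior actually uses the estimate is not specified in the abstract.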