Reconstructing lost or corrupted segments of digital audio signals with deep learning algorithms has been explored intensively in recent years. Nevertheless, traditional methods based on linear interpolation, phase coding, and tone insertion remain in common use. Yet we found no prior work that reconstructs audio signals by fusing dithering, steganography, and machine learning regressors. This paper therefore proposes combining steganography, halftoning (dithering), and state-of-the-art shallow and deep learning methods. The results are evaluated with three different metrics and compared against SPAIN, autoregressive, deep learning-based, graph-based, and other methods. They show that the proposed solution is effective and that the side information steganography provides (e.g., a latent representation) can enhance audio signal reconstruction. Moreover, this paper proposes a novel framework, termed HCR (halftone-based compression and reconstruction), that uses halftoning (i.e., dithering) and machine learning to reconstruct audio from heavily compressed embedded data. This work may trigger interest in optimising this approach and/or transferring it to other domains (e.g., image reconstruction). Compared to existing methods, we show improved inpainting performance in terms of signal-to-noise ratio (SNR), objective difference grade (ODG), and Hansen's audio quality metric. In particular, our proposed framework outperformed both the learning-based methods (D2WGAN and SG) and the traditional statistical algorithms (e.g., SPAIN, TDC, WCP).
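The abstract does not say how HCR halftones, embeds, or reconstructs the data, so the following is only a minimal Python/NumPy sketch of the general idea under stated assumptions. The lost segment is halftoned to one bit per sample with first-order error diffusion, the bits are hidden in the least significant bits (LSBs) of a 16-bit host signal, and a moving-average low-pass filter stands in for the shallow or deep regressor. All function names (`halftone_1bit`, `embed_lsb`, `extract_lsb`, `reconstruct`), the toy signals, and the parameters are hypothetical. Only the SNR definition, $10\log_{10}(\lVert x\rVert^2 / \lVert x-\hat{x}\rVert^2)$, is standard.

```python
import numpy as np


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB: 10*log10(||x||^2 / ||x - x_hat||^2)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    noise = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))


def halftone_1bit(x):
    """Hypothetical halftoning step: first-order error-diffusion dithering
    of a signal in [-1, 1] to one bit per sample (1 -> +1, 0 -> -1)."""
    bits = np.empty(len(x), dtype=np.uint8)
    err = 0.0
    for i, sample in enumerate(x):
        v = sample + err                # add the error carried from the previous sample
        q = 1.0 if v >= 0.0 else -1.0   # 1-bit quantizer
        bits[i] = 1 if q > 0.0 else 0
        err = v - q                     # diffuse the quantization error forward
    return bits


def embed_lsb(host, bits):
    """Hypothetical steganography step: overwrite the least significant bit
    of the first len(bits) int16 host samples with the payload bits."""
    if len(bits) > len(host):
        raise ValueError("payload longer than host signal")
    stego = host.copy()
    n = len(bits)
    stego[:n] = (stego[:n] & ~1) | bits.astype(np.int16)
    return stego


def extract_lsb(stego, n):
    """Read back n payload bits from the host LSBs."""
    return (stego[:n] & 1).astype(np.uint8)


def reconstruct(bits, width=16):
    """Stand-in for the learned regressor (an assumption, not the paper's model):
    map bits to +/-1 and low-pass them with a moving average of `width` samples."""
    y = bits.astype(np.float64) * 2.0 - 1.0
    return np.convolve(y, np.ones(width) / width, mode="same")


fs = 8000                                        # assumed sample rate (Hz)
t = np.arange(800) / fs                          # 100 ms "lost" segment
segment = 0.5 * np.sin(2 * np.pi * 100 * t)      # toy segment to protect
rng = np.random.default_rng(0)
host = rng.integers(-20000, 20000, size=4000, dtype=np.int16)  # toy int16 host audio

payload = halftone_1bit(segment)                 # 1 bit per sample of side information
stego = embed_lsb(host, payload)                 # host changes by at most 1 LSB per sample
recovered_bits = extract_lsb(stego, len(payload))
estimate = reconstruct(recovered_bits)

print(f"host vs stego SNR: {snr_db(host, stego):.1f} dB")
print(f"segment reconstruction SNR: {snr_db(segment, estimate):.1f} dB")
```

The moving average is the simplest possible decoder and only works here because the toy segment is heavily oversampled relative to its 100 Hz content. In the framework described above, this is the step where a trained shallow or deep regressor would map the dithered side information back to audio.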