Removing audio effects from electric guitar recordings facilitates post-production and sound editing. An audio distortion recovery model not only improves the clarity of the guitar sound but also opens up new opportunities for creative adjustments in mixing and mastering. While progress has been made in creating such models, previous efforts have largely focused on synthetic distortions that may be too simplistic to capture the complexity of real-world recordings. In this paper, we tackle the task using a dataset of guitar recordings rendered with commercial-grade audio effect VST plugins. Moreover, we introduce a novel two-stage methodology for audio distortion recovery. The first stage processes the audio signal in the Mel-spectrogram domain; the second stage uses a neural vocoder to generate the pristine original guitar sound from the processed Mel-spectrogram. We report a set of experiments, using both subjective and objective evaluation metrics, that demonstrate the effectiveness of our approach over existing methods.
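The two-stage data flow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sample rate, FFT size, hop length, and number of Mel bands are assumed values, and `mel_model` and `vocoder` are hypothetical placeholders for the trained Mel-domain network and the neural vocoder, which the abstract does not specify.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel(wave, sr=22050, n_fft=1024, hop=256, n_mels=80):
    """Log-Mel spectrogram, shape (n_mels, n_frames). All defaults are assumptions."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-5).T

def recover(wet_wave, mel_model, vocoder):
    """Stage 1: Mel-domain processing. Stage 2: waveform generation."""
    wet_mel = log_mel(wet_wave)
    dry_mel = mel_model(wet_mel)   # hypothetical trained network (Mel -> Mel)
    return vocoder(dry_mel)        # hypothetical neural vocoder (Mel -> waveform)

# Stand-ins so the sketch runs end to end; real models would replace these.
identity_model = lambda mel: mel
silent_vocoder = lambda mel, hop=256: np.zeros(mel.shape[1] * hop)

t = np.arange(22050) / 22050.0
wet = np.tanh(5.0 * np.sin(2 * np.pi * 220.0 * t))  # toy clipped tone, not real guitar
dry_estimate = recover(wet, identity_model, silent_vocoder)
```

Splitting the work this way lets the first stage operate on a compact time-frequency representation, while the vocoder is responsible for producing a waveform with plausible phase.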