Blind Estimation of Audio Effects (BE-AFX) aims to estimate the Audio Effects (AFXs) applied to an original, unprocessed audio sample based solely on the processed audio sample. To train such a system, traditional approaches optimize a loss between ground-truth and estimated AFX parameters. This requires knowing the exact implementation of the AFXs used in the processing. In this work, we propose an alternative solution that eliminates this requirement. Specifically, we introduce an auto-encoder approach that optimizes an audio quality metric. We explore, propose, and compare various implementations of commonly used mastering AFXs, using differentiable signal processing or neural approximations. Our findings demonstrate that our auto-encoder approach yields superior estimates of the audio quality produced by a chain of AFXs compared to the traditional parameter-based approach, even when the latter provides more accurate parameter estimates.
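To make the contrast between the two training objectives concrete, the following is a minimal toy sketch, not the paper's implementation. It assumes a hypothetical two-stage gain chain followed by soft clipping (`apply_chain`), and uses a plain magnitude-spectrum distance as a stand-in for the audio quality metric, since the abstract does not name the effects, the metric, or the network. All function and variable names are illustrative.

```python
import numpy as np

def apply_chain(x, params):
    """Hypothetical AFX chain: two gain stages (in dB), then tanh soft clipping."""
    gain_a_db, gain_b_db = params
    gain = 10.0 ** ((gain_a_db + gain_b_db) / 20.0)
    return np.tanh(gain * x)

def magnitude_spectrum(signal):
    """Magnitude of the real FFT, used here as a crude audio-domain feature."""
    return np.abs(np.fft.rfft(signal))

def parameter_loss(true_params, est_params):
    """Parameter-based objective: MSE between ground-truth and estimated parameters."""
    diff = np.asarray(true_params) - np.asarray(est_params)
    return float(np.mean(diff ** 2))

def audio_loss(x_dry, y_wet, est_params):
    """Audio-based objective: re-apply the estimated parameters to the dry signal
    and compare the result with the processed signal in the spectral domain."""
    y_hat = apply_chain(x_dry, est_params)
    diff = magnitude_spectrum(y_wet) - magnitude_spectrum(y_hat)
    return float(np.mean(diff ** 2))

# Synthetic dry signal and its processed version (assumed data, for illustration only).
rng = np.random.default_rng(0)
x_dry = 0.1 * rng.standard_normal(4096)
true_params = (6.0, -3.0)
y_wet = apply_chain(x_dry, true_params)

# Estimate A: close in parameter space, but the overall gain differs by 1 dB.
est_close_params = (5.0, -3.0)
# Estimate B: far in parameter space, but the two gains sum to the same 3 dB.
est_equiv_audio = (0.0, 3.0)

for name, est in [("close params", est_close_params), ("equivalent audio", est_equiv_audio)]:
    print(name, parameter_loss(true_params, est), audio_loss(x_dry, y_wet, est))
```

In this toy setup, estimate B has the larger parameter error yet reproduces the processed audio, while estimate A has the smaller parameter error but a nonzero audio error. This is only meant to illustrate why the two objectives can rank estimates differently; it says nothing about the magnitude of the effect reported in the paper.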