Virtual analog (VA) audio effects are increasingly based on neural networks and deep learning frameworks. Because the underlying methodology is black-box, a successful model learns to approximate the data it is presented with, including potential errors such as latency and audio dropouts, as well as the non-linear characteristics and frequency-dependent phase shifts produced by the hardware. The phase shifts are of particular interest, as the learned phase response may cause unwanted audible artifacts when the effect is used in creative processing techniques such as dry-wet mixing or parallel compression. To overcome these artifacts, we propose differentiable signal processing tools and deep optimization structures that automatically tune all-pass filters to predict the phase response of different VA simulations and to align processed signals that are out of phase. The approaches are assessed using objective metrics, while listening tests evaluate their ability to enhance the quality of parallel-path processing techniques. Ultimately, an over-parameterized, BiasNet-based all-pass model is proposed for the optimization problem under consideration, resulting in models that can estimate all-pass filter coefficients to align a dry signal with its processed (wet) counterpart.
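As a rough illustration of why phase alignment matters in dry-wet mixing, the following minimal sketch models the wet path as a single first-order all-pass filter and recovers its coefficient so that the dry path can be filtered to match. Everything here is an assumption made for illustration: the filter order, the names `allpass_response`, `true_a`, and `best_a`, and the grid search, which merely stands in for the differentiable, gradient-based optimization the abstract describes. It is not the proposed BiasNet-based model.

```python
import numpy as np

def allpass_response(a, w):
    """Frequency response of a first-order all-pass H(z) = (a + z^-1) / (1 + a z^-1)."""
    z_inv = np.exp(-1j * w)
    return (a + z_inv) / (1.0 + a * z_inv)

# Normalized angular frequencies strictly inside (0, pi).
w = np.linspace(0.01, np.pi - 0.01, 256)

# Hypothetical "wet" path: unit gain, but a frequency-dependent phase shift.
true_a = -0.4
wet = allpass_response(true_a, w)

# Mixing the unprocessed dry path (response 1) with the wet path at equal gain:
# the phase mismatch produces dips in the magnitude of the sum.
mix_unaligned = np.abs(1.0 + wet) / 2.0

# Grid search (a stand-in for a gradient-based optimizer): pick the coefficient
# whose all-pass response is closest to the wet path's response.
candidates = np.linspace(-0.95, 0.95, 191)
errors = [np.mean(np.abs(allpass_response(a, w) - wet) ** 2) for a in candidates]
best_a = candidates[int(np.argmin(errors))]

# Filter the dry path with the estimated all-pass; the two paths now sum coherently.
dry_aligned = allpass_response(best_a, w)
mix_aligned = np.abs(dry_aligned + wet) / 2.0

print("estimated coefficient:", best_a)
print("min |mix| unaligned:", mix_unaligned.min())
print("min |mix| aligned:", mix_aligned.min())
```

Because an all-pass filter leaves the magnitude spectrum untouched, filtering the dry path this way changes only its phase, which is what makes it a natural candidate for aligning parallel paths without recoloring the dry signal.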