Noise reduction techniques based on deep learning have demonstrated impressive performance in enhancing the overall quality of recorded speech. While these approaches are highly performant, their application in audio engineering can be limited by a number of factors: operation only on speech, without support for music; lack of real-time capability; lack of interpretable control parameters; operation at lower sample rates; and a tendency to introduce artifacts. On the other hand, signal processing-based noise reduction algorithms offer fine-grained control and operate on a broad range of content; however, they often require manual operation to achieve the best results. To address the limitations of both approaches, in this work we introduce a method that combines a signal processing-based denoiser with a neural network controller, enabling fully automatic and high-fidelity noise reduction on both speech and music signals. We evaluate our proposed method with objective metrics and a perceptual listening test. Our evaluation reveals that speech enhancement models can be extended to music; however, training the model to remove only stationary noise is critical. Furthermore, our proposed approach achieves performance on par with the deep learning models while being significantly more efficient and, in some cases, introducing fewer artifacts. Listening examples are available online at https://tape.it/research/denoiser .
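To make the idea of interpretable control parameters more concrete, the following is a minimal sketch of a generic spectral-gating stage in Python. It is not the denoiser proposed in this work. The names `DenoiserParams`, `threshold`, `reduction_db`, `estimate_noise_profile`, and `spectral_gate` are hypothetical. The sketch assumes that magnitude STFT frames are already computed and that the noise is stationary, so a single average noise profile taken from noise-only frames is sufficient. In a setup like the one described above, a controller would supply the parameter values instead of a user.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical representation: one magnitude spectrum (one value per frequency bin) per STFT frame.
Frame = List[float]


@dataclass
class DenoiserParams:
    """Hypothetical interpretable controls that a controller could predict."""
    threshold: float     # a bin passes when its magnitude exceeds threshold * noise estimate
    reduction_db: float  # attenuation in dB applied to bins judged to be noise


def estimate_noise_profile(noise_frames: List[Frame]) -> Frame:
    """Average magnitude per bin over noise-only frames (assumes stationary noise)."""
    n_frames = len(noise_frames)
    n_bins = len(noise_frames[0])
    return [sum(frame[k] for frame in noise_frames) / n_frames for k in range(n_bins)]


def spectral_gate(frames: List[Frame], noise: Frame, params: DenoiserParams) -> List[Frame]:
    """Attenuate bins whose magnitude does not rise above the scaled noise profile."""
    # Convert the attenuation from dB to a linear amplitude gain.
    floor_gain = 10.0 ** (-params.reduction_db / 20.0)
    gated = []
    for frame in frames:
        gated.append([
            mag if mag > params.threshold * noise_mag else mag * floor_gain
            for mag, noise_mag in zip(frame, noise)
        ])
    return gated


if __name__ == "__main__":
    noise = estimate_noise_profile([[1.0, 0.5], [1.0, 0.5]])
    params = DenoiserParams(threshold=2.0, reduction_db=20.0)
    print(spectral_gate([[4.0, 0.6]], noise, params))
```

Here the first bin (4.0) exceeds twice its noise estimate and passes unchanged, while the second bin (0.6) does not and is attenuated by 20 dB, i.e. scaled by 0.1. Both controls map directly to quantities an engineer can reason about, which is the property the deep learning models listed above typically lack.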