The term "differentiable digital signal processing" describes a family of techniques in which loss function gradients are backpropagated through digital signal processors, facilitating their integration into neural networks. This article surveys the literature on differentiable audio signal processing, focusing on its use in music and speech synthesis. We catalogue applications to tasks including music performance rendering, sound matching, and voice transformation, discussing the motivations for and implications of the use of this methodology. This is accompanied by an overview of digital signal processing operations that have been implemented differentiably. Finally, we highlight open challenges, including optimisation pathologies, robustness to real-world conditions, and design trade-offs, and discuss directions for future research.
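To make the core idea concrete, the following is a minimal sketch, assuming a toy additive (harmonic) synthesiser whose only learnable parameters are per-harmonic amplitudes and a mean-squared-error loss on the waveform. It writes out by hand, in pure Python, the backward pass that an automatic differentiation framework would compute, then uses the resulting gradients for a small sound-matching loop. The function names (`additive_synth`, `mse_and_grad`, `match_amplitudes`) and all parameter values are hypothetical and chosen for illustration only.

```python
import math

def additive_synth(amps, f0, sr, n_samples):
    """Forward pass: a sum of harmonic sinusoids, one amplitude per harmonic.

    Returns the output signal and the per-harmonic sinusoid (the basis),
    which the backward pass reuses.
    """
    basis = [
        [math.sin(2 * math.pi * (k + 1) * f0 * n / sr) for n in range(n_samples)]
        for k in range(len(amps))
    ]
    signal = [sum(a * b[n] for a, b in zip(amps, basis)) for n in range(n_samples)]
    return signal, basis

def mse_and_grad(amps, target, f0, sr):
    """MSE loss and its gradient with respect to the harmonic amplitudes."""
    n = len(target)
    y, basis = additive_synth(amps, f0, sr, n)
    residual = [yi - ti for yi, ti in zip(y, target)]
    loss = sum(r * r for r in residual) / n
    # Chain rule: dL/dy[n] = 2 * r[n] / N and dy[n]/dA_k = basis[k][n],
    # so dL/dA_k = (2 / N) * sum_n r[n] * basis[k][n].
    grad = [2.0 / n * sum(r * b for r, b in zip(residual, basis[k]))
            for k in range(len(amps))]
    return loss, grad

def match_amplitudes(target, f0, sr, n_harmonics, lr=0.5, steps=200):
    """Sound matching by plain gradient descent on the amplitudes."""
    amps = [0.0] * n_harmonics
    for _ in range(steps):
        _, grad = mse_and_grad(amps, target, f0, sr)
        amps = [a - lr * g for a, g in zip(amps, grad)]
    return amps

if __name__ == "__main__":
    sr, f0, n = 8000, 200.0, 400  # 400 samples = exactly 10 periods of f0
    target, _ = additive_synth([0.8, 0.3, 0.1], f0, sr, n)
    print(match_amplitudes(target, f0, sr, n_harmonics=3))
```

The amplitudes enter the signal linearly, so this particular loss is convex in them and gradient descent recovers the target values. The fundamental frequency `f0` is deliberately held fixed: a waveform MSE loss is highly non-convex with respect to oscillator frequency, which is one example of the optimisation pathologies this methodology can face.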