Controlling variation in sound effects with neural audio synthesis models remains a difficult task. Differentiable digital signal processing (DDSP) offers a lightweight solution: by combining pre-processed audio features with digital synthesizers, it achieves high-quality sound synthesis while enabling deterministic control of acoustic attributes. In this research, we introduce DDSP-SFX, a model based on the DDSP architecture that synthesizes high-quality sound effects while letting users easily control timbre variations. We propose a transient modelling technique that achieves higher objective evaluation scores and subjective ratings on impulsive signals (footsteps, gunshots). We also propose a simple method that provides timbre variation control while retaining deterministic attribute control. Finally, we qualitatively demonstrate timbre transfer performance using voice as the guiding sound.
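To make "deterministic attribute control" concrete, the following is a minimal sketch of the general DDSP idea of driving a digital synthesizer from explicit control signals. It is not the DDSP-SFX implementation: the function name `harmonic_plus_noise`, its parameters, and the hand-set controls are all assumptions made for illustration. In a DDSP model the controls would be predicted by a neural network, and the noise branch would normally be spectrally filtered rather than plain scaled white noise.

```python
import numpy as np


def harmonic_plus_noise(f0_hz, amplitude, harmonic_weights,
                        noise_gain=0.0, sample_rate=16000, seed=0):
    """Render audio from explicit per-sample controls (illustrative only).

    f0_hz, amplitude: 1-D arrays with one value per output sample.
    harmonic_weights: relative strength of harmonics 1..K (normalised here).
    noise_gain: scale of the added white-noise component (a simplification).
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    amplitude = np.asarray(amplitude, dtype=float)
    weights = np.asarray(harmonic_weights, dtype=float)
    weights = weights / weights.sum()

    # Integrate instantaneous frequency to obtain the fundamental's phase.
    phase = 2.0 * np.pi * np.cumsum(f0_hz) / sample_rate
    k = np.arange(1, len(weights) + 1)

    # Silence any harmonic that would exceed the Nyquist frequency.
    audible = (k[:, None] * f0_hz[None, :]) < (sample_rate / 2.0)
    harmonics = np.sin(k[:, None] * phase[None, :]) * weights[:, None] * audible
    harmonic_signal = amplitude * harmonics.sum(axis=0)

    # Seeded noise keeps the output reproducible for fixed controls.
    rng = np.random.default_rng(seed)
    noise = noise_gain * rng.uniform(-1.0, 1.0, size=f0_hz.shape)
    return harmonic_signal + noise


# Hand-set controls: a one-second 220 Hz tone with a linear decay.
n = 16000
audio = harmonic_plus_noise(
    f0_hz=np.full(n, 220.0),
    amplitude=np.linspace(1.0, 0.0, n),
    harmonic_weights=[1.0, 0.5, 0.25],
    noise_gain=0.1,
)
```

Because the pitch and loudness curves are explicit inputs, changing them changes the output in a predictable way, which is the sense in which control is deterministic in this sketch.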