Controllable neural audio synthesis of sound effects is a challenging task because the data can be scarce and varies widely in its spectro-temporal characteristics. Differentiable digital signal processing (DDSP) synthesisers have been successfully employed to model and control musical and harmonic signals using relatively limited data and computational resources. Here we propose NoiseBandNet, an architecture capable of synthesising and controlling sound effects by filtering white noise through a filterbank, thus going further than previous systems that make assumptions about the harmonic nature of sounds. We evaluate our approach via a series of experiments, modelling footsteps, thunderstorm, pottery, knocking, and metal sound effects. When its audio reconstruction capabilities are compared with those of four variants of the DDSP filtered-noise synthesiser, NoiseBandNet scores higher in nine out of ten evaluation categories, establishing a flexible DDSP method for generating time-varying, inharmonic sound effects of arbitrary length with good resolution in both time and frequency. Finally, we illustrate potential creative uses of NoiseBandNet: generating variations, performing loudness transfer, and training it on user-defined control curves.
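As a rough illustration of the core idea described in the abstract (filtering white noise through a filterbank and controlling each band over time), a minimal NumPy sketch might look like the following. It is not the NoiseBandNet implementation: the function names `make_noise_bands` and `synthesise`, the brick-wall FFT band splitting, the choice of eight linearly spaced bands, and the hand-written amplitude envelopes are all assumptions made for illustration. In a trained system the per-band control signals would presumably be learned rather than written by hand.

```python
import numpy as np


def make_noise_bands(n_samples, n_bands, sample_rate, rng):
    """Split one white-noise signal into adjacent frequency bands.

    Hypothetical helper: uses ideal (brick-wall) FFT masks with linearly
    spaced band edges, purely for illustration.
    """
    noise = rng.uniform(-1.0, 1.0, n_samples)
    spectrum = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate)
    edges = np.linspace(0.0, sample_rate / 2.0, n_bands + 1)

    bands = np.zeros((n_bands, n_samples))
    for i in range(n_bands):
        low, high = edges[i], edges[i + 1]
        if i == n_bands - 1:
            mask = (freqs >= low) & (freqs <= high)  # include Nyquist in the last band
        else:
            mask = (freqs >= low) & (freqs < high)
        # Keep only this band's bins and return to the time domain.
        bands[i] = np.fft.irfft(spectrum * mask, n=n_samples)
    return bands


def synthesise(bands, amplitudes):
    """Scale each noise band by its per-sample amplitude envelope and sum."""
    return np.sum(bands * amplitudes, axis=0)


# Assumed settings for the example: one second of audio at 16 kHz, 8 bands.
sample_rate = 16000
n_samples = 16000
n_bands = 8
rng = np.random.default_rng(0)
bands = make_noise_bands(n_samples, n_bands, sample_rate, rng)

# Hand-written control curves: the lowest band decays, the highest band rises.
t = np.linspace(0.0, 1.0, n_samples)
amplitudes = np.zeros((n_bands, n_samples))
amplitudes[0] = np.exp(-5.0 * t)
amplitudes[n_bands - 1] = t

audio = synthesise(bands, amplitudes)
```

Because the sound is shaped only by per-band amplitudes applied to noise, the sketch makes no assumption about harmonic structure, and the output length is set simply by the length of the control curves.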