Differentiable digital signal processing (DDSP) techniques, including methods for audio synthesis, have gained attention in recent years and lend themselves to interpretability in the parameter space. However, current differentiable synthesis methods have not explicitly sought to model the transient portion of signals, which is important for percussive sounds. In this work, we present a unified synthesis framework that aims to address transient generation and percussive synthesis within DDSP. To this end, we propose a model for percussive synthesis that builds on sinusoidal modeling synthesis and incorporates a modulated temporal convolutional network for transient generation. We use a modified sinusoidal peak-picking algorithm to generate time-varying, non-harmonic sinusoids and pair it with differentiable noise and transient encoders that are jointly trained to reconstruct drumset sounds. Reconstruction metrics computed on a large dataset of acoustic and electronic percussion samples show that our method improves onset signal reconstruction for membranophone percussion instruments.
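The abstract does not spell out the modified peak-picking algorithm. As a rough illustration of the standard building blocks that sinusoidal modeling starts from, the NumPy sketch below shows classic spectral peak picking with parabolic interpolation (in the spirit of McAulay-Quatieri and PARSHL-style analysis) and an additive oscillator bank driven by frame-rate frequency and amplitude envelopes. The function names, the Hann window, and the threshold values are assumptions chosen for illustration. This is not the authors' method: it performs no frame-to-frame partial tracking, and because it is plain NumPy it is not differentiable. A DDSP model would express the synthesis step in an autodiff framework.

```python
import numpy as np


def pick_peaks(mag_db, threshold_db=-60.0, max_peaks=8):
    """Indices of local maxima above threshold_db, keeping the max_peaks strongest (sorted by bin)."""
    inner = mag_db[1:-1]
    is_peak = (inner > mag_db[:-2]) & (inner >= mag_db[2:]) & (inner > threshold_db)
    idx = np.nonzero(is_peak)[0] + 1
    strongest = idx[np.argsort(mag_db[idx])[::-1]][:max_peaks]
    return np.sort(strongest)


def estimate_sinusoids(frame, sr, threshold_db=-60.0, max_peaks=8):
    """Estimate (freq_hz, amplitude) for one frame via peak picking + parabolic interpolation."""
    n = len(frame)
    window = np.hanning(n)
    spectrum = np.fft.rfft(frame * window)
    # Scale by 2 / sum(window) so a sinusoid of amplitude A peaks near A.
    mag = np.abs(spectrum) * 2.0 / window.sum()
    mag_db = 20.0 * np.log10(mag + 1e-12)
    freqs, amps = [], []
    for k in pick_peaks(mag_db, threshold_db, max_peaks):
        a, b, c = mag_db[k - 1], mag_db[k], mag_db[k + 1]
        # Vertex of the parabola through the three dB values; p lies in [-0.5, 0.5] bins.
        p = 0.5 * (a - c) / (a - 2.0 * b + c)
        freqs.append((k + p) * sr / n)
        amps.append(10.0 ** ((b - 0.25 * (a - c) * p) / 20.0))
    return np.array(freqs), np.array(amps)


def synth_sinusoids(freqs, amps, sr, hop):
    """Additive synthesis of time-varying (possibly non-harmonic) sinusoids.

    freqs, amps: arrays of shape (n_frames, n_sines), in Hz and linear amplitude.
    Frame-rate controls are linearly interpolated to audio rate, and each
    oscillator's phase is the running sum of its instantaneous frequency.
    """
    n_frames, n_sines = freqs.shape
    n_samples = n_frames * hop
    frame_pos = np.arange(n_frames) * hop
    t = np.arange(n_samples)
    out = np.zeros(n_samples)
    for i in range(n_sines):
        f = np.interp(t, frame_pos, freqs[:, i])
        a = np.interp(t, frame_pos, amps[:, i])
        a = np.where(f < sr / 2, a, 0.0)  # silence partials above Nyquist
        phase = 2.0 * np.pi * np.cumsum(f) / sr
        out += a * np.sin(phase)
    return out
```

In a percussive setting the peaks are not constrained to integer multiples of a fundamental, which is why the oscillator bank takes an arbitrary frequency per sinusoid rather than a single f0 with harmonic ratios.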