This paper presents DiffMoog, a differentiable modular synthesizer with a comprehensive set of modules typically found in commercial instruments. Being differentiable, it can be integrated into neural networks, enabling automated sound matching, that is, replicating a given audio input. Notably, DiffMoog provides modulation capabilities (FM/AM), low-frequency oscillators (LFOs), filters, and envelope shapers, and lets users create custom signal chains. We introduce an open-source platform that comprises DiffMoog and an end-to-end sound matching framework. This framework uses a novel signal-chain loss and an encoder network that self-programs its outputs to predict DiffMoog's parameters based on the user-defined modular architecture. Moreover, we provide insights and lessons learned on sound matching using differentiable synthesis. By combining robust sound capabilities with a holistic platform, DiffMoog stands as a premier asset for expediting research in audio synthesis and machine learning.
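To make the idea of sound matching through a differentiable synthesizer concrete, the following is a minimal sketch and not DiffMoog's actual code or API. It assumes a single FM oscillator whose only unknown parameter is the modulation index `beta`, uses a plain waveform mean-squared error as a stand-in for the signal-chain loss (which is not reproduced here), and replaces automatic differentiation with a hand-derived gradient. All names (`fm_oscillator`, `loss_and_grad`, `match_beta`) and all constants are hypothetical.

```python
import math

SAMPLE_RATE = 16_000   # assumed sample rate in Hz
NUM_SAMPLES = 1_600    # 0.1 s of audio


def fm_oscillator(beta, carrier_hz=440.0, mod_hz=110.0):
    """Render y[n] = sin(2*pi*fc*t + beta * sin(2*pi*fm*t))."""
    out = []
    for n in range(NUM_SAMPLES):
        t = n / SAMPLE_RATE
        mod = math.sin(2.0 * math.pi * mod_hz * t)
        out.append(math.sin(2.0 * math.pi * carrier_hz * t + beta * mod))
    return out


def loss_and_grad(beta, target, carrier_hz=440.0, mod_hz=110.0):
    """Waveform MSE against `target` and its analytic derivative w.r.t. beta."""
    loss = 0.0
    grad = 0.0
    for n, y_target in enumerate(target):
        t = n / SAMPLE_RATE
        mod = math.sin(2.0 * math.pi * mod_hz * t)
        phase = 2.0 * math.pi * carrier_hz * t + beta * mod
        err = math.sin(phase) - y_target
        loss += err * err
        # d/dbeta of sin(phase) is cos(phase) * mod (chain rule).
        grad += 2.0 * err * math.cos(phase) * mod
    return loss / len(target), grad / len(target)


def match_beta(target, beta_init, lr=1.0, steps=100):
    """Plain gradient descent on the modulation index."""
    beta = beta_init
    for _ in range(steps):
        _, grad = loss_and_grad(beta, target)
        beta -= lr * grad
    return beta


# "Sound matching" in one dimension: recover the index used for the target.
target = fm_oscillator(2.0)
estimate = match_beta(target, beta_init=1.5)
```

The initial guess here is deliberately close to the target. A waveform loss of this kind is non-convex in modulation and frequency parameters, so a distant starting point can settle in a local minimum; the sketch illustrates only that gradients flow from an audio loss back to a synthesizer parameter.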